  • #1
    Regular Coder | Join Date: Dec 2006 | Posts: 417 | Thanks: 168 | Thanked 1 Time in 1 Post

    preg_matching images on a page

    If I have a URL (in a variable, $url), how do I preg_match the images on that page?

    I want to display all of the images in the $url document.


    -----
    To make things a little more complex (I don't know if this can be done), I want only content-related images: not header graphics, not "contact me" graphics or any icons, just the images within the body of the text. I really don't think this can be done, because $url can be any website...

  • #2
    Master Coder | Join Date: Dec 2007 | Posts: 6,682 | Thanks: 436 | Thanked 890 Times in 879 Posts
    It's not clear to me. Do you want to take a URL, let's say http://www.google.com/, and extract from that page the src attribute of every img tag?
    That can be done. As for filtering out icons, "contact me" graphics and so on, I don't think there is a programmatic solution; it would probably have to be manual: a page showing all the new pictures retrieved in the last run, which you then check by hand.
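The manual check described above could be supported by a small helper. This is a minimal sketch, assuming you keep the image URLs you have already reviewed in an array (how you store them between runs, in a file or a database, is up to you); newImages and reviewPage are hypothetical names, not functions from the thread.

```php
<?php
// Hypothetical helpers for a manual review step: show only the image URLs
// that have not been reviewed before, so a person can pick the content images.

/**
 * Return the URLs in $found that are not in $alreadyReviewed,
 * without duplicates, re-indexed from 0.
 */
function newImages(array $found, array $alreadyReviewed): array
{
    return array_values(array_diff(array_unique($found), $alreadyReviewed));
}

/**
 * Build a simple HTML review page: each new image followed by its URL.
 * URLs are escaped so quotes in them cannot break the markup.
 */
function reviewPage(array $urls): string
{
    $html = "<html><body>\n";
    foreach ($urls as $u) {
        $safe = htmlspecialchars($u, ENT_QUOTES);
        $html .= '<p><img src="' . $safe . '"><br>' . $safe . "</p>\n";
    }
    return $html . "</body></html>\n";
}
```

After a check, the URLs you looked at would be appended to the reviewed list, so the next run only shows images that are new.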

    best regards
    Last edited by oesxyl; 02-18-2008 at 02:55 AM.

  • Users who have thanked oesxyl for this post:

    Bobafart (02-18-2008)

  • #3
    Regular Coder | Join Date: Dec 2006 | Posts: 417 | Thanks: 168 | Thanked 1 Time in 1 Post
    Yes sir, that is exactly what I am trying to do.

  • #4
    Master Coder | Join Date: Dec 2007 | Posts: 6,682 | Thanks: 436 | Thanked 890 Times in 879 Posts
    file_get_contents fetches the file from the net:

    http://www.php.net/manual/en/functio...t-contents.php

    and returns it as a string, so you can extract the img tags using a regex:

    Code:
    /<img[^s]+src=\"([^\">]+)\"/
    The result could be a relative or an absolute path to the image; you will have to deal with that somehow, but I presume that part is simple.
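One detail worth checking in the regex above: [^s]+ matches any run of characters except the letter s, so it cannot get past an s inside the tag. An img tag with an attribute such as class="photo" before src is therefore skipped, which may be why some pages show no images at all. The sketch below compares it with a more tolerant pattern; TOLERANT_REGEX and firstImageSrc are hypothetical names, and the pattern is just one possible variant.

```php
<?php
// The regex from the post: [^s]+ cannot cross any letter "s",
// so <img class="..." src="..."> is not matched.
const POST_REGEX = '/<img[^s]+src=\"([^\">]+)\"/';

// Hypothetical tolerant variant: any attributes before src (but not past the
// end of the tag), optional spaces around "=", single or double quotes,
// case-insensitive tag and attribute names.
const TOLERANT_REGEX = '/<img\b[^>]*?\ssrc\s*=\s*["\']([^"\'>]+)["\']/i';

/** Return the src of the first img tag matched by $regex, or '' if none. */
function firstImageSrc(string $html, string $regex): string
{
    return preg_match($regex, $html, $m) ? $m[1] : '';
}
```

For example, firstImageSrc('<img class="photo" src="pic.jpg">', POST_REGEX) returns an empty string, while the same call with TOLERANT_REGEX returns pic.jpg.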

    best regards

  • Users who have thanked oesxyl for this post:

    Bobafart (02-18-2008)

  • #5
    Regular Coder | Join Date: Dec 2006 | Posts: 417 | Thanks: 168 | Thanked 1 Time in 1 Post
    I am doing the following:

    Code:
    $source = file_get_contents( $url );
    preg_match( '/<img[^s]+src=\"([^\">]+)\"/', $source, $m3 );
    $getImages = isset( $m3[1] ) ? $m3[1] : '';
    The problem is that it only gets the first image on each page, and sometimes it doesn't show any images at all. How do I get all of the images on a page?

  • #6
    Master Coder | Join Date: Dec 2007 | Posts: 6,682 | Thanks: 436 | Thanked 890 Times in 879 Posts
    - try preg_match_all instead; preg_match stops at the first match
    - check the results: if a path is relative, for example img/pic.jpg, you must prepend the URL to turn it into http://www.google.com/img/pic.jpg
    - it could also be something like /img/pic.jpg; remove the leading / so you don't end up with a double slash (//)
    - if it is absolute, it is already OK
    - in the end, all the paths must be absolute
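The steps in the list above can be sketched like this. allImageSrcs and absoluteUrl are hypothetical helper names, and the tag pattern is a more tolerant variant rather than the regex from the earlier posts. One hedge on the third point: a path starting with / is relative to the site root, not to the page's directory, so stripping the / and prepending the page URL only works when the page sits at the root (as with http://www.google.com/). This sketch resolves such paths against the host instead. It assumes the page URL is absolute, and it does not normalize ../ segments.

```php
<?php
/** Return every img src value in $html, in document order. */
function allImageSrcs(string $html): array
{
    // preg_match_all collects all matches, not just the first one.
    preg_match_all('/<img\b[^>]*?\ssrc\s*=\s*["\']([^"\'>]+)["\']/i', $html, $m);
    return $m[1];
}

/** Resolve an img src against the (absolute) URL of the page it came from. */
function absoluteUrl(string $src, string $pageUrl): string
{
    // Already absolute: keep it.
    if (preg_match('#^https?://#i', $src)) {
        return $src;
    }
    $p = parse_url($pageUrl);
    // Protocol-relative ("//cdn.example.com/x.gif"): reuse the page's scheme.
    if (substr($src, 0, 2) === '//') {
        return $p['scheme'] . ':' . $src;
    }
    $root = $p['scheme'] . '://' . $p['host']
        . (isset($p['port']) ? ':' . $p['port'] : '');
    // Root-relative ("/img/pic.jpg"): relative to the site root.
    if (substr($src, 0, 1) === '/') {
        return $root . $src;
    }
    // Plain relative ("img/pic.jpg"): relative to the page's directory.
    // parse_url keeps any "?query" out of 'path', so it is dropped here.
    $path = isset($p['path']) ? $p['path'] : '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return $root . $dir . $src;
}
```

Usage would be along the lines of fetching $html with file_get_contents($url) and then calling absoluteUrl($src, $url) for each $src in allImageSrcs($html).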

    best regards

  • #7
    Master Coder | Join Date: Dec 2007 | Posts: 6,682 | Thanks: 436 | Thanked 890 Times in 879 Posts
    I don't know whether you've solved the problem yet, so here is an example; I tested it and it works.

    PHP Code:
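    <?php

    $url = "http://www.e-imobiliare.ro/index.html";
    // base url: everything up to and including the last "/"
    $baseurl = preg_replace("/[^\/]+$/", "", $url);
    $page = file_get_contents($url);
    // split the page at every "<" so that each tag starts a new part
    $parts = explode("<", $page);
    $images = array();
    foreach ($parts as $part) {
      // only parts that start with an img tag that has a src attribute
      if (preg_match('/^img\b[^>]*?\ssrc\s*=\s*["\']([^"\']+)["\']/i', $part, $m)) {
        $part = $m[1];
        // relative path: drop a leading "/" and prepend the base url
        if (!preg_match("/^https?:/i", $part)) {
          $part = preg_replace("/^\//", "", $part);
          $part = $baseurl . $part;
        }
        $images[] = $part;
      }
    }

    // print every image that was found
    foreach ($images as $img) {
      print '<img src="' . $img . '">';
    }

    ?>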
    I abuse regex a little and it's far from the best solution; the idea was to cover as many situations as I could imagine. It can extract images even if the site hides them using JavaScript.
    I haven't tested it with URLs that contain a ?, and keep in mind that you may need urlencode in some situations.
    You could easily replace the regexes with the str* functions (strpos, substr, etc.) in a few lines.
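The remark about using the str* functions instead of regexes could look roughly like this. It is a minimal sketch with a hypothetical name, imageSrcsWithoutRegex, built only on stripos, strpos and substr. It assumes src values are quoted (src="..." or src='...') with no spaces around the =; unquoted values are skipped, a data-src= attribute would also be picked up, and a > inside an attribute value would confuse it.

```php
<?php
/** Collect quoted src values of img tags using string functions only. */
function imageSrcsWithoutRegex(string $html): array
{
    $srcs = array();
    $offset = 0;
    // stripos is case-insensitive, so both <img and <IMG are found.
    while (($start = stripos($html, '<img', $offset)) !== false) {
        $end = strpos($html, '>', $start);
        if ($end === false) {
            break; // unterminated tag at the end of the page
        }
        $tag = substr($html, $start, $end - $start);
        $offset = $end + 1;

        $pos = stripos($tag, 'src=');
        if ($pos === false) {
            continue; // img tag without a src attribute
        }
        $quote = substr($tag, $pos + 4, 1);
        if ($quote !== '"' && $quote !== "'") {
            continue; // unquoted value: not handled in this sketch
        }
        $valueStart = $pos + 5;
        $close = strpos($tag, $quote, $valueStart);
        if ($close !== false) {
            $srcs[] = substr($tag, $valueStart, $close - $valueStart);
        }
    }
    return $srcs;
}
```

The values it returns still need the same relative-to-absolute path fixing described in the earlier posts.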

    best regards
    Last edited by oesxyl; 02-18-2008 at 06:20 AM.

  • Users who have thanked oesxyl for this post:

    Bobafart (02-18-2008)

