preg_matching images on a page



Bobafart
02-17-2008, 08:08 PM
If I have a URL (in the variable $url), how do I preg_match the images on that page?

I want to display all of the images in the $url document.


-----
To make things a little more complex (I don't know if this can be done), I want only content-related images: not header graphics, not "contact me" graphics, not icons, just the images within the body of the text. I really don't think this can be done, because $url can be any website...

oesxyl
02-18-2008, 03:51 AM
It's not clear to me. Do you want to take a URL, say http://www.google.com/, and extract the src attribute of every img tag on that page?
That can be done. As for filtering out icons, contact graphics and the like, I don't think there is a programmatic solution; it would have to be manual, for example a page showing all the new pictures retrieved in the last run, which you then check by hand.
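Extracting every img src from a page can also be done with PHP's DOM extension instead of a regex, which copes with attribute order, single quotes and upper-case tags. A minimal sketch, assuming the dom extension is enabled (it is in most PHP builds); extractImageSources is a hypothetical helper name, and the HTML string stands in for what file_get_contents($url) would return. It does not attempt the content-vs-icon filtering.

```php
<?php
// Collect the src attribute of every <img> element in an HTML string,
// using the DOM parser instead of a regular expression.
function extractImageSources(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; keep libxml warnings quiet.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $sources = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        // getAttribute() returns "" when the attribute is missing.
        $src = trim($img->getAttribute('src'));
        if ($src !== '') {
            $sources[] = $src;
        }
    }
    return $sources;
}

// Hypothetical sample markup: double- and single-quoted src values.
$html = '<p>Intro <img class="photo" src="img/pic.jpg"></p>'
      . "<img alt='logo' src='/logo.png'>";
print_r(extractImageSources($html));
```

The paths come back exactly as written in the page, so relative ones still need the base URL prepended.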

best regards

Bobafart
02-18-2008, 03:54 AM
Yes sir, that is exactly what I am trying to do.

oesxyl
02-18-2008, 04:06 AM
file_get_contents fetches the file from the net:

http://www.php.net/manual/en/function.file-get-contents.php

and returns it as a string, so you can extract the img tags using a regex:



/<img[^>]+src="([^">]+)"/i


The result could be a relative or an absolute path to the image. You'll have to deal with that somehow, but I presume that part is simple. :)
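The part of the pattern between `<img` and `src` decides which tags match. Below, a made-up snippet is run through two variants: one with `[^s]+` there (any characters except the letter s, so an attribute such as class="shot" before src breaks the match) and one with `[^>]+` (anything up to the end of the tag). It is only a sketch: it uses preg_match_all so every match is returned, and neither pattern handles single-quoted or unquoted src values.

```php
<?php
// Hypothetical sample: the second tag has an attribute containing "s"
// before src, the third uses single quotes.
$source = '<img src="a.jpg"> <img class="shot" src="b.jpg"> <img src=\'c.jpg\'>';

// [^s]+ cannot step over the "s" in class="shot", so b.jpg is missed.
preg_match_all('/<img[^s]+src=\"([^\">]+)\"/', $source, $strict);

// [^>]+ skips any attributes up to the end of the tag.
preg_match_all('/<img[^>]+src="([^">]+)"/i', $source, $loose);

print_r($strict[1]); // a.jpg only
print_r($loose[1]);  // a.jpg and b.jpg; single-quoted c.jpg is still missed
```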

best regards

Bobafart
02-18-2008, 05:08 AM
I am doing the following:



$source = file_get_contents( $url );
preg_match( '/<img[^>]+src="([^">]+)"/i', $source, $m3 );
$getImages = isset( $m3[1] ) ? $m3[1] : '';


The problem is that it only gets the first image on each page, and sometimes it doesn't find any images at all. How do I get all of the images on a page?

oesxyl
02-18-2008, 05:21 AM

- Try preg_match_all instead of preg_match.
- Check the results: if a path is relative, for example img/pic.jpg, you must prepend the page URL to turn it into http://www.google.com/img/pic.jpg.
- It could also be something like /img/pic.jpg; remove the leading / so you don't end up with a double // after the base URL.
- If the path is already absolute, it's fine as is.
- In the end, all paths must be absolute.
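The checklist can be sketched as a small helper combined with preg_match_all. makeAbsolute is a hypothetical name, and like the list it assumes the base URL ends in "/" and is the site root (for a page in a subdirectory, a root-relative path such as /img/pic.jpg would need the host only). Protocol-relative (//host/...) and ../ paths are not handled.

```php
<?php
// Turn an img src into an absolute URL:
// absolute http(s) URLs pass through unchanged; anything else gets the
// base URL prepended, with leading "/" removed so there is no "//".
function makeAbsolute(string $baseUrl, string $src): string
{
    if (preg_match('#^https?://#i', $src)) {
        return $src; // already absolute
    }
    return $baseUrl . ltrim($src, '/');
}

// Hypothetical page content; in practice this comes from file_get_contents($url).
$html = '<img src="img/pic.jpg"><img src="/img/logo.gif"><img src="http://example.com/x.png">';

// preg_match_all collects every match; $m[1] holds the captured src values.
preg_match_all('/<img[^>]+src="([^">]+)"/i', $html, $m);

$images = [];
foreach ($m[1] as $src) {
    $images[] = makeAbsolute('http://www.google.com/', $src);
}
print_r($images);
```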

best regards

oesxyl
02-18-2008, 07:17 AM
I don't know whether you've solved the problem yet, but here is an example; I tested it and it works.



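<?php

$url = "http://www.e-imobiliare.ro/index.html";

// Base "directory" of the page: drop everything after the last slash,
// e.g. http://www.e-imobiliare.ro/index.html -> http://www.e-imobiliare.ro/
$baseurl = preg_replace('/[^\/]+$/', '', $url);

$page = file_get_contents($url);
if ($page === false) {
    die("Could not fetch $url");
}

// Split the page on "<" so each chunk starts with a tag name.
$parts = explode('<', $page);
$images = array();
foreach ($parts as $part) {
    // Keep only chunks that are img tags with a double-quoted src.
    if (!preg_match('/^img\b[^>]*\bsrc="([^"]+)"/i', $part, $m)) {
        continue;
    }
    $src = $m[1];
    // Relative path: strip a leading "/" and prepend the base URL.
    if (!preg_match('/^https?:/i', $src)) {
        $src = $baseurl . preg_replace('/^\//', '', $src);
    }
    $images[] = $src;
}

foreach ($images as $img) {
    print '<img src="' . htmlspecialchars($img) . '">';
}

?>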


I lean on regex a bit too much here and this is far from the best solution; the idea was to cover as many situations as I could imagine. It can even pick up images that the site inserts using JavaScript.
I haven't tested it with URLs that contain a ?, and keep in mind that you may need urlencode in some situations.
You could easily use str* functions instead of regex in a few lines.
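Following that last suggestion, here is a minimal sketch that uses only stripos, strpos and substr. imageSourcesWithStrings is a hypothetical name; like the regex version it only reads double-quoted src values, and it takes the first `src="` inside each tag (so a data-src attribute placed before src would be picked up instead).

```php
<?php
// Extract double-quoted src values of <img> tags with string functions only.
function imageSourcesWithStrings(string $page): array
{
    $sources = [];
    $offset = 0;
    // stripos makes the "<img" search case-insensitive.
    while (($start = stripos($page, '<img', $offset)) !== false) {
        $end = strpos($page, '>', $start);
        if ($end === false) {
            break; // unterminated tag at the end of the page
        }
        $tag = substr($page, $start, $end - $start);
        $attr = stripos($tag, 'src="');
        if ($attr !== false) {
            $valueStart = $attr + 5; // skip past 'src="'
            $valueEnd = strpos($tag, '"', $valueStart);
            if ($valueEnd !== false) {
                $sources[] = substr($tag, $valueStart, $valueEnd - $valueStart);
            }
        }
        $offset = $end + 1; // continue after this tag
    }
    return $sources;
}

print_r(imageSourcesWithStrings('<p><IMG alt="x" SRC="a.jpg"> text <img src="b/c.png"></p>'));
```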

best regards