...

preg_matching images on a page



Bobafart
02-17-2008, 08:08 PM
If I have a URL (in the variable $url), how do I use preg_match to find the images on that page?

I want to display all of the images within the $url document.


-----
To make things a little more complex (I don't know if this can be done): I want only content-related images, not header graphics, "contact me" graphics or icons, just the images that are within the body of the text. I really don't think this can be done, because $url can be any website...

oesxyl
02-18-2008, 03:51 AM
It's not clear to me. Do you want to take a URL, let's say http://www.google.com/, and extract the src attribute of every img tag on that page?
That can be done. As for filtering out icons, contact graphics and so on, I don't think there is a programmatic solution, only a manual one: a page that shows all the new pictures retrieved by the last run, which you then check by hand.
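
Not a real solution to the filtering problem, but a rough first pass before the manual check could look like the sketch below. It only looks at the file name of each image. The function name looks_like_content_image, the keyword list and the example.com URLs are all made up for illustration, and the heuristic will certainly misjudge some images.

```php
<?php
// Hypothetical helper: guess from the file name whether an image is page
// decoration. The keyword list is an assumption, not a tested rule set.
function looks_like_content_image($src) {
    $name = strtolower(basename((string) parse_url($src, PHP_URL_PATH)));
    $skip = array('icon', 'logo', 'header', 'banner', 'button', 'contact', 'spacer');
    foreach ($skip as $word) {
        if (strpos($name, $word) !== false) {
            return false;   // file name suggests decoration
        }
    }
    return true;
}

// Made-up input list, standing in for the image URLs found on a page.
$found = array(
    'http://example.com/img/logo.gif',
    'http://example.com/photos/house-01.jpg',
    'http://example.com/img/contact_me.png',
);
// Keep only the images that pass the heuristic, then check those by hand.
$candidates = array_values(array_filter($found, 'looks_like_content_image'));
print_r($candidates);
```

Whatever survives the filter still needs the manual check described above.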

best regards

Bobafart
02-18-2008, 03:54 AM
yes sir, that is exactly what I am trying to do

oesxyl
02-18-2008, 04:06 AM
file_get_contents fetches the file from the net:

http://www.php.net/manual/en/function.file-get-contents.php

and returns it as a string, so you can extract the img tags using a regex:



/<img[^>]+src=\"([^\">]+)\"/


The result could be a relative or an absolute path to the image. You must deal with that somehow, but I presume that is simple :)

best regards

Bobafart
02-18-2008, 05:08 AM
I am doing the following:



$source = file_get_contents( $url );
preg_match( '/<img[^>]+src=\"([^\">]+)\"/', $source, $m3 );
$getImages = isset( $m3[1] ) ? $m3[1] : '';


The problem is that it only gets the first image on each page, and sometimes it doesn't show any images at all. How do I get all of the images on a page?

oesxyl
02-18-2008, 05:21 AM
- try preg_match_all instead of preg_match
- check the results: if the path is relative, for example img/pic.jpg, you must prepend the URL to turn it into http://www.google.com/img/pic.jpg
- it could also be something like /img/pic.jpg, which is relative to the site root; join it to the host without doubling the slash as //
- if it is absolute, it is already OK
- all the paths must be absolute in the end
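
The steps above can be sketched roughly as follows. This is only a sketch under assumptions: the function names extract_img_srcs and make_absolute are made up for the example, src attributes are assumed to be double-quoted, and ../ segments and // protocol-relative paths are not handled. The pattern uses [^>]+ so that other attributes in front of src do not break the match.

```php
<?php
// Hypothetical helper: collect every double-quoted src value from img tags.
function extract_img_srcs($html) {
    // preg_match_all finds all matches, not just the first one.
    preg_match_all('/<img[^>]+src="([^">]+)"/i', $html, $m);
    return $m[1];
}

// Hypothetical helper: turn a src value into an absolute URL.
function make_absolute($src, $pageUrl) {
    if (preg_match('/^https?:\/\//i', $src)) {
        return $src;                        // already absolute
    }
    // Scheme and host, e.g. http://www.google.com
    $root = preg_replace('/^(https?:\/\/[^\/]+).*$/i', '$1', $pageUrl);
    if (substr($src, 0, 1) === '/') {
        return $root . $src;                // relative to the site root
    }
    // Otherwise the path is relative to the directory of the page.
    $path = (string) parse_url($pageUrl, PHP_URL_PATH);
    $dir = preg_replace('/[^\/]*$/', '', $path);
    if ($dir === '') {
        $dir = '/';
    }
    return $root . $dir . $src;
}

$html = '<p><img alt="logo" src="/img/logo.gif"> <img src="pics/a.jpg"></p>';
foreach (extract_img_srcs($html) as $src) {
    echo make_absolute($src, 'http://www.google.com/dir/page.html'), "\n";
}
```

In real use $html would come from file_get_contents($url), and the same $url would be passed as $pageUrl.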

best regards

oesxyl
02-18-2008, 07:17 AM
I don't know whether you have solved the problem or not. Here is an example; I tested it and it works.



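<?php

$url = "http://www.e-imobiliare.ro/index.html";
// Scheme and host, for paths that start with "/".
$rooturl = preg_replace("/^(https?:\/\/[^\/]+).*$/i", "$1", $url);
// Directory of the page, for other relative paths.
$baseurl = preg_replace("/[^\/]+$/", "", $url);
$page = file_get_contents($url);
if ($page === false) {
    die("Could not fetch " . $url);
}
// Split on "<" so that every part starts with a tag name.
$parts = explode("<", $page);
$images = array();
foreach ($parts as $part) {
    // Keep only img tags that carry a double-quoted src attribute.
    if (preg_match("/^img\s[^>]*src=\"([^\"]+)\"/i", $part, $m)) {
        $src = $m[1];
        if (!preg_match("/^https?:/i", $src)) {
            // Relative path: make it absolute.
            $src = ($src[0] == "/" ? $rooturl : $baseurl) . $src;
        }
        $images[] = $src;
    }
}

foreach ($images as $img) {
    print '<img src="' . htmlspecialchars($img) . '">';
}

?>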


I abuse regex a little and this is far from the best solution; the idea was to cover as many situations as I could imagine. It can also extract images that the site hides using JavaScript, as long as the img tag appears in the page source.
I didn't test it with URLs that contain a ?, and keep in mind that you may need urlencode in some situations.
You could easily use the str* functions instead of regex, in a few lines.
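
A minimal sketch of what the string-function version might look like, using only stripos, strpos and substr. The function name img_srcs_without_regex is made up for the example. It assumes double-quoted src attributes and does not make the paths absolute.

```php
<?php
// Hypothetical helper: find double-quoted src values of img tags
// with string functions only (no regex).
function img_srcs_without_regex($html) {
    $srcs = array();
    $pos = 0;
    // stripos is case-insensitive, so <IMG ...> is found too.
    while (($start = stripos($html, '<img', $pos)) !== false) {
        $end = strpos($html, '>', $start);
        if ($end === false) {
            break;                          // unterminated tag, stop
        }
        $tag = substr($html, $start, $end - $start);
        $pos = $end + 1;                    // continue after this tag
        $s = stripos($tag, 'src="');
        if ($s === false) {
            continue;                       // img tag without a quoted src
        }
        $s += strlen('src="');
        $e = strpos($tag, '"', $s);
        if ($e !== false) {
            $srcs[] = substr($tag, $s, $e - $s);
        }
    }
    return $srcs;
}

print_r(img_srcs_without_regex('<IMG SRC="a.gif"><p><img alt="x" src="b/c.jpg"></p>'));
```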

best regards


