02-17-2008, 07:08 PM | #1
Bobafart
Regular Coder

 
Join Date: Dec 2006
Posts: 416
Thanks: 168
Thanked 1 Time in 1 Post
Bobafart is on a distinguished road
preg_matching images on a page

If I have a URL (in the variable $url), how do I preg_match the images on that page?

I want to display all of the images within the $url document.


-----
To make things a little more complex (I don't know if this can be done), I want only content-related images: not header graphics, not "contact me" graphics, not icons, just the images that are within the body of the text. I really don't think this can be done, because $url can be any website...
02-18-2008, 02:51 AM | #2
oesxyl
Master Coder


 
Join Date: Dec 2007
Posts: 6,682
Thanks: 436
Thanked 890 Times in 879 Posts
oesxyl is a jewel in the rough
It's not clear to me. Do you want to take a URL, let's say http://www.google.com/, and extract the src attribute of every img tag on that page?
That can be done. As for filtering out icons, contact graphics and so on, I don't think there is a programmatic solution; it would probably have to be manual, for example a page that shows all the new pictures retrieved by the last run so that you can check them by hand.

best regards

Last edited by oesxyl; 02-18-2008 at 02:55 AM..
02-18-2008, 02:54 AM | #3
Bobafart
Yes sir, that is exactly what I am trying to do.
02-18-2008, 03:06 AM | #4
oesxyl
file_get_contents fetches the file from the net:

http://www.php.net/manual/en/functio...t-contents.php

and returns it as a string, so you can extract the img tags using a regex:

Code:
/<img[^>]+src=\"([^\">]+)\"/
The result can be either a relative or an absolute path to the image. You have to deal with that somehow, but I presume that is simple.
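
The fetch step could be sketched like this. It is only a minimal sketch: fetch_page is a hypothetical helper name, not something from the PHP manual. file_get_contents returns false when the request fails, and reading http:// URLs this way requires allow_url_fopen to be enabled.

```php
<?php
// Hypothetical helper (the name is an assumption): returns the page body
// as a string, or null when the page could not be read.
function fetch_page($url)
{
    // file_get_contents returns false on failure and raises a warning;
    // the @ suppresses the warning so the caller can handle the failure.
    $source = @file_get_contents($url);
    return $source === false ? null : $source;
}
```

It would be called as `$source = fetch_page($url);`, followed by a check for null before running the regex on it.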

best regards
02-18-2008, 04:08 AM | #5
Bobafart
I am doing the following:

Code:
$source = file_get_contents( $url );
preg_match( '/<img[^>]+src=\"([^\">]+)\"/', $source, $m3 );
$getImages = isset( $m3[1] ) ? $m3[1] : '';
The problem is that it only gets the first image on each page, and sometimes it doesn't show any images at all. How do I get it to return all of the images on a page?
02-18-2008, 04:21 AM | #6
oesxyl
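- Try preg_match_all instead of preg_match; it returns every match, not just the first.
- Check the results. If the path is relative, for example img/pic.jpg, you must prepend the URL to turn it into http://www.google.com/img/pic.jpg.
- It could also be something like /img/pic.jpg. That is relative to the site root, so remove the leading / before prepending http://www.google.com/ to avoid a duplicate slash (//).
- If it is already absolute, it is fine as it is.
- In the end, all the paths must be absolute.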
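
The steps above could be sketched roughly as follows. This is a sketch under assumptions, not a complete URL resolver: extract_image_urls is a hypothetical name, the pattern only handles double-quoted src attributes, and ports, query strings and ../ segments are ignored.

```php
<?php
// Hypothetical helper (name and signature are assumptions): collect every
// double-quoted img src in $html and return absolute URLs, resolved
// against $pageUrl.
function extract_image_urls($html, $pageUrl)
{
    // preg_match_all finds every match; $m[1] holds all captured src values.
    if (!preg_match_all('/<img[^>]+src="([^">]+)"/i', $html, $m)) {
        return array();
    }

    $p = parse_url($pageUrl);
    // Site root, e.g. http://www.google.com
    $root = $p['scheme'] . '://' . $p['host'];
    // Directory of the page, e.g. / or /news/
    $path = isset($p['path']) ? $p['path'] : '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);

    $urls = array();
    foreach ($m[1] as $src) {
        if (preg_match('#^https?://#i', $src)) {
            $urls[] = $src;                     // already absolute
        } elseif (substr($src, 0, 2) === '//') {
            $urls[] = $p['scheme'] . ':' . $src; // protocol-relative
        } elseif (substr($src, 0, 1) === '/') {
            $urls[] = $root . $src;             // relative to the site root
        } else {
            $urls[] = $root . $dir . $src;      // relative to the page's directory
        }
    }
    return $urls;
}
```

With the page source in $source, `extract_image_urls($source, $url)` would give the list to loop over when printing the img tags.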

best regards
02-18-2008, 06:17 AM | #7
oesxyl
I don't know whether you have solved the problem or not, so here is an example. I tested it and it works.

PHP Code:
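<?php

$url = "http://www.e-imobiliare.ro/index.html";
// Base URL: everything up to and including the last slash.
$baseurl = preg_replace("/[^\/]+$/", "", $url);
$page = file_get_contents($url);
// Split on "<" so that every part starts with a tag name.
$parts = explode("<", $page);
$images = array();
foreach ($parts as $part) {
  // Only keep parts that are img tags with a double-quoted src.
  if (preg_match("/^img\s[^>]*src=\"([^\"]+)\"/i", $part, $m)) {
    $src = $m[1];
    // Not absolute: drop a leading slash and prepend the base URL
    // (fine here because the page sits at the site root).
    if (!preg_match("/^https?:/", $src)) {
      $src = preg_replace("/^\//", "", $src);
      $src = $baseurl . $src;
    }
    $images[] = $src;
  }
}

foreach ($images as $img) {
  print '<img src="' . htmlspecialchars($img) . '">';
}

?>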
I abuse regex a little here and it is far from the best solution; the idea was to cover as many situations as I could imagine. It can even pick up images that the site writes out with JavaScript, as long as the img tag appears literally in the page source.
I haven't tested it with URLs that contain a ?, and keep in mind that you may need urlencode in some situations.
You could easily use the str* string functions instead of regex, in a few lines.
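
A rough sketch of that string-function variant, using only stripos, strpos and substr. The function name is hypothetical, and it assumes src is double-quoted and preceded by a space, so it is a simplification and not a drop-in replacement for the code above.

```php
<?php
// Hypothetical helper (the name is an assumption): returns the src value
// of every <img ...> tag in $html without using any regex.
function extract_src_with_strings($html)
{
    $srcs = array();
    $offset = 0;
    // Walk through the page one "<img" at a time (case-insensitive).
    while (($start = stripos($html, '<img', $offset)) !== false) {
        $end = strpos($html, '>', $start);
        if ($end === false) {
            break; // unterminated tag, stop scanning
        }
        // The tag text without the closing ">".
        $tag = substr($html, $start, $end - $start);
        $pos = stripos($tag, ' src="');
        if ($pos !== false) {
            $valueStart = $pos + 6; // skip past the 6 characters of ' src="'
            $valueEnd = strpos($tag, '"', $valueStart);
            if ($valueEnd !== false) {
                $srcs[] = substr($tag, $valueStart, $valueEnd - $valueStart);
            }
        }
        $offset = $end + 1;
    }
    return $srcs;
}
```

The returned paths still need the same relative-to-absolute handling before they are printed.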

best regards

Last edited by oesxyl; 02-18-2008 at 06:20 AM..