
View Full Version : Take data from website using PHP



swethak
07-27-2008, 10:06 AM
Hi,

I wrote the code below to capture the images from a website when I submit its URL. In the same way, I want to capture the text content of the website. Could you tell me what code would do that? Please help.



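<?php

// Assumption: the URL arrives from a submitted form field named "url".
$url = isset($_POST['url']) ? trim($_POST['url']) : '';

// Only fetch http(s) addresses, and stop if the page cannot be read.
if (!preg_match('#^https?://#i', $url) || ($content = @file_get_contents($url)) === false) {
    die("Could not fetch the page.");
}

// Group 0 holds each whole <img> tag, group 3 the value of its src attribute.
preg_match_all( "/<img(.*)src=(\"|')(.*)(\"|\')(.*)[\/]?>/siU", $content, $match, PREG_PATTERN_ORDER);

echo "<b>Capture Images :</b><br>";
echo "<br>";
print_r($match[0]);
echo "<br>";
echo "<br>";
echo "<b>Capture Images URLS :</b><br><br>";
// $match already holds the URLs, so the pattern does not need to run a second time.
print_r($match[3]);
?>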

Fou-Lu
07-27-2008, 08:49 PM
Capturing text is far more difficult than capturing an image. The problem is consistency: we cannot be certain that text has been stored in p, span, or li tags. Unless the page declares a doctype and actually conforms to it, there is no guarantee that text will be placed in the proper locations.
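
One way to cope with that inconsistency is to hand the page to a parser that tolerates messy markup and then collect every text node, whatever tag it sits in. The following is only a minimal sketch, assuming PHP's bundled DOM extension is enabled; the function name extractText and the sample markup are made up for illustration.

```php
<?php
// Hypothetical helper: returns the visible text of an HTML string.
function extractText($html)
{
    if (trim($html) === '') {
        return '';
    }

    $doc = new DOMDocument();
    // Real-world pages are rarely valid, so collect parser warnings quietly.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Every text node under <body>, except the contents of script and style.
    $nodes = $xpath->query(
        '//body//text()[not(ancestor::script) and not(ancestor::style)]'
    );

    $pieces = array();
    foreach ($nodes as $node) {
        $text = trim($node->nodeValue);
        if ($text !== '') {
            $pieces[] = $text;
        }
    }
    // Join the pieces with single spaces so block elements do not run together.
    return implode(' ', $pieces);
}

$sample = '<html><body><h1>Title</h1><p>First <b>bold</b> paragraph.</p>'
        . '<script>var x = 1;</script><ul><li>Item</li></ul></body></html>';
echo extractText($sample); // Title First bold paragraph. Item
```

Because every text node is joined with a space, a word split by inline markup (for example wor<b>d</b>) would come out as two pieces; that trade-off keeps the sketch short.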

BWiz
07-27-2008, 09:02 PM
You could try searching for specific tags (such as p, h1, span) and calculating the number of characters between the opening tag and the closing tag (the difference between their strpos positions). With this number you could extract the characters in between the tags (substr rather than strstr). Loop through the entire document looking for these tags and extracting them.
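
A rough sketch of that loop, using only string functions, might look like the following. The function name textBetweenTags and the sample markup are hypothetical, and stripos is used in place of strpos so that upper-case tags are found as well.

```php
<?php
// Hypothetical helper: collects the text found between <$tag ...> and </$tag>.
function textBetweenTags($html, $tag)
{
    $results = array();
    $open    = '<' . $tag;
    $close   = '</' . $tag . '>';
    $offset  = 0;

    while (($start = stripos($html, $open, $offset)) !== false) {
        // Make sure "<p" is not the start of a longer name such as "<pre".
        $next = substr($html, $start + strlen($open), 1);
        if ($next !== '>' && !ctype_space((string) $next)) {
            $offset = $start + strlen($open);
            continue;
        }

        // Skip past the end of the opening tag, attributes included.
        $contentStart = strpos($html, '>', $start);
        if ($contentStart === false) {
            break;
        }
        $contentStart++;

        $end = stripos($html, $close, $contentStart);
        if ($end === false) {
            break;
        }

        // The length to extract is the distance between the two positions.
        $inner     = substr($html, $contentStart, $end - $contentStart);
        $results[] = trim(strip_tags($inner));
        $offset    = $end + strlen($close);
    }

    return $results;
}

$html = '<h1>News</h1><p class="intro">Hello <b>world</b></p><pre>code</pre><p>Bye</p>';
print_r(textBetweenTags($html, 'p')); // two entries: "Hello world" and "Bye"
```

Nested tags of the same name, or a > inside an attribute value, will trip this up, so treat it as a starting point rather than a complete parser.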

mlseim
07-28-2008, 01:32 PM
Just out of curiosity, what is the website you want to parse?

FJbrian
07-29-2008, 12:19 PM
Google "spidering web pages" or "data mining".
There are a number of scripts out there you could use.


