...

Extracting URL Data From TXT/XML File



Sussex_Chris
08-28-2009, 04:30 PM
I want to extract every URL from an XML/TXT file and keep only the parent URL of each one (the http://domain/ part).

I tried this but it was not successful:

(I'm posting the data to this code from a textarea form; that part works fine.)


// If the form has been posted, start processing the data
if (isset($_REQUEST['start'])) {

    // Read the posted textarea data
    $data = $_POST['data'];

    function get_tags($html) {
        $regexp = '/(http:\/\/)(.*?)(\/)/';
        preg_match_all($regexp, $html, $matches, PREG_SET_ORDER);
        foreach ($matches as $tag) {
            $tags[] = "Http://".$tag[2]."/";
        }
        return $tags;
    }

    if (is_array($list = get_tags($data))) {
        foreach ($list as $tag) {
            echo $tag."\r\n";
        }
    }
    exit;
}

(The URLs in the TXT file start with Http://, not http://.)

Any idea how to get this working?
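The likely cause is that the pattern '/(http:\/\/)(.*?)(\/)/' is case-sensitive, so it never matches URLs written as Http://. A minimal sketch of the same get_tags() approach with the i modifier added follows. It also swaps (.*?)(\/) for [^\/\s]+, so a URL without a trailing slash cannot make the lazy match run on to a later slash, and it initialises $tags so an empty array comes back when nothing matches. The sample $data string is made up for illustration.

```php
<?php
// Sketch: the original get_tags() with a case-insensitive pattern (/i).
// [^\/\s]+ captures the host: everything up to the first "/" or whitespace.
function get_tags($html) {
    $tags = array(); // start empty so an array is always returned
    preg_match_all('/(http:\/\/)([^\/\s]+)/i', $html, $matches, PREG_SET_ORDER);
    foreach ($matches as $tag) {
        // $tag[2] is the host captured by the second group
        $tags[] = "Http://" . $tag[2] . "/";
    }
    return $tags;
}

// Hypothetical sample input
$data = "See Http://example.com/page.html and Http://www.example.org/a/b.xml";
foreach (get_tags($data) as $tag) {
    echo $tag . "\r\n";
}
```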

kbluhm
08-28-2009, 05:11 PM
Here's a quick and dirty way of extracting URLs:


preg_match_all( '/\bhttps?\:\/\/\S+/i', $input, $urls );

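$urls[0] will then hold every match: any run of text that begins with http:// or https:// and continues until the next whitespace character or the end of the input. The i modifier makes the match case-insensitive, so URLs starting with Http:// are found too.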
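To get from full URLs to the parent URL, one option is to combine kbluhm's pattern with PHP's parse_url() and rebuild scheme://host/ from its parts. This is a sketch, assuming "parent URL" means scheme + host + "/" as in the original get_tags(); the function name parent_urls() and the sample input are hypothetical. It lowercases the scheme and host (both are case-insensitive in URLs) so that array_unique() can drop duplicates such as Http://example.com/a and Http://EXAMPLE.com/b.

```php
<?php
// Sketch: collect full URLs with kbluhm's pattern, then reduce each one
// to scheme://host/ using parse_url(). parent_urls() is a hypothetical name.
function parent_urls($input) {
    $parents = array();
    preg_match_all('/\bhttps?\:\/\/\S+/i', $input, $urls);
    foreach ($urls[0] as $url) {          // $urls[0] holds the full matches
        $parts = parse_url($url);
        if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
            continue;                     // skip anything parse_url() can't split into scheme and host
        }
        // Lowercase scheme and host so case variants collapse to one entry
        $parents[] = strtolower($parts['scheme']) . '://' . strtolower($parts['host']) . '/';
    }
    return array_values(array_unique($parents)); // drop duplicates, reindex from 0
}

// Hypothetical sample input
$input = "Http://example.com/a.html\nhttps://example.com/b\nHttp://example.org/x.xml";
print_r(parent_urls($input));
```

Unlike the original get_tags(), this keeps https:// URLs distinct from http:// ones, since the scheme is taken from each match rather than hard-coded.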


