08-28-2009, 05:30 PM
I am looking to extract every URL from an XML / TXT file and keep just the parent URL (the scheme and host) of each one.

I tried this but it was not successful:

(I have been posting the data from a textarea form to this code, this part is working fine):

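// If the form has been posted, start processing the data
if (isset($_REQUEST['start'])) {

    // Read the posted data
    $data = $_POST['data'];

    // Return the parent URL (scheme and host) of every URL found in $html
    function get_tags($html) {
        $tags = array();
        // The i flag makes the match case-insensitive, so Http:// matches too
        $regexp = '/(http:\/\/)(.*?)(\/)/i';
        // Collect every match; PREG_SET_ORDER yields one array per URL
        preg_match_all($regexp, $html, $matches, PREG_SET_ORDER);
        foreach ($matches as $tag) {
            $tags[] = "Http://".$tag[2]."/";
        }
        return $tags;
    }

    if (is_array($list = get_tags($data))) {
        foreach ($list as $tag) {
            echo $tag."\r\n";
        }
    }
}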

(The URLs in the txt file start with Http://, not http://.)

Any idea how to get this working?

08-28-2009, 06:11 PM
Here's a quick and dirty way of extracting URLs:

preg_match_all( '/\bhttps?\:\/\/\S+/i', $input, $urls );

$urls[0] will be an array of every bit of text that begins with http:// or https:// (in any letter case, thanks to the i flag) and continues until it reaches whitespace or the end of the line.
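
To get from the full URLs to just the parent URL the original question asks for, each match could be passed through PHP's parse_url(). This is only a rough sketch: the function name parent_urls() and the sample input are made up for illustration, the pattern is a slightly tightened variation that also stops at <, >, and quotes (handy when the URLs sit inside XML markup), and the scheme is lowercased and duplicates are dropped, which may or may not be what you want.

```php
<?php
// Hypothetical helper (not from the thread): reduce every URL found in
// $input to its parent URL, i.e. scheme plus host with a trailing slash.
function parent_urls($input) {
    $parents = array();
    // Variation of the pattern above: stop at whitespace, <, >, or quotes
    preg_match_all('/\bhttps?:\/\/[^\s<>"\']+/i', $input, $urls);
    // Full matches are collected in $urls[0]
    foreach ($urls[0] as $url) {
        $parts = parse_url($url);
        // parse_url() returns false for seriously malformed URLs
        if (is_array($parts) && isset($parts['scheme'], $parts['host'])) {
            $parents[] = strtolower($parts['scheme']) . '://' . $parts['host'] . '/';
        }
    }
    // Drop duplicates and reindex the result
    return array_values(array_unique($parents));
}

// Sample input is made up for illustration
$sample = "Http://www.example.com/page.html and <loc>Http://example.org/a/b?c=d</loc>";
echo implode("\r\n", parent_urls($sample));
```

Using parse_url() instead of a longer regex means paths, query strings, and ports do not have to be handled in the pattern itself. If you need to keep the original Http:// casing, remove the strtolower() call.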