
View Full Version : Fetching urls from a string (hlp pls)



Coastal Web
08-12-2007, 02:55 AM
Greetings everyone,
I'm trying to write up a script but I've run into a road block here. After searching Google for a bit I wasn't able to turn up anything that really helped me (I also searched the forum here to see if this has been asked before...)

What I'm trying to do is create a function that will go through a string, extract all the URLs from the string, and return them as an array.

For instance....

<?php

$str = <<<end
Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat. http://www.test.com/somefile.php Ut wisi enim ad minim veniam, quis nostrud exercitation ulliam corper suscipit lobortis nisl ut aliquip ex ea commodo consequat. Duis autem veleum iriure dolor http://www.domain.com/files/deep/link.php?id=123&user=123 in hendrerit in vulputate velit esse molestie consequat, vel willum lunombro dolore eu feugiat nulla facilisis at vero http://www.google.com/ eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi.
end;

// How would I create a function that would go through a string passed to it (similar to the string above), fetch all of the URLs within that string (if any), and return those URLs in an array? In this case there would be three URLs returned within the array...
// http://www.test.com/somefile.php
// http://www.domain.com/files/deep/link.php?id=123&user=123
// http://www.google.com/


?>

If anyone would be willing to help me out with this, it would be greatly appreciated.

Thanks so much,

StupidRalph
08-12-2007, 03:36 AM
Hmmm, I'm guessing it'll be easiest if you use some type of regular expression. I've never written one on my own, only customized some. But I'm guessing if you look for the "http://" you will know that it starts the URL. (This is assuming that all URLs are full like in your example and not www.example.com.) Then from there, we know that a URL is not going to have a space in it, so we can look for the first space and know that will be the end of the URL.

As I've stated though, this is only my hypothesis, but it seems logical, to me anyway. There may be a better way.
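In rough code, the idea might look something like this (untested sketch, and it only catches full http:// URLs, so don't take it as gospel):

<?php
// Naive "find http://, then stop at the next space" approach -- just a sketch
// of the idea above, not a polished solution.
function grab_urls($str)
{
    $urls = array();
    $offset = 0;
    while (($start = strpos($str, 'http://', $offset)) !== false) {
        $end = strpos($str, ' ', $start);      // first space after the URL...
        if ($end === false) {
            $end = strlen($str);               // ...or the end of the string
        }
        $urls[] = substr($str, $start, $end - $start);
        $offset = $end;                        // keep scanning after this URL
    }
    return $urls;
}
?>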

Inigoesdr
08-12-2007, 03:40 AM
<?php
$str = <<<end
Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat. http://www.test.com/somefile.php Ut wisi enim ad minim veniam, quis nostrud exercitation ulliam corper suscipit lobortis nisl ut aliquip ex ea commodo consequat. Duis autem veleum iriure dolor http://www.domain.com/files/deep/link.php?id=123&user=123 in hendrerit in vulputate velit esse molestie consequat, vel willum lunombro dolore eu feugiat nulla facilisis at vero http://www.google.com/ eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi.
end;

$matches = array();
// Match anything starting with "www" or "http" and running up to whitespace,
// without swallowing trailing punctuation like ) . , : ; ? ] }
preg_match_all('/((www|http)(\W+\S+[^).,:;?\]\} \r\n$]+))/i', $str, $matches);
// Dump just the full matches ($matches[0]) and stop
die('<pre>' . print_r($matches[0], 1));
?>
Returns:


Array
(
    [0] => http://www.test.com/somefile.php
    [1] => http://www.domain.com/files/deep/link.php?id=123&user=123
    [2] => http://www.google.com/
)

http://www.php.net/manual/en/function.preg-match-all.php
http://regexlib.com/Search.aspx?k=url
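
If you want it as a reusable function like you described, you could just wrap that up, something like:

<?php
// Same pattern as above, wrapped in a function that returns the URLs as an
// array (an empty array if nothing matched).
function fetch_urls($str)
{
    $matches = array();
    preg_match_all('/((www|http)(\W+\S+[^).,:;?\]\} \r\n$]+))/i', $str, $matches);
    return $matches[0];
}

// $urls = fetch_urls($str);
// print_r($urls);
?>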

StupidRalph
08-12-2007, 03:56 AM
Great job, Inigoesdr. I was just trying this out; I was thinking it would be preg_match_all(), I was just unsure of the pattern. I see that yours will even get the ones that start with www. But that's as far as my understanding goes. I know that the $ is the end of the expression :D. I'm going to see if I can decipher what this means.:thumbsup:
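
Here's my rough guess at what each piece does (just my interpretation, so correct me if I'm off), plus a quick one-liner to sanity-check it:

<?php
// My reading of Inigoesdr's pattern:
//   (www|http)              URL starts with "www" or "http" (/i makes it case-insensitive)
//   \W+                     one or more non-word chars -- eats the "://" after "http"
//                           or the "." after "www"
//   \S+                     keep matching non-whitespace characters
//   [^).,:;?\]\} \r\n$]+    but the match can't end on closing punctuation or whitespace,
//                           so a comma or period right after the link isn't included
//                           (inside the brackets the $ seems to be a literal dollar sign,
//                           not the end-of-string anchor)

// Quick sanity check on a one-line string:
preg_match_all('/((www|http)(\W+\S+[^).,:;?\]\} \r\n$]+))/i',
    'See http://www.example.com/page.php, or www.example.org for more.', $m);
print_r($m[0]); // both http://www.example.com/page.php and www.example.org show up
?>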

And, actually I haven't seen you around in a second, so if you're just returning....Welcome Back.

Coastal Web
08-12-2007, 04:03 AM
Yes, thank you very much Inigoesdr; that does the trick perfectly.

Now I have a smaller side question that kind of ties in with the script I'm working on here...

Let's say once I have these URLs, I want my server to "hit" or "visit" each of the URLs that have been snagged from the string as though it were a browser (via HTTP)... what is the fastest / least server-intensive way to do this?

Would it be with a quick cURL connection, using file_get_contents(), maybe fopen(), or perhaps using the Snoopy class (http://sourceforge.net/projects/snoopy/)?

What would be the fastest, least resource-intensive method?

Thoughts and suggestions on this are appreciated...

Inigoesdr
08-12-2007, 04:12 AM
I would say fopen/file_get_contents.
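
Something along these lines would do it (minimal sketch -- assumes $urls is the array from the regex above and that allow_url_fopen is enabled, with no timeout handling):

<?php
// "Hit" each URL and ignore the response body.
foreach ($urls as $url) {
    $response = @file_get_contents($url);   // @ suppresses warnings on failure
    if ($response === false) {
        echo "Could not reach $url\n";
    }
}
?>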

Coastal Web
08-12-2007, 04:15 AM
Thanks again... I'll do some speed testing with the different options and post the results (if anyone really gives a hoot...).
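
It'll probably just be something quick and dirty with microtime(), along these lines (rough sketch -- example.com is just a placeholder, and the cURL part assumes the extension is loaded):

<?php
// Rough timing comparison: file_get_contents() vs. cURL for the same URL.
$url = 'http://www.example.com/';

$start = microtime(true);
file_get_contents($url);
echo 'file_get_contents: ' . (microtime(true) - $start) . "s\n";

$start = microtime(true);
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // return the body instead of printing it
curl_exec($ch);
curl_close($ch);
echo 'curl: ' . (microtime(true) - $start) . "s\n";
?>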

Warm regards, and thank you again Inigoesdr; you rock!
//too bad codingforums.com doesn't have karma points :S

Inigoesdr
08-12-2007, 04:33 AM
You're welcome. And I think the scale under the avatar is for rep.

And, actually I haven't seen you around in a second, so if you're just returning....Welcome Back.
Thanks. ;)


