08-28-2006, 01:50 PM | #1
dudey
New to the CF scene

 
Join Date: Aug 2006
Posts: 5
Thanks: 0
Thanked 0 Times in 0 Posts
dudey is an unknown quantity at this point
regex help required to grab links from hrefs

Hi,

I am very new to PHP and have been doing quite well (if I do say so myself), but I have hit a bit of a brick wall with regular expressions and wondered if anyone could point me in the right direction...

What I am trying to do is strip the url and content of an href into two separate variables, so

PHP Code:
$link = "<a href=\"/media/press_releases/\">News &amp; Press Releases</a>";
would end up as two variables...
PHP Code:
$url = "/media/press_releases/";
$title = "News &amp; Press Releases";
I assume I'm going to need to use preg_match, but my knowledge of regexes is abysmal and I'm not really understanding them properly.

Any help that can be given would be very much appreciated.

Thank you,

dudey
08-28-2006, 02:06 PM | #2
chump2877
Senior Coder

 
Join Date: Dec 2004
Location: the U.S. of freakin' A.
Posts: 2,530
Thanks: 15
Thanked 128 Times in 121 Posts
chump2877 is on a distinguished road
I'm sure there's a way to combine this into one regex, and I'm not the king of regex either, but this would work, I think:

PHP Code:
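$link = "<a href=\"/media/press_releases/\">News &amp; Press Releases</a>";
// Look for an absolute URL (scheme://...); $matches[0] holds the whole match.
preg_match('(\b[a-zA-Z0-9]+://[^( |\>)]+\b)', $link, $matches);
// Capture the text between the opening <a ...> tag and </a> in group 4.
preg_match('/(<a)(.+?)(>)(.+?)(<\/a>)/', $link, $matches2);

echo "URL is: " . $matches[0] . "<br>";
echo "Title is: " . $matches2[4];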
__________________
Regards, R.J.

---------------------------------------------------------

Help spread the word! Like my YouTube-to-Mp3 Web Conversion Software on Facebook !! :)
08-28-2006, 02:18 PM | #3
dudey
Hmmm, almost works ... apart from the URL bit.

Can you possibly explain to me what each part of your script is actually doing, so that I might have a play with it (with hopefully a little understanding)? For instance, why those specific parts of the array?

Thanks for your help.

dudey
08-28-2006, 02:24 PM | #4
dudey
Ah, I see ... I think...
PHP Code:
preg_match('/(<a)(.+?)(>)(.+?)(<\/a>)/', $link, $matches2);
echo "Title is: " . $matches2[4];
The [4] is getting the fourth instance of whatever is between the brackets ... so in this case it would be the content between the closing > of the opening tag and the closing </a> tag ... so I guess '.+?' means 'any character'?
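
To see what each index holds, here is a minimal sketch that prints every entry of $matches2 for the same pattern and input; only the loop is new. Index 0 is the whole match and indexes 1 to 5 are the bracketed groups in left-to-right order, so [4] is the fourth group. As for '.+?': the dot is any character except a newline, the + means one or more of them, and the trailing ? makes the match as short as possible.

```php
<?php
// Same pattern and input as in the post above; only the loop is added.
$link = '<a href="/media/press_releases/">News &amp; Press Releases</a>';
preg_match('/(<a)(.+?)(>)(.+?)(<\/a>)/', $link, $matches2);

// $matches2[0] is the whole match; [1] to [5] are the five bracketed groups.
foreach ($matches2 as $index => $text) {
    echo $index . ' => ' . $text . "\n";
}
```

Group 2 ends up holding everything between <a and the first >, including the leading space and the whole href attribute, which is why it cannot be used directly as the URL.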
08-28-2006, 02:41 PM | #5
dudey
Still can't seem to get the URL part of it to work, though ... any ideas?

Thanks for your help so far

dudey
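
A likely reason the URL part comes back empty: the URL pattern from post #2 requires some letters or digits followed by ://, and the example href is a relative path with no scheme. The sketch below runs that pattern, unchanged, against the original link and against an absolute link that is invented here purely for comparison (www.example.com is not from the thread).

```php
<?php
// The URL pattern from post #2, unchanged.
$pattern = '(\b[a-zA-Z0-9]+://[^( |\>)]+\b)';

// The link from the original question: a relative path, so no "://".
$relative = '<a href="/media/press_releases/">News &amp; Press Releases</a>';
// Hypothetical absolute link, made up for this comparison.
$absolute = '<a href="http://www.example.com/media/press_releases/">News</a>';

// preg_match returns 1 when the pattern matches and 0 when it does not.
$foundRelative = preg_match($pattern, $relative, $m1);
$foundAbsolute = preg_match($pattern, $absolute, $m2);

echo "Relative link matched: " . $foundRelative . "\n";
echo "Absolute link matched: " . $foundAbsolute . "\n";
```

With no match, $matches stays empty, so reading $matches[0] gives nothing to print.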
08-28-2006, 02:50 PM | #6
chump2877
PHP Code:
<?php

$link = '<a href="/media/press_releases/">News &amp; Press Releases</a>';
// Group 4 captures the href value; group 8 captures the link text.
preg_match('/(<a)(.*?)(href="|href=\')(.+?)("|\')(.*?)(>)([^<>]+?)(<\/a>)/i', $link, $matches);

echo "URL is: " . $matches[4] . "<br>";
echo "Title is: " . $matches[8];

?>
Give that a whirl... I just fooled around with it some more... but it may not be perfect for finding all URLs, though...

The "URL" is whatever the fourth parenthesized pattern matched... the "Title" is whatever the eighth parenthesized pattern matched...

For the URL: (.+?) matches any character one or more times, but the ? tells it not to be so "greedy" as to run past the closing (") character...

Similar logic for the Title...
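
The difference between greedy and non-greedy matching is easiest to see side by side. This is a small sketch, assuming a made-up line with two links (the /one/ and /two/ paths are invented for illustration): the only difference between the two patterns is the ? after the +.

```php
<?php
// Hypothetical input with two links on one line, invented for this comparison.
$html = '<a href="/one/">One</a> <a href="/two/">Two</a>';

// Greedy: .+ takes as much as it can, then backs off to the LAST double quote.
preg_match('/href="(.+)"/', $html, $greedy);

// Lazy: .+? takes as little as it can, stopping at the FIRST closing quote.
preg_match('/href="(.+?)"/', $html, $lazy);

echo "Greedy: " . $greedy[1] . "\n";
echo "Lazy:   " . $lazy[1] . "\n";
```

The greedy version swallows the first link's text and most of the second tag, while the lazy version stops at /one/.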

you can also try looking here: http://www.codingforums.com/showthread.php?t=76949

Edit: Added a couple of things to my regex pattern...
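
If there are several links to pull out at once, one possible variation (not the pattern from this post) is preg_match_all with fewer capturing groups. The sketch below is an assumption-laden example: the second link and its class attribute are invented, and the pattern only handles quoted href values and link text with no nested tags. A regex is not an HTML parser, so treat this as a starting point rather than a complete solution.

```php
<?php
// Hypothetical fragment with two links; the second one is invented for illustration.
$html = '<a href="/media/press_releases/">News &amp; Press Releases</a>'
      . '<a class="nav" href=\'/about/\'>About</a>';

// Three groups only: 1 = opening quote (reused as \1 so the closing quote
// must be the same kind), 2 = URL, 3 = link text.
$pattern = '/<a\b[^>]*?\bhref=(["\'])(.+?)\1[^>]*>([^<>]+?)<\/a>/i';

// PREG_SET_ORDER gives one array per link instead of one array per group.
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

foreach ($matches as $match) {
    echo "URL is: " . $match[2] . "\n";
    echo "Title is: " . $match[3] . "\n";
}
```

Using the back-reference \1 avoids matching a URL that opens with one kind of quote and closes with the other, which the ("|\') alternation would allow.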

Last edited by chump2877; 08-28-2006 at 03:18 PM..
08-28-2006, 02:56 PM | #7
dudey
Great stuff.
Thanks very much, it is much appreciated.

dudey
08-28-2006, 03:05 PM | #8
chump2877
...