extracting information from an <a> tag using regexp

04-15-2004, 11:12 PM
Ok, here's the dealio

I have a string containing a load of <a href="http://domain.tld" title="detailed description">link text</a> style links, one per line.

How would I extract:

1) the URL
2) the value of title attribute
3) the link text

I figured regular expressions are the way to go, but I'm a little confused about where to start!

Any pointers? I came up with this:

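function extractLink($link) {
    // split() was removed in PHP 7; explode() does the same for a literal "\n"
    $link = explode("\n", trim($link));
    for ($i = 0; isset($link[$i]); $i++) {
        // Pieces: [0] '<a href=', [1] URL, [2] ' title=', [3] title value, [4] '>link text</a>'
        $link[$i] = explode("\"", $link[$i]);
        $link[$i]['url'] = substr($link[$i][1], 7);               // drops the leading "http://"
        $link[$i]['description'] = $link[$i][3];                  // value of the title attribute
        $link[$i]['title'] = substr($link[$i][4], 1);             // strip the leading '>'
        $link[$i]['title'] = substr($link[$i]['title'], 0, -4);   // strip the trailing '</a>'
    }
    for ($i = 0; isset($link[$i]); $i++) {
        foreach ($link[$i] as $key => $value) {
            // The original if-condition was lost; dropping the numeric explode() pieces is a guess
            if (is_int($key)) {
                unset($link[$i][$key]);
            } else {
                $link[$i][$key] = htmlentities($value);
            }
        }
    }
    return $link;
}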

Which, while crude, does the job, but it breaks as soon as a link has no title attribute, because the positions of the quote-separated pieces shift.
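A regex-based sketch that also copes with a missing title attribute might look like this. It assumes double-quoted attributes with href appearing before title, as in the example string; extractLinks() is a hypothetical name, not the original poster's function. The optional (?:...)? group simply leaves the title empty when the attribute isn't there.

```php
<?php
// Sketch: parse one <a> tag per line with preg_match(); the title attribute is optional.
// Assumes double-quoted attributes and href before title, as in the example string.
function extractLinks(string $text): array
{
    $links = [];
    // Group 1: href value, group 2: title value (optional), group 3: link text
    $pattern = '/<a\s+href="([^"]*)"(?:\s+title="([^"]*)")?\s*>(.*?)<\/a>/i';
    foreach (explode("\n", trim($text)) as $line) {
        if (preg_match($pattern, $line, $m)) {
            $links[] = [
                'url'   => $m[1],
                'title' => $m[2] ?? '',   // '' when the title attribute is missing
                'text'  => $m[3],
            ];
        }
    }
    return $links;
}

$input = '<a href="http://domain.tld" title="detailed description">link text</a>' . "\n"
       . '<a href="http://example.com">no title here</a>';
print_r(extractLinks($input));
```

The lazy (.*?) stops each match at the first </a>, so the link text doesn't swallow the closing tag.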

Thanks in advance,


04-16-2004, 12:00 AM
It would help to see the actual string; that way we can help you from the start.

Also, since the algorithm is completely dependent on the string input, it is imperative that we see that string.


04-16-2004, 12:27 AM
Hmm, just guessing here, but try...


I think that should work; mordred could probably clean it up a bit.

I think the href will be in the matches[1] array, the title in matches[2] and the link text in matches[3]. But I didn't test it, which is why I've included the print_r: it recursively prints the whole matches array.
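The regex from the reply above didn't survive, so the pattern below is an assumption, not the poster's code. It's a minimal sketch of how preg_match_all() lays out $matches with its default PREG_PATTERN_ORDER flag: each numbered group gets its own array holding that group's capture from every link.

```php
<?php
// Sketch only: the original pattern was lost, so this regex is an assumption.
$html = '<a href="http://domain.tld" title="detailed description">link text</a>' . "\n"
      . '<a href="http://example.org" title="another one">second link</a>';

// Default flag is PREG_PATTERN_ORDER: $matches[n] lists every capture of group n.
$count = preg_match_all(
    '/<a\s+href="([^"]*)"(?:\s+title="([^"]*)")?\s*>(.*?)<\/a>/i',
    $html,
    $matches
);

// $matches[0]: the full <a>...</a> tags
// $matches[1]: every href, $matches[2]: every title, $matches[3]: every link text
print_r($matches);
```

Pass PREG_SET_ORDER as the fourth argument if you'd rather get one array per link instead ($matches[0][1] is then the first link's href).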