View Full Version : extracting information from an <a> tag using regexp



mrjamin
04-15-2004, 10:12 PM
Ok, here's the dealio

I have a string containing a load of links in the style <a href="http://domain.tld" title="detailed description">link text</a>, one per line.

How would I extract:

1) the URL
2) the value of title attribute
3) the link text

I figure regular expressions are the way to go, but I'm a little confused about where to start!

Any pointers? I came up with this:


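<?php
// Splits each line on double quotes and picks the pieces out by position.
// Assumes every line looks exactly like:
//   <a href="http://domain.tld" title="detailed description">link text</a>
function extractLink($link) {
    // explode() replaces split(), which was removed in PHP 7
    $link = explode("\n", trim($link));
    for ($i = 0; isset($link[$i]); $i++) {
        $link[$i] = explode("\"", $link[$i]);
        $link[$i]['url'] = substr($link[$i][1], 7);        // drop the leading "http://"
        $link[$i]['description'] = $link[$i][3];           // value of the title attribute
        $link[$i]['title'] = substr($link[$i][4], 1, -4);  // link text: drop leading ">" and trailing "</a>"
    }
    for ($i = 0; isset($link[$i]); $i++) {
        foreach ($link[$i] as $key => $value) {
            if (is_numeric($key)) {
                unset($link[$i][$key]);  // discard the raw numeric pieces
            } else {
                $link[$i][$key] = htmlentities($value);
            }
        }
    }
    return $link;
}
?>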

It's crude, but it does the job. The catch is that it gets messed up if a link has no title attribute, because the pieces are picked out by position.

Thanks in advance,

MrJ

sad69
04-15-2004, 11:00 PM
It would help if we could see the actual string; that way we can help you from the start.

Also, since the algorithm is completely dependent on the string input, it is imperative that we see that string.

Sadiq.

bcarl314
04-15-2004, 11:27 PM
Hmm, just guessing here, but try...



preg_match_all("/<a\s{1,2}href=\"(.*?)\"\s{1,2}title=\"(.*?)\">(.*?)<\/a>/",$string,$matches);
print_r($matches);


I think that should work, mordred could probably clean it up a bit.

I think the hrefs will be in the $matches[1] array, the titles in the $matches[2] array and the link text in the $matches[3] array, but I didn't test it. That's why I've included the print_r call, which will recursively print the $matches array.
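
Since the original function breaks when there is no title attribute, here is one way the pattern above might be loosened to cope with that. Treat it as a minimal sketch, not a tested answer for your real data: the sample $string and the $links array are made up for illustration, and it assumes href comes first, the attributes are double-quoted, and there is one link per line. The array keys reuse the names from extractLink().

```php
<?php
// Hypothetical sample input, following the format described in the first post.
$string = <<<HTML
<a href="http://domain.tld" title="detailed description">link text</a>
<a href="http://example.org">no title here</a>
HTML;

// The title part is wrapped in (?: ... )? so the whole attribute may be absent.
// [^"]* stops at the closing quote, so it cannot run on into the next attribute.
$pattern = '/<a\s+href="([^"]*)"(?:\s+title="([^"]*)")?\s*>(.*?)<\/a>/i';

// PREG_SET_ORDER groups the captures per link instead of per capture group.
preg_match_all($pattern, $string, $matches, PREG_SET_ORDER);

$links = array();
foreach ($matches as $m) {
    $links[] = array(
        'url'         => $m[1],
        'description' => $m[2],  // empty string when there is no title attribute
        'title'       => $m[3],  // the link text
    );
}

print_r($links);
```

With PREG_SET_ORDER each element of $matches describes one link, which is usually easier to loop over than the three parallel arrays you get by default. If the real markup has other attributes, single quotes, or nested tags, a regex like this will miss links, and an HTML parser is probably the safer route.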


