09-10-2009, 10:05 PM
I'm trying to extract all the links in an HTML document. I've tested the regex at regexlib.com and it works fine there, but this code finds no matches:


09-10-2009, 11:41 PM
Give this a shot:

preg_match_all( '/\<a.*href.*\=.*("|\')(.*)\1.*\>(.*)\<\/a\>/Usi', $this->content, $matches );

$matches[0]; // each whole match
$matches[1]; // each quote-type
$matches[2]; // each link's href value
$matches[3]; // each link's text

Wrote it a bit hastily, but appears to work with well-formed or crap-formed HTML.
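For anyone who wants to see it run, here is a minimal sketch of that call against a made-up snippet. The $html string and the $links array are my own placeholders, not part of the original code; in the post above the subject is $this->content.

```php
<?php
// Hypothetical sample input; in the post above the subject is $this->content.
$html = <<<HTML
<p>See <a href="http://example.com/one">First</a> and
<A class='x' HREF='/two'>Second</A>.</p>
HTML;

$pattern = '/\<a.*href.*\=.*("|\')(.*)\1.*\>(.*)\<\/a\>/Usi';
$count = preg_match_all($pattern, $html, $matches);

// Pair each href (group 2) with its link text (group 3).
$links = array();
for ($i = 0; $i < $count; $i++) {
    $links[] = array('href' => $matches[2][$i], 'text' => $matches[3][$i]);
}

print_r($links);
```

Note that the pattern requires a quote character around the href value, so an unquoted attribute such as href=/two would not be picked up.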

09-11-2009, 09:02 AM
That seems to work fine, but why didn't mine work?

Phil Jackson
09-11-2009, 10:08 AM

pattern modifiers:

i - If this modifier is set, letters in the pattern match both upper and lower case letters.

m - When this modifier is set, the "start of line" and "end of line" constructs match immediately following or immediately before any newline in the subject string, respectively, as well as at the very start and end.

s - If this modifier is set, a dot metacharacter in the pattern matches all characters, including newlines.

x - If this modifier is set, whitespace data characters in the pattern are totally ignored except when escaped or inside a character class, and characters between an unescaped # outside a character class and the next newline character, inclusive, are also ignored.

A - If this modifier is set, the pattern is forced to be "anchored", that is, it is constrained to match only at the start of the string which is being searched (the "subject string").

D - If this modifier is set, a dollar metacharacter in the pattern matches only at the end of the subject string.

U - This modifier inverts the "greediness" of the quantifiers so that they are not greedy by default, but become greedy if followed by "?".
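A quick way to see what the modifiers used above (U, s, i, plus m) actually change is to run the same pattern with and without each one. This is only a sketch; the subject strings and variable names are invented for illustration.

```php
<?php
$line = "<b>one</b> <b>two</b>";

// Without U, .* is greedy and runs to the last </b>.
preg_match('/<b>.*<\/b>/', $line, $greedy);
// With U, .* stops at the first </b>.
preg_match('/<b>.*<\/b>/U', $line, $lazy);

// Without s, the dot does not cross a newline, so this finds nothing.
$noDotAll = preg_match('/<p>(.*)<\/p>/', "<p>a\nb</p>");
// With s, the dot matches the newline too.
$dotAll = preg_match('/<p>(.*)<\/p>/s', "<p>a\nb</p>");

// i makes letters in the pattern match either case.
$caseSensitive = preg_match('/href/', 'HREF');
$caseInsensitive = preg_match('/href/i', 'HREF');

// m lets ^ and $ match at each line, not only at the ends of the subject.
$singleLine = preg_match_all('/^\w+$/', "one\ntwo");
$multiLine = preg_match_all('/^\w+$/m', "one\ntwo");

var_dump($greedy[0], $lazy[0], $noDotAll, $dotAll,
         $caseSensitive, $caseInsensitive, $singleLine, $multiLine);
```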

09-11-2009, 11:17 AM
I tried mine with U, with i, and with Usi, but none of them worked. si would be the closest to the settings in the regex tester, but there were no newlines within any of the 'matches'.

