  1. #1
    Senior Coder NancyJ's Avatar
    Join Date
    Feb 2005
    Location
    Bradford, UK
    Posts
    3,172
    Thanks
    19
    Thanked 65 Times in 64 Posts

    preg_match_all - problem

    I'm trying to extract all the links in an HTML document. I've tested the regex at regexlib.com and it works fine there, but this code finds no matches:

    Code:
    $matches=array();
    preg_match_all('|<a[\s]+[^>]*?href[\s]?=[\s\"\']+(.*?)[\"\']+.*?>([^<]+|.*?)?<\/a>|',$this->content,$matches);
    print_r($matches);

  • #2
    Senior Coder kbluhm's Avatar
    Join Date
    Apr 2007
    Location
    Philadelphia, PA, USA
    Posts
    1,509
    Thanks
    3
    Thanked 258 Times in 254 Posts
    Give this a shot:
    PHP Code:
    preg_match_all( '/\<a.*href.*\=.*("|\')(.*)\1.*\>(.*)\<\/a\>/Usi', $this->content, $matches );

    $matches[0]; // each whole match
    $matches[1]; // each quote-type
    $matches[2]; // each link's href value
    $matches[3]; // each link's text 
    Wrote it a bit hastily, but appears to work with well-formed or crap-formed HTML.
    Last edited by kbluhm; 09-10-2009 at 10:46 PM.
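
    To make the meaning of the four capture groups concrete, here is a minimal sketch that runs the pattern from the post above against a small sample. The `$html` string is a hypothetical stand-in for `$this->content`, which the thread never shows; it mixes upper-case tags, single quotes, and an extra attribute to mimic untidy markup.

```php
<?php
// Hypothetical input standing in for $this->content (not from the thread).
$html = '<A HREF=\'/about\'>About</A> <a class="x" href="http://example.com/">Example</a>';

// Pattern as posted above: U makes quantifiers lazy, s lets "." cross
// newlines, i ignores case.
$pattern = '/\<a.*href.*\=.*("|\')(.*)\1.*\>(.*)\<\/a\>/Usi';

$matches = array();
$count = preg_match_all($pattern, $html, $matches);

print_r($matches[1]); // quote character used by each link
print_r($matches[2]); // href value of each link
print_r($matches[3]); // link text of each link
```

    The back-reference `\1` makes the closing quote match whichever quote character opened the attribute, which is why both quoting styles in the sample are handled.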

  • #3
    Senior Coder NancyJ's Avatar
    That seems to work fine, but why didn't mine work?

  • #4
    Senior Coder
    Join Date
    Aug 2009
    Location
    Mansfield, Nottinghamshire, UK
    Posts
    1,555
    Thanks
    57
    Thanked 148 Times in 147 Posts
    '/regexp/Usi'

    pattern modifiers:

    i - If this modifier is set, letters in the pattern match both upper and lower case letters.

    m - When this modifier is set, the "start of line" and "end of line" constructs match immediately following or immediately before any newline in the subject string, respectively, as well as at the very start and end.

    s - If this modifier is set, a dot metacharacter in the pattern matches all characters, including newlines.

    x - If this modifier is set, whitespace data characters in the pattern are totally ignored except when escaped or inside a character class, and characters between an unescaped # outside a character class and the next newline character, inclusive, are also ignored.

    A - If this modifier is set, the pattern is forced to be "anchored", that is, it is constrained to match only at the start of the string which is being searched (the "subject string").

    D - If this modifier is set, a dollar metacharacter in the pattern matches only at the end of the subject string.

    U - This modifier inverts the "greediness" of the quantifiers so that they are not greedy by default, but become greedy if followed by "?".
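
    The effect of the `i`, `s` and `U` modifiers listed above can be seen in a short sketch. The subject string and the variable names are made up for illustration; only `preg_match_all` and the modifiers themselves come from the thread.

```php
<?php
// Hypothetical subject string, chosen to exercise i, s and U.
$subject = "<B>one</B>\n<b>two\nlines</b>";

// i only: case-insensitive, greedy, and "." stops at a newline.
preg_match_all('/<b>(.*)<\/b>/i', $subject, $iOnly);
// $iOnly[1] holds just "one"; the second tag spans a newline and is missed.

// i + s: "." now crosses newlines, and greedy ".*" runs to the last </b>.
preg_match_all('/<b>(.*)<\/b>/is', $subject, $greedy);
// $greedy[1] holds a single capture that swallows both tags.

// U + s + i: quantifiers are lazy by default, so each tag matches separately.
preg_match_all('/<b>(.*)<\/b>/Usi', $subject, $lazy);
// $lazy[1] holds "one" and "two\nlines".

print_r($iOnly[1]);
print_r($greedy[1]);
print_r($lazy[1]);
```

    Without `U`, the same lazy behaviour can be requested per quantifier by writing `.*?` instead of `.*`.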

  • #5
    Senior Coder NancyJ's Avatar
    I tried mine with U, with i, and with Usi, but it didn't work. si would be the closest to the settings in the regex tester, but there were no newlines within any of the 'matches'.
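
    One likely explanation, which the thread itself does not spell out: the original pattern uses `|` both as its delimiter and as the alternation operator inside `([^<]+|.*?)`. PHP ends the pattern at the first unescaped delimiter it meets, so everything after that inner `|` is read as modifiers and the call fails, typically with an unknown-modifier warning. Online testers usually take the bare pattern without delimiters, which would explain why it worked there. The sketch below assumes a hypothetical `$html` sample and swaps the delimiter for `#`; escaping the inner `|` would not help, because an escaped `|` stops being an alternation operator.

```php
<?php
// Hypothetical sample input; the thread does not show the real document.
$html = '<p><a href="http://example.com/">Example</a></p>';

// Original pattern: "|" is both the delimiter and the alternation operator,
// so PHP treats the text after the second "|" as modifiers and gives up.
$original = '|<a[\s]+[^>]*?href[\s]?=[\s\"\']+(.*?)[\"\']+.*?>([^<]+|.*?)?<\/a>|';
$matches = array();
$result = @preg_match_all($original, $html, $matches); // @ hides the warning
var_dump($result); // bool(false)

// Same pattern with "#" as the delimiter, so "|" is plain alternation again.
$fixed = '#<a[\s]+[^>]*?href[\s]?=[\s\"\']+(.*?)[\"\']+.*?>([^<]+|.*?)?<\/a>#';
$count = preg_match_all($fixed, $html, $matches);
print_r($matches[1]); // href values
print_r($matches[2]); // link texts
```

    With display of warnings enabled and the `@` removed, the first call would report the problem directly instead of silently returning no matches.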

