  1. #1
    New to the CF scene
    Join Date
    Apr 2008
    Posts
    9
    Thanks
    1
    Thanked 0 Times in 0 Posts

    Regexp not working!

    Hi, I have a problem and I am stuck. I have read the docs but still cannot spot where the error is in my regexp.
    I want to replace the HTML list item and its link in the post content for migration to a new CMS.

    Code:
    $languageHistoricContent = array (
      'de' => 'Blick in die Vergangenheit',
      'en' => 'A glimpse @ the past',
      'fr' => "Un coup d'œil sur le passé",
      'nl' => 'Terugblik',
      'es' => 'Hace un año - Hace dos años',
      'it' => 'Un anno fa / 2 anni fa',
    //  'pt-pt' => '',
    //  'hr' => '',
    );
    
    /*
    print_r( $languageHistoricContent );
    echo "LangId: '$langId'\n<br>";
    */
    
    // WAS: $searchString = "#(<li[^>]*>.*?{$languageHistoricContent['$langId']}.*?</li>)#isU";
    
    $searchString = "#(<li>.*?" . $languageHistoricContent[$langId] . ".*?</li>)#isU";
    echo "DEBUG: Search-String '$searchString'\n<br>";
    
    
    $countMatches = preg_match_all( $searchString, $postContent, $matches );
    
    // $count = preg_match_all( '#(<p[^>]*>.*</p>)#isU', $postContent, $matches );
    echo "Count of hits '$countMatches' on '" . $languageHistoricContent[$langId] . "'\n<br>";
    print_r( $matches );
    where the post content contains HTML like this:

    Code:
    <ul>
    <li><a href="#1">
    title1</a></li>
    <li><a href="#2">
    Australia: Title2</a></li>
    <li><a href="#10">
    News in brief
    </a></li><li><a href="#20">
    A glimpse @ the past
    </a></li></ul>
    
    <p><h3><a name="1">
    The problem is that it matches the COMPLETE LIST, although I specified non-greedy matching.

    Why does it not work? Can anybody spot the error or point me in the right direction?

  • #2
    Regular Coder
    Join Date
    Mar 2011
    Posts
    148
    Thanks
    0
    Thanked 20 Times in 20 Posts
    Hi,
    From what I know, '<' and '>' are special characters that must be escaped with "\" in regexp code.
    I'm not sure the code will work with only that change, but maybe it solves one error.

  • #3
    God Emperor Fou-Lu's Avatar
    Join Date
    Sep 2002
    Location
    Saskatoon, Saskatchewan
    Posts
    16,987
    Thanks
    4
    Thanked 2,660 Times in 2,629 Posts
    No, it doesn't; it only pulls from <li> through </li> where the text you are looking for is found. This is correct for the pattern you have specified.

  • #4
    New to the CF scene
    Join Date
    Apr 2008
    Posts
    9
    Thanks
    1
    Thanked 0 Times in 0 Posts
    Hi, AFAIK escaping the '<' and '>' chars is not necessary; I tried it and it does not change the results. Strange that it matches all li elements at once, even though non-greedy is specified.
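    The escaping claim can be checked with a small sketch like the one below. The sample string and the variable names ($html, $plain, $escaped) are made up for illustration; in PCRE a backslash before a non-alphanumeric character simply makes that character literal, so both patterns should behave the same.

```php
<?php
// Hypothetical one-item sample, not the real post content.
$html = '<li><a href="#1">title1</a></li>';

// The same pattern, once without and once with backslashes before < and >.
$plain   = '#<li>.*?</li>#is';
$escaped = '#\<li\>.*?\</li\>#is';

preg_match($plain, $html, $plainMatch);
preg_match($escaped, $html, $escapedMatch);

// Both calls capture the identical text.
var_dump($plainMatch[0] === $escapedMatch[0]);
```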

  • #5
    God Emperor Fou-Lu's Avatar
    Join Date
    Sep 2002
    Location
    Saskatoon, Saskatchewan
    Posts
    16,987
    Thanks
    4
    Thanked 2,660 Times in 2,629 Posts
    Quote Originally Posted by Transformer View Post
    Hi, AFAIK escaping the '<' and '>' chars is not necessary; I tried it and it does not change the results. Strange that it matches all li elements at once, even though non-greedy is specified.
    Missed that post; yes < and > carry no special meaning in PCRE.
    I still don't know what you mean regarding greediness in your li matching. Given the pattern you have here, the result is correct: it will not isolate the inner li, because you have specified that any .* may appear before and after the desired phrase, and that includes other li elements. You need to refine your pattern, either by not allowing an <li> to match within the <li>, or by indicating that an <a> exists.
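    A minimal sketch of the first suggestion (not letting the match run across another li), using the sample HTML from the first post. Two things are worth noting. A regex match always begins at the leftmost possible position, so even a lazy .*? starts at the first <li>. Also, the U modifier inverts greediness, so .*? combined with U becomes greedy again. The names $inner, $quoted and $cleaned are illustrative, and the phrase is hard-coded here instead of being looked up via $langId.

```php
<?php
// Sample list taken from the first post.
$postContent = <<<HTML
<ul>
<li><a href="#1">
title1</a></li>
<li><a href="#2">
Australia: Title2</a></li>
<li><a href="#10">
News in brief
</a></li><li><a href="#20">
A glimpse @ the past
</a></li></ul>
HTML;

$phrase = 'A glimpse @ the past';

// preg_quote() escapes regex metacharacters in the phrase; '#' is the delimiter.
$quoted = preg_quote($phrase, '#');

// Original pattern from the thread: the match starts at the first <li>.
$original = '#(<li>.*?' . $quoted . '.*?</li>)#isU';
preg_match($original, $postContent, $m);

// Refined pattern: $inner consumes any character that does not begin an
// <li or </li tag, so the match cannot cross into a neighbouring item.
// The U modifier is dropped on purpose.
$inner   = '(?:(?!</?li\b).)*';
$pattern = '#<li\b[^>]*>' . $inner . $quoted . $inner . '</li>#is';

$count = preg_match_all($pattern, $postContent, $matches);
echo "Hits: $count\n";
print_r($matches[0]);

// Remove only the matched item, e.g. before migrating the content.
$cleaned = preg_replace($pattern, '', $postContent);
```

    With this sample, $m[0] from the original pattern still contains the earlier items such as "title1", while the refined pattern captures only the item holding the phrase. For anything beyond simple, regular markup, an HTML parser such as DOMDocument is probably the safer tool.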

