Thread: Regex help!

  1. #1
    Regular Coder
    Join Date
    Jan 2009
    Location
    Norway
    Posts
    118
    Thanks
    8
    Thanked 2 Times in 2 Posts

    Regex help!

    Hello!

    I'm currently practicing with PHP's DOMDocument and I've come across an issue that I just can't solve on my own.

    I've managed to parse some data from Wikipedia's special page, but I've got a lot of characters around my data which I really don't need.

    This is a sample of the output I'm getting:
    *''[[10 Things I Hate About You (TV series)|10 Things I Hate About You]]''

    My desired result:
    10 Things I Hate About You (TV series)

    If anyone could help me out with a regular expression that produces this result, and also explain the pattern to me, I would be forever grateful!

    Sincerely
    Cyb

  • #2
    New to the CF scene
    Join Date
    Aug 2012
    Posts
    2
    Thanks
    0
    Thanked 1 Time in 1 Post
    Hi CyberPirate,
    the regex pattern I'd suggest is \*\'\'\[\[(.*?)\|(.*?)\]\]\'\'

    PHP Code:
    $string = "*''[[10 Things I Hate About You (TV series)|10 Things I Hate About You]]''";

    // $m[0] receives the whole match; $m[1] and $m[2] receive the two captured groups.
    preg_match('/\*\'\'\[\[(.*?)\|(.*?)\]\]\'\'/', $string, $m);

    print_r($m);
    It's probably not the best solution, but it works.

    Assuming all of your parsed output is in a similar format, it will always look like *''[[something here|something here]]''. Because *, [, ] and | have special meanings in a regex, they need to be escaped with a backslash to match them literally. The single quotes are not special to the regex engine, but I escaped them as well because the pattern is enclosed in a single-quoted PHP string, where an unescaped ' would end the string early and cause a syntax error.
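As a hedged alternative to escaping by hand, PHP's preg_quote() can escape the literal parts of the pattern. This is only a sketch: the variable names ($prefix, $suffix, $pattern, $line, $found) are made up for illustration and do not come from the thread.

```php
<?php
// Let preg_quote() escape the literal text around the two capture groups.
// The second argument also escapes the "/" delimiter, should it appear.
$prefix = preg_quote("*''[[", '/');   // regex metacharacters * and [ get a backslash
$suffix = preg_quote("]]''", '/');    // ] gets a backslash; quotes are left alone

// The pipe between the groups is still escaped by hand.
$pattern = '/' . $prefix . '(.*?)\|(.*?)' . $suffix . '/';

$line = "*''[[10 Things I Hate About You (TV series)|10 Things I Hate About You]]''";
$found = preg_match($pattern, $line, $m);

echo $pattern, PHP_EOL; // shows the generated pattern
echo $m[1], PHP_EOL;    // the link target (first capture group)
```

The generated pattern is equivalent to the hand-escaped one, just without the backslashes in front of the quotes, which the regex engine never needed.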

    Next up is (.*?). The . matches any character (except a newline), the * repeats it zero or more times, and the trailing ? makes the quantifier ungreedy (lazy), so it matches as little as possible and stops at the first |. You could write .*? on its own without the parentheses, but then that part of the match wouldn't be captured. preg_match stores the whole match at index 0 of its output array (in this case $m), and then each parenthesised group at its own subsequent index.
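To make the greedy/ungreedy difference concrete, here is a small sketch. The input line with two links is a made-up example (not from the Wikipedia output above), chosen because the difference only shows up when more than one | is present.

```php
<?php
// Hypothetical input: two wiki links on the same line.
$line = "[[A (film)|A]] and [[B (TV series)|B]]";

// Greedy: .* grabs as much as it can, so the group runs up to the LAST "|".
preg_match('/\[\[(.*)\|/', $line, $greedy);

// Lazy: .*? grabs as little as it can, so the group stops at the FIRST "|".
preg_match('/\[\[(.*?)\|/', $line, $lazy);

echo $greedy[1], PHP_EOL; // A (film)|A]] and [[B (TV series)
echo $lazy[1], PHP_EOL;   // A (film)
```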

    So the output from print_r is:
    Code:
    Array
    (
        [0] => *''[[10 Things I Hate About You (TV series)|10 Things I Hate About You]]''
        [1] => 10 Things I Hate About You (TV series)
        [2] => 10 Things I Hate About You
    )
    That leaves the part you want in [1], and gives you the option of using the display text in [2] if you want it.
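If the parsed output contains many such lines, the same pattern can probably be applied in one pass with preg_match_all(). This is a sketch under assumptions: the second entry ("Example Show") is invented sample data, and $wikitext, $count and $title are illustrative names.

```php
<?php
// Hypothetical sample: two list entries in the same wiki format, one per line.
$wikitext = "*''[[10 Things I Hate About You (TV series)|10 Things I Hate About You]]''\n"
          . "*''[[Example Show (TV series)|Example Show]]''\n";

// With the default flags, $matches[1] collects the first capture group
// (the link target) of every match, and $matches[2] the display text.
$count = preg_match_all('/\*\'\'\[\[(.*?)\|(.*?)\]\]\'\'/', $wikitext, $matches);

foreach ($matches[1] as $title) {
    echo $title, PHP_EOL;
}
```

Because . does not match a newline by default, each match stays within its own line.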

    Hope this helps.

  • Users who have thanked rerryn for this post:

    CyberPirate (08-14-2012)

  • #3
    Regular Coder
    Join Date
    Jan 2009
    Location
    Norway
    Posts
    118
    Thanks
    8
    Thanked 2 Times in 2 Posts
    Worked like a charm!

    Thanks +1

  • #4
    New to the CF scene
    Join Date
    Aug 2012
    Posts
    9
    Thanks
    1
    Thanked 0 Times in 0 Posts
    PHP Code:
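    <?php
    $text = "*''[[10 Things I Hate About You (TV series)|10 Things I Hate About You]]''";

    // Capture only the link target: the text between "[[" and the first "|".
    // Lazy quantifiers keep the match inside a single [[...]] link.
    // The result of preg_match is checked rather than assigned back to $text.
    if (preg_match('/\[\[(.*?)\|.*?\]\]/', $text, $matches)) {
        echo $matches[1]; // 10 Things I Hate About You (TV series)
    }
    ?>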

