  1. #1
    Senior Coder
    Join Date
    Apr 2005
    Posts
    1,051
    Thanks
    0
    Thanked 0 Times in 0 Posts

    preg_match_all(find all urls on the page)

    http://rlemon.com/TESTING/test_url_search.php <- temp url to the script.

    The script itself is a simple preg_match_all; however, I have come up with a nice piece of regex that will return all URLs on the page. Great for parsing, say, forum posts.

    here is the expression:
    Code:
    (\b[a-zA-Z0-9]+://[^( |\>)]+\b)
    it can be used like:

    PHP Code:

    $subject = file_get_contents("./path/to/file.html"); // any string
    $search = '(\b[a-zA-Z0-9]+://[^( |\>)]+\b)';

    preg_match_all($search, $subject, $matches);

    print_r($matches);
    public string ConjunctionJunction(string words, string phrases, string clauses)
    {
    return (String)(words + phrases + clauses);
    }
    <--- Was I Helpfull? Let me know ---<

  • #2
    Super Moderator
    Join Date
    May 2002
    Location
    Perth Australia
    Posts
    4,040
    Thanks
    10
    Thanked 92 Times in 90 Posts

    Regex || regular expressions (postem here)

    Regex scares (with few exceptions) the best and worst of us. Lots of regex resources on the web do not always translate directly to PHP, though most do.

    Anyway, if you have a good/useful PHP regular expression please post it here, and do not start a separate thread unless your snippet is more involved.

    Please don't simply post the regex/pattern.
    Give at least 1 example of it in action (if you don't the post gets removed~)

    A good start from rlemon follows.... errr OK, I stuffed up the thread merge so the snippet precedes!
    Last edited by firepages; 01-14-2006 at 04:36 AM. Reason: cleaning up
    resistance is...

    MVC is the current buzz in web application architectures. It comes from event-driven desktop application design and doesn't fit into web application design very well. But luckily nobody really knows what MVC means, so we can call our presentation layer separation mechanism MVC and move on. (Rasmus Lerdorf)

  • #3
    Senior Coder
    Join Date
    Aug 2003
    Location
    One step ahead of you.
    Posts
    2,815
    Thanks
    0
    Thanked 3 Times in 3 Posts
    Even your rlemon.com example shows some errors:
    Code:
    Array
    (
        [0] => Array
            (
                [0] => http://www.rlemon.com
                [1] => http://rlemon.org<br
                [2] => ftp://userNAme@remon.com
                [3] => ftp://userName.pass@rlemon.net</a
            )
    
    )
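A minimal sketch of why the tags leak into the matches, assuming a made-up test string (not the actual rlemon.com page): the class `[^( |\>)]` only stops at `(`, space, `|`, `>` and `)`, so `<br` and `</a` get swallowed. The tweaked pattern and the names `$html`, `$original` and `$tweaked` are illustrative assumptions, not rlemon's eventual fix.

```php
<?php
// Hypothetical input that mimics the failing cases from the output above.
$html = 'http://rlemon.org<br><a href="#">ftp://userName.pass@rlemon.net</a>';

// Pattern from post #1 (the outer parentheses act as the delimiters).
$original = '(\b[a-zA-Z0-9]+://[^( |\>)]+\b)';

// Assumed tweak: also stop at "<", quotes and any whitespace.
$tweaked = '~\b[a-zA-Z0-9]+://[^\s<>"\'()|]+\b~';

preg_match_all($original, $html, $before);
preg_match_all($tweaked, $html, $after);

print_r($before[0]); // URLs with "<br" and "</a" still attached
print_r($after[0]);  // URLs only
```

The tweak is deliberately small; it still accepts plenty of strings that are not valid URLs.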
    I'm not sure if this was any help, but I hope it didn't make you stupider.

    Experience is something you get just after you really need it.
    PHP Installation Guide Feedback welcome.

  • #4
    Regular Coder ralph l mayo's Avatar
    Join Date
    Nov 2005
    Posts
    951
    Thanks
    1
    Thanked 31 Times in 29 Posts
    Here's a URL match that passes the rlemon test. Still not perfect, but workable.
    PHP Code:
    preg_match_all('/([a-zA-Z]{2,8}:\/\/[a-zA-Z0-9\.\-@]{2,}\.[a-zA-Z0-9]{1,6}.*?)([^a-zA-Z0-9%@\.\/\?&=]|$)/', $text, $matches);
    The important results will be in $matches[1]
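A short sketch of why `$matches[1]` is the one to read, assuming a made-up `$text`: the second group consumes the character that ends the URL, so the whole match in `$matches[0]` carries that extra character while group 1 holds the URL alone.

```php
<?php
// Hypothetical sample text; the pattern is the one from this post.
$text = 'Visit http://rlemon.org<br> or ftp://userName.pass@rlemon.net</a>';
$pattern = '/([a-zA-Z]{2,8}:\/\/[a-zA-Z0-9\.\-@]{2,}\.[a-zA-Z0-9]{1,6}.*?)([^a-zA-Z0-9%@\.\/\?&=]|$)/';

preg_match_all($pattern, $text, $matches);

print_r($matches[0]); // whole match: URL plus the character that ended it
print_r($matches[1]); // group 1: the URL alone
```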
    Last edited by ralph l mayo; 01-14-2006 at 08:40 PM.

  • #5
    Senior Coder
    Join Date
    Apr 2005
    Posts
    1,051
    Thanks
    0
    Thanked 0 Times in 0 Posts
    My expression passed in RegexCoach (a third-party app), and on screen (not viewing source in FF) it appeared to work, which is why I posted it. I'm reviewing the expression tomorrow (I don't code on weekends anymore :P) and hopefully I can fix it.

  • #6
    Regular Coder ralph l mayo's Avatar
    Join Date
    Nov 2005
    Posts
    951
    Thanks
    1
    Thanked 31 Times in 29 Posts
    I actually found this one in O'Rly's "Perl Best Practices" (not as an example of Perl best practices, to be fair) attributed to someone who goes by the name Abigail who I guess is famous in certain of the nerdier circles, and after nearly breaking my brain trying to figure out how it works I decided to share this method for determining if a number is prime:

    PHP Code:
    function isPrime($n)
    {
        return is_int($n) && !preg_match('/^ (?: 1? | (11+?) \1+) $/xms', str_repeat('1', $n));
    }

    Brilliant and horrible, like so much of the canonical perl idiom.

    edit: An example application, rather useless but demonstrative:
    PHP Code:
    # Prints all primes in the loop range
    for ($i = 0; $i < 1000; ++$i)
    {
        if (isPrime($i))
        {
            echo "$i\n";
        }
    }

    Last edited by ralph l mayo; 06-05-2006 at 09:35 AM.
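For anyone puzzling over how the prime test works, here is a commented sketch. The number is written in unary (n copies of "1"), and the pattern matches exactly the non-primes: either 0 or 1 ones, or a block of two or more ones repeated at least twice, which means n factors as k * m. The helper names `isPrimeRegex` and `isPrimeTrial` and the negative-number guard are my own additions for illustration, not part of the original snippet.

```php
<?php
// The same pattern, spread out; the x flag makes PCRE ignore whitespace and #-comments.
function isPrimeRegex(int $n): bool
{
    if ($n < 0) {
        return false; // str_repeat() cannot build a negative-length string
    }
    $composite = '/^ (?: 1?           # empty string or a single 1: n is 0 or 1
                     | (11+?) \1+     # a block of k >= 2 ones, repeated at least twice: n = k * m
                   ) $/x';
    return !preg_match($composite, str_repeat('1', $n));
}

// Plain trial division, only here to cross-check the regex version.
function isPrimeTrial(int $n): bool
{
    if ($n < 2) {
        return false;
    }
    for ($d = 2; $d * $d <= $n; ++$d) {
        if ($n % $d === 0) {
            return false;
        }
    }
    return true;
}

$primes = array_filter(range(0, 30), 'isPrimeRegex');
echo implode(' ', $primes), "\n"; // 2 3 5 7 11 13 17 19 23 29
```

The regex version does its work by backtracking, so it slows down quickly as n grows and may hit PCRE's backtracking limit on large inputs; treat it as a curiosity rather than a practical test.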

  • #7
    Regular Coder
    Join Date
    May 2006
    Location
    Wales
    Posts
    820
    Thanks
    1
    Thanked 82 Times in 79 Posts
    I found this tutorial very useful in writing regex - http://www.regular-expressions.info/

  • #8
    New Coder
    Join Date
    Jun 2006
    Location
    USA
    Posts
    66
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Lol, regex started out as my worst enemy, and still remains challenging, yet I definitely enjoy it more now! When I first saw the quote in my sig, I found it very appropriate!

    Here's my URL parsing regex from developing my bbCode interpreter:
    PHP Code:
    $re = "#[a-z]+?://[^<>\"\s]*[^\s.!?<>#@()\"]#i";
    I try my best not to match any characters that might directly follow the URL.

    Concerning the isPrime regex, that thing has practically made my nose bleed! It looks like they're using conditional regex, but I still don't follow the reasoning. :-|
    "Some people, when confronted with a problem, think, 'I know, I'll use regular expressions.' Now they have two problems."
    --Jamie Zawinski

  • #9
    Regular Coder ralph l mayo's Avatar
    Join Date
    Nov 2005
    Posts
    951
    Thanks
    1
    Thanked 31 Times in 29 Posts
    Your regex has issues, Curtis. The most glaring is that the dot in the final character class has special meaning. It's a wildcard, matching either any character or any character besides newlines depending on the trailing flags, which really makes a lot of difference in the end meaning. You can use the preg_quote() function on character classes you would like to be treated literally to ensure special regex rules are bypassed. Eg:

    PHP Code:
    echo preg_quote(".!?<>#@()\"");
    # Output is: \.\!\?\<\>#@\(\)"   (PHP 7.3 and later also escape the #, giving \#)

  • #10
    New Coder
    Join Date
    Jun 2006
    Location
    USA
    Posts
    66
    Thanks
    0
    Thanked 0 Times in 0 Posts
    The dot meta-character, in fact, has no special meaning in character classes, according to the following test.
    PHP Code:
    <?php
    echo preg_match("/[.]/", 'a') ? 'Match' : 'Not found'; // OUTPUTS: Not found
    echo preg_match("/[.]/", '.') ? 'Match' : 'Not found'; // OUTPUTS: Match
    ?>
    Also, the following is from the PHP manual on Pattern Syntax
    Quote Originally Posted by PHP.net
    Part of a pattern that is in square brackets is called a "character class". In a character class the only meta-characters are:

    \    general escape character
    ^    negate the class, but only if the first character
    -    indicates character range
    ]    terminates the character class
    The dot meta-character is not listed.

    However, I did spot an error in my code. I didn't escape the regex delimiter, in this case, #. I overlooked that it needed to be escaped. I just tested this, and several variations.
    PHP Code:
    $re = "#[a-z]+?://[^<>\"\s]*[^\s.!?<>\#@()\"]#i";
    $string = 'Here\'s a link: <http://www.google.com>. Some more words.';
    echo preg_replace($re, '<a href="$0" title="Link!">$0</a>', $string);
    Last edited by Curtis D; 06-14-2006 at 09:43 AM.
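A small sketch of the "don't match what follows the URL" idea, using a made-up sentence. It assumes a `~` delimiter instead of `#`, which sidesteps the escaping problem because `#` is then an ordinary character inside the class; the pattern is otherwise the same.

```php
<?php
// Same pattern as above, but delimited with ~ so the # inside the class needs no escaping.
$re = '~[a-z]+?://[^<>"\s]*[^\s.!?<>#@()"]~i';

// Hypothetical sample sentence with punctuation right after each URL.
$text = 'Go to http://example.com/page. Or (http://example.org)?';

preg_match_all($re, $text, $m);
print_r($m[0]); // the full stop, the ")" and the "?" are left out of the matches
```

The final character class is what trims the trailing punctuation: the greedy middle part backs off until the last character of the match is one the class allows.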

  • #11
    Regular Coder ralph l mayo's Avatar
    Join Date
    Nov 2005
    Posts
    951
    Thanks
    1
    Thanked 31 Times in 29 Posts
    You're right, my bad.

  • #12
    New Coder
    Join Date
    Jun 2006
    Location
    USA
    Posts
    66
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Smile E-Mail Validation Regex

    Quote Originally Posted by ralph l mayo
    You're right, my bad.
    Lol! I was scared for a bit, and triple checked

    I hope I'm not opening Pandora's box here, but I figured I might share my E-Mail validation regex. It's pretty lengthy (maybe even unnecessary??), but I have yet to spot a hole in it, although I only subjected it to the different E-Mail formats I knew. Lol, here she is (modified to ignore whitespace for readability):
    Code:
    /^
       # user name
       (?:[a-z0-9_-]+?\.)*?
       [a-z0-9_-]+?
       # separates user from domain
       @
       # sub.domain(s); if present
       (?:[a-z0-9_-]+?\.)*?
       # domain portion before TLD
       [a-z0-9_-]+?
       # dot before TLD
       \.
       # TLD match
       [a-z0-9]{2,5}
    $/ix
    Note: (?: ... ) is for non-capturing matching.

    Here's a possible application for this regex:
    PHP Code:
    <?php
    $email = 'foo.bar.trekkie@somedomain.com';
    echo 'This E-Mail is <strong>' . (checkEmail($email) ? 'valid' : 'not valid') . '</strong>';

    // Create function
    function checkEmail($email) {
       $re = "/^(?:[a-z0-9_-]+?\.)*?[a-z0-9_-]+?@(?:[a-z0-9_-]+?\.)*?[a-z0-9_-]+?\.[a-z0-9]{2,5}$/i";
       return preg_match($re, $email); // returns 1 on match, 0 on no match, false on error
    }
    ?>
    You could expand this so that it captures each portion of an E-Mail, for whatever reason. See php.net's PCRE manual page for more info on different functions to use.
    Last edited by Curtis D; 06-15-2006 at 02:54 AM.
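One way the capturing variant might look, as a rough sketch. The function name `splitEmail` and the group names `user` and `domain` are made up for illustration, and the pattern is a slightly simplified rewrite (greedy quantifiers, subdomains and domain folded into one group) that should accept the same addresses as the one above.

```php
<?php
// Returns the two halves of an address, or null if the pattern does not match.
function splitEmail(string $email): ?array
{
    $re = '/^
        (?<user>(?:[a-z0-9_-]+\.)*[a-z0-9_-]+)        # dot-separated user name
        @
        (?<domain>(?:[a-z0-9_-]+\.)+[a-z0-9]{2,5})    # one or more labels, then the TLD
    $/ix';

    if (preg_match($re, $email, $m) !== 1) {
        return null;
    }
    return ['user' => $m['user'], 'domain' => $m['domain']];
}

print_r(splitEmail('foo.bar.trekkie@somedomain.com'));
var_dump(splitEmail('not an email'));
```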

  • #13
    Senior Coder
    Join Date
    Aug 2003
    Location
    One step ahead of you.
    Posts
    2,815
    Thanks
    0
    Thanked 3 Times in 3 Posts
    This is something I made because I was a little bored. It should match any valid URL. That implies that it may match some invalid ones, but I've tried to reduce the number of false matches to a minimum.
    PHP Code:
    <?php
    $regex = '~(?>[a-z+]{2,}://|www\.)(?:[a-z0-9]+(?:\.[a-z0-9]+)?@)?(?:(?:[a-z](?:[a-z0-9]|(?<!-)-)*[a-z0-9])(?:\.[a-z](?:[a-z0-9]|(?<!-)-)*[a-z0-9])+|(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))(?:/[^\\/:?*"<>|\n]*[a-z0-9])*/?(?:\?[a-z0-9_.%]+(?:=[a-z0-9_.%:/+-]*)?(?:&[a-z0-9_.%]+(?:=[a-z0-9_.%:/+-]*)?)*)?(?:#[a-z0-9_%.]+)?~i';
    ?>
    Sample URLs it matches:
    Code:
    http://www.php.net/manual/en/language.types.string.php#language.types.string.conversion
    http://www.google.com/search?client=opera&rls=en&q=sample&sourceid=&ie=utf-8&oe=utf-8
    http://66.249.93.104/search?q=cache:bCQPHS_h08gJ:stardust.jpl.nasa.gov/+sample&hl=en&ct=clnk&cd=4&client=
    http://zdrowie.onet.pl/1340860,2039,0,1,,ortoreksja_czyli_obsesja,profilaktyka.html
    https://www.rlemon.com
    http://rlemon.org
    ftp://userNAme@remon.com
    ftps://userName.pass@rlemon.net
    www.codingforums.com (lazy "www." match instead of protocol)
    svn+ssh://something.net/repository
    The regex won't match invalid domains or ip addresses (or shouldn't match them).
    I bet there is some error there
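A quick usage sketch for pulling URLs out of running text with this pattern. The pattern is copied from the post as-is; the sample sentence and the variable names `$text` and `$found` are made up.

```php
<?php
// Pattern copied from the post above; only the surrounding code is new.
$regex = '~(?>[a-z+]{2,}://|www\.)(?:[a-z0-9]+(?:\.[a-z0-9]+)?@)?(?:(?:[a-z](?:[a-z0-9]|(?<!-)-)*[a-z0-9])(?:\.[a-z](?:[a-z0-9]|(?<!-)-)*[a-z0-9])+|(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))(?:/[^\\/:?*"<>|\n]*[a-z0-9])*/?(?:\?[a-z0-9_.%]+(?:=[a-z0-9_.%:/+-]*)?(?:&[a-z0-9_.%]+(?:=[a-z0-9_.%:/+-]*)?)*)?(?:#[a-z0-9_%.]+)?~i';

// Hypothetical sample text with one full URL and one bare "www." address.
$text = 'Docs at http://rlemon.org and www.codingforums.com today';

preg_match_all($regex, $text, $found);
print_r($found[0]);
```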

  • #14
    New Coder
    Join Date
    Jun 2006
    Location
    USA
    Posts
    66
    Thanks
    0
    Thanked 0 Times in 0 Posts
    LOL!! You're still alive after that!! I don't think I could read through it all without separating bits of it (using /x modifier). That's very awesome! I can already see you thought about URLs that never once entered my mind.

  • #15
    Senior Coder
    Join Date
    Oct 2003
    Location
    Australia
    Posts
    1,963
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Curtis D
    I hope I'm not opening Pandora's box here, but I figured I might share my E-Mail validation regex. It's pretty lengthy (maybe even unnecessary??), but I have yet to spot a hole in it, although I only subjected it to the different E-Mail formats I knew.
    Not intending to put down your work at all, but email validity regex is notoriously difficult because most people write them the same way you did, ie: "I only subjected it to the different E-Mail formats I knew". The Internet Message Standard (RFC 2822) outlines what truly is a valid email address.
    For an email regex that conforms to RFC 2822 and relevant discussion on the matter, check out this blog post.
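To make the point concrete, here is a sketch comparing the regex from post #12 with PHP's built-in `filter_var()` and `FILTER_VALIDATE_EMAIL`. The sample addresses are made up to probe the gaps: a `+` in the user part and a six-letter TLD. Note that `filter_var()` is not a full RFC implementation either, so take this as an illustration rather than a verdict.

```php
<?php
// Regex from post #12, unchanged apart from being wrapped in a bool-returning helper.
function checkEmail(string $email): bool
{
    $re = '/^(?:[a-z0-9_-]+?\.)*?[a-z0-9_-]+?@(?:[a-z0-9_-]+?\.)*?[a-z0-9_-]+?\.[a-z0-9]{2,5}$/i';
    return preg_match($re, $email) === 1;
}

// Hypothetical addresses chosen to probe the regex.
$samples = ['foo.bar.trekkie@somedomain.com', 'first+tag@example.com', 'user@example.museum'];

foreach ($samples as $email) {
    $regexOk  = checkEmail($email) ? 'yes' : 'no';
    $filterOk = filter_var($email, FILTER_VALIDATE_EMAIL) !== false ? 'yes' : 'no';
    echo "$email  regex: $regexOk  filter_var: $filterOk\n";
}
```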

    I take no responsibility for the above nonsense.


    Left Justified

