Thread: Preg help

  1. #1
    Regular Coder
    Join Date
    Aug 2002
    Location
    Oregon, United States of America
    Posts
    882
    Thanks
    1
    Thanked 9 Times in 9 Posts

    Preg help

    I need to get a number from within a string, when surrounded by *'s, even if other numbers are present.

    So if my string is
    PHP Code:
    '99_%y_%M_*001*' 
    I need to have a preg function that returns the 001, and nothing else.

    any help?
    If I'm postin here, I NEED YOUR HELP!!

  • #2
    Regular Coder
    Join Date
    Aug 2002
    Location
    Oregon, United States of America
    Posts
    882
    Thanks
    1
    Thanked 9 Times in 9 Posts
    *bump*

    I really need this ASAP. Can anyone help me make a regex that will let me pull that section out? Again, I just need any number of digits between the stars.

    Thanks!
    If I'm postin here, I NEED YOUR HELP!!

  • #3
    Regular Coder ralph l mayo's Avatar
    Join Date
    Nov 2005
    Posts
    951
    Thanks
    1
    Thanked 31 Times in 29 Posts
    /[*](\d*?)[*]/xms

  • #4
    Regular Coder
    Join Date
    Mar 2004
    Location
    Australia
    Posts
    217
    Thanks
    0
    Thanked 1 Time in 1 Post
    /boggle

    provided there is only one pair of *s

    PHP Code:
    $str = "99_%y_%M_*001*";
    $str_x = explode("*", $str);

    echo $str_x[1];

  • #5
    Regular Coder
    Join Date
    Aug 2002
    Location
    Oregon, United States of America
    Posts
    882
    Thanks
    1
    Thanked 9 Times in 9 Posts
    ralph l mayo, Thank you!

    Questions:
    What in that regex tells the engine that the *'s HAVE to be present for it to return the numbers? Also, what does the /xms mean?

    A general explanation will help my learning curve. Thank you!
    If I'm postin here, I NEED YOUR HELP!!

  • #6
    Regular Coder ralph l mayo's Avatar
    Join Date
    Nov 2005
    Posts
    951
    Thanks
    1
    Thanked 31 Times in 29 Posts
    [*] means "match one * character". It's the same as \* but I think a bit easier to read. [*]? (or \*?) is "match an optional * character (zero or one * characters)". [*]+ means "match at least one * character" and [*]* means "match any number (including zero) of * characters".

    /xms does nothing at all in this expression; it's just a quirk of mine to always add it to regular expressions to normalize their behavior in the cases where it does make a difference. s makes . stand for every character instead of [^\n] (every character except newline), m makes ^ and $ anchor at the beginning and end of each line as well as at the beginning and end of the entire string, and x lets you add inline comments (and ignores whitespace in the pattern).
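
    A minimal sketch of the quantifiers and flags described above, run against the sample string from the first post. The variable names are only illustrative, and the second test string ("first\nsecond") is made up for the m and s examples.

```php
<?php
$str = '99_%y_%M_*001*';

// [*] exactly one star; [*]? zero or one; [*]+ one or more; [*]* zero or more.
$one      = preg_match('/[*]/', $str);        // 1: a star is present somewhere
$optional = preg_match('/\A[*]?99/', $str);   // 1: the leading star is optional (and absent)
$plus     = preg_match('/\A[*]+99/', $str);   // 0: at least one leading star is required
$any      = preg_match('/\A[*]*99/', $str);   // 1: zero leading stars is acceptable

// With x, whitespace and # comments inside the pattern are ignored,
// so the same expression can be spread over several lines.
$pattern = '/
    [*]      # a literal star
    (\d*?)   # the digits we want, captured as group 1
    [*]      # the closing star
/xms';
preg_match($pattern, $str, $matches);
$digits = $matches[1];

// m lets ^ and $ match at line boundaries; s lets . match a newline.
$lines  = "first\nsecond";
$with_m = preg_match('/^second$/m', $lines);      // 1
$no_m   = preg_match('/^second$/', $lines);       // 0
$with_s = preg_match('/first.second/s', $lines);  // 1
$no_s   = preg_match('/first.second/', $lines);   // 0
```

    None of the three flags changes the result for the star pattern itself, which is consistent with the remark that /xms does nothing in this expression.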

  • #7
    Regular Coder
    Join Date
    Aug 2002
    Location
    Oregon, United States of America
    Posts
    882
    Thanks
    1
    Thanked 9 Times in 9 Posts
    Now that's the kind of info I've been needing! I've read the PHP manual on RegEx syntax, but it's too much to take in all at once.

    One thing I noticed, though: I did a preg_match() using your RegEx, and it returns both *001* and 001. I would prefer it to return only 001, but I don't know why it's giving me both. Tips?
    If I'm postin here, I NEED YOUR HELP!!

  • #8
    Senior Coder
    Join Date
    Sep 2005
    Posts
    1,791
    Thanks
    5
    Thanked 36 Times in 35 Posts
    That's just how preg_match works: the first element in the array contains the full pattern match, and subsequent elements contain the captured matches (the things inside ()s). Because this pattern has to match the *s to find the number, they will be part of the whole match. The bit you want will always be in $matches[1] though, so you can just ignore the 0th element.
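
    As a small sketch of what that looks like in practice (the variable names $whole and $number are just for illustration), using the regex from post #3 and the string from the first post:

```php
<?php
$str = '99_%y_%M_*001*';

// $matches[0] is the whole match (stars included);
// $matches[1] is the first parenthesised group (digits only).
if (preg_match('/[*](\d*?)[*]/xms', $str, $matches)) {
    $whole  = $matches[0]; // "*001*"
    $number = $matches[1]; // "001"
    echo $number, "\n";
}
```

    Reading $matches[1] and ignoring $matches[0] is usually all that is needed.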
    My thoughts on some things: http://codemeetsmusic.com
    And my scrapbook of cool things: http://gjones.tumblr.com

  • #9
    Regular Coder
    Join Date
    Aug 2002
    Location
    Oregon, United States of America
    Posts
    882
    Thanks
    1
    Thanked 9 Times in 9 Posts
    Good to know. You guys have been particularly helpful today. Thank you.
    If I'm postin here, I NEED YOUR HELP!!

  • #10
    Regular Coder ralph l mayo's Avatar
    Join Date
    Nov 2005
    Posts
    951
    Thanks
    1
    Thanked 31 Times in 29 Posts
    GJay is spot-on in diagnosing preg_match's unfortunate insistence on returning the whole match as well as any capturing groups. JavaScript uses the same idiom.

    In this situation it's possible in PHP to make the entire match the same as the captured match and only return one result, but the costs are that it uses regexp functionality many people aren't familiar with and that it uses functionality some regexp engines have not even bothered to implement. For example I don't think it would translate cross-browser to JavaScript. All things considered if I were writing this program I'd use the regexp I gave above because it's indicative at a glance of what you're trying to do, but there is a (very slightly) optimized form that may prove instructive since you're learning about the regexp syntax:

    You can change
    /[*]\d*[*]/xms
    to
    /(?<=[*])\d*(?=[*])/xms

    (?<=(exp))(exp2) means "exp2 is preceded by exp" and (exp2)(?=(exp)) means "exp2 is followed by exp". These are different from just saying (exp)(exp2) and (exp2)(exp), respectively, in that they are "zero-width assertions". Zero-width refers to the number of characters matched: none. Succeed or fail, they match nothing; they only assert a state and return true or false depending on whether the state is satisfied by the actual string. There are some concepts that are impossible to express in regular expressions without assertions. This isn't one of them, but it helps a bit in that the *s on either side of the digits are never part of the match, so the return will be Array( [0] => 001 ) for the example you gave upthread.

    There are some limitations imposed on the expressions inside assertions, but that's getting pretty deep in the regexp well. In PHP's PCRE engine the main one is that a lookbehind generally has to be fixed-width, for performance and internal complexity reasons.
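
    A short sketch of the lookaround version in PHP, again on the string from the first post. Only the pattern comes from this post; the rest is illustrative scaffolding.

```php
<?php
$str = '99_%y_%M_*001*';

// The lookbehind (?<=[*]) and lookahead (?=[*]) check that a star sits on
// each side of the digits without consuming it, so the whole match is
// just the digits and there is no separate capture group.
preg_match('/(?<=[*])\d*(?=[*])/xms', $str, $matches);
print_r($matches);
```

    Here $matches has a single element, the digits themselves. If at least one digit should be required, \d+ could be used in place of \d*.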

    Also I'll add that for either expression the tail can probably be left off, i.e.
    [*]\d*[*] => [*]\d*
    (?<=[*])\d*(?=[*]) => (?<=[*])\d*

    because \d* stops at the first non-digit anyway, and if it's always a * in your input it's redundant to say so. It's easier to parse at a glance with the end intact, though.
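
    To see what leaving the tail off changes, here is a sketch that compares both lookbehind forms. The second input, with the closing star missing, is a made-up case and not something from the original question.

```php
<?php
$good         = '99_%y_%M_*001*';
$unterminated = '99_%y_%M_*001';   // hypothetical input with no closing star

// On well-formed input both forms find the same digits.
preg_match('/(?<=[*])\d*(?=[*])/', $good, $full);
preg_match('/(?<=[*])\d*/', $good, $short);

// Without the closing star, the full form refuses to match,
// while the shortened form still returns the digits.
$full_hit  = preg_match('/(?<=[*])\d*(?=[*])/', $unterminated, $a);
$short_hit = preg_match('/(?<=[*])\d*/', $unterminated, $b);
```

    So the shortened form is only equivalent when the input is guaranteed to have the closing star; the full form doubles as a check that it is there.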

  • #11
    Regular Coder
    Join Date
    Aug 2002
    Location
    Oregon, United States of America
    Posts
    882
    Thanks
    1
    Thanked 9 Times in 9 Posts
    Holy crap, I understood most of that.

    I have started using regex in a lot of data verification things, like removing everything but the digits and . from a string for floats, making sure that an email address is valid, etc. So while I have someone who is obviously MUCH better at regex within my reach, do you have a solution for this:

    It would be great to have a regex that in some way formats a phone number for me. I want to format all phone numbers that are entered in the same way. In order to do that, I need regex to read them and tell me what part is what.

    Example:
    Input -> Desired Output
    9991231234 -> (999) 123-1234 = Separate and () the area code
    800 123 1234 -> 1-800-123-1234 = Add a 1 to any 800 number and delimit with -
    479991231234 -> +47 (999) 123-1234 = Pull out country code, add +, and format normally

    That seems like a lot of work to do, and I think it would be a huge mess of regex and preg functions, so I haven't even tried. If this is in fact as hard as I think, don't worry about trying, but if you have done it, or think you can, I would love to have this at my disposal.
    If I'm postin here, I NEED YOUR HELP!!

  • #12
    Regular Coder ralph l mayo's Avatar
    Join Date
    Nov 2005
    Posts
    951
    Thanks
    1
    Thanked 31 Times in 29 Posts
    Regexps turn out to be too blunt to do a lot of this stuff like validating email addresses without significant contortions. (See: http://codingforums.com/showpost.php...93&postcount=4)

    I suspect phone numbers may be similar but I don't really know enough, particularly about their international scheme, to guess what the edge cases are that will bite you with a naive implementation, but some probably will. Also I don't really understand why 800 numbers should look different, so I may be missing more of the problem. Here's a naive implementation anyway:

    PHP Code:
    <?php
    $tests = array(
        '9991231234',
        '19991231234',
        '800 123 1234',
        '479991231234'
    );

    foreach ($tests as &$test)
    {
        // Nuke all non-digits, then a leading one if the result is 11 digits, to normalize the starting point.
        // The \A and \z anchors keep a 1 inside a longer number from being stripped.
        $test = preg_replace('/\A1(\d{10})\z/xms', '$1', preg_replace('/[^\d]/xms', '', $test));
        switch (strlen($test))
        {
        case 10:
            // The x flag ignores the spaces in these patterns; they are only there for readability.
            $test = preg_replace('/800 (\d{3}) (\d{4})/xms', '1-800-$1-$2', $test);
            $test = preg_replace('/(\d{3}) (\d{3}) (\d{4})/xms', '($1) $2-$3', $test);
        break;
        case 11:
        case 12:
            $test = preg_replace('/\A(\d{1,2}) (\d{3}) (\d{3}) (\d{4})\z/xms', '+$1 ($2) $3-$4', $test);
        break;
        default:
            throw new Exception('Don\'t know how to parse a phone number of ' . strlen($test) . ' digits');
        }
    }
    unset($test); // break the reference left over from the foreach
    ?>
    after which $tests looks like
    Code:
    Array
    (
        [0] => (999) 123-1234
        [1] => (999) 123-1234
        [2] => 1-800-123-1234
        [3] => +47 (999) 123-1234
    )
    edit: Oh yeah, extensions! That's only somewhat solvable with more regexp because numbers of certain lengths will be ambiguous without groupings. Also some area and country codes are invalid. Somebody has probably written a decent, freely licensed parser that isn't as naive, and I would reach for that straightaway before messing with regexps.
    Last edited by ralph l mayo; 07-10-2007 at 11:00 AM.

  • #13
    Regular Coder
    Join Date
    Aug 2002
    Location
    Oregon, United States of America
    Posts
    882
    Thanks
    1
    Thanked 9 Times in 9 Posts
    I'm not looking to start a phone company or anything. I just wanted to stop people from entering one number as, say, 9991231234 and another as (999) 123 1234 on the same form.

    This script works very nicely, and I see how I can build on it if I need to. Thanks! Most informative!
    If I'm postin here, I NEED YOUR HELP!!

