Preg help



Ultragames
07-08-2007, 02:58 AM
I need to get a number from within a string, when surrounded by *'s, even if other numbers are present.

So if my string is
'99_%y_%M_*001*'
I need to have a preg function that returns the 001, and nothing else.

any help?

Ultragames
07-09-2007, 12:31 AM
*bump*

I really need this ASAP. Can anyone help me make a regex that will let me pull that section out? Again, I just need any number of digits between the stars.

Thanks!

ralph l mayo
07-09-2007, 01:12 AM
/ [*] (\d*?) [*] /xms

Serex
07-09-2007, 01:17 AM
/boggle

Provided there is only one pair of *s:



$str = "99_%y_%M_*001*";
$str_x = explode("*", $str);

echo $str_x[1];

Ultragames
07-09-2007, 03:43 AM
ralph l mayo, Thank you!

Questions:
What in that regex makes regex know that the *'s HAVE to be present for it to return the numbers? Also, what does the /xms mean?

A general explanation will help my learning curve. Thank you!

ralph l mayo
07-09-2007, 07:12 AM
[*] means "match one * character". It's the same as \* but I think a bit easier to read. [*]? (or \*?) is "match an optional * character (zero or one * characters)". [*]+ means "match at least one * character" and [*]* means "match any number (including zero) of * characters".
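A small sketch of those four forms, for anyone who wants to try them. The subjects and the \A ... \z anchors are assumptions added for illustration (they pin each pattern to the whole string so that only the quantifier differs), and /x is there only so the spacing inside the patterns is ignored:

```php
<?php
// Subjects: no asterisk, one asterisk, two asterisks.
$subjects = ['', '*', '**'];

// \A and \z pin each pattern to the whole subject so only the quantifier varies.
$patterns = [
    'one'      => '/\A [*]  \z/x',  // exactly one *
    'escaped'  => '/\A \*   \z/x',  // the same thing, written with a backslash
    'optional' => '/\A [*]? \z/x',  // zero or one *
    'plus'     => '/\A [*]+ \z/x',  // one or more *
    'star'     => '/\A [*]* \z/x',  // zero or more *
];

$results = [];
foreach ($patterns as $name => $pattern) {
    foreach ($subjects as $subject) {
        // preg_match returns 1 on a match and 0 otherwise
        $results[$name][] = preg_match($pattern, $subject);
    }
}

// One row per pattern; the columns are the subjects '', '*' and '**'.
foreach ($results as $name => $row) {
    echo $name, ': ', implode(' ', $row), "\n";
}
// one: 0 1 0
// escaped: 0 1 0
// optional: 1 1 0
// plus: 0 1 1
// star: 1 1 1
```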

/xms does nothing at all in this expression, it's just a quirk of mine to always add it to regular expressions to normalize their behavior in the cases where it does make a difference. s makes . stand for every character instead of [^\n] (every character except newline), m makes ^ and $ anchor at the beginning and end of lines respectively as well as at the beginning and end of the entire string, and x allows you to add inline comments (and ignores whitespace).
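To see each flag make a difference, here is a minimal sketch. The sample strings and variable names are made up for illustration and are not from the thread:

```php
<?php
$text = "a\nb\nc";

// s: without it . refuses to match a newline; with it . matches any character.
$dotPlain = preg_match('/a.b/', $text);   // 0
$dotAll   = preg_match('/a.b/s', $text);  // 1

// m: without it ^ and $ anchor only at the start and end of the whole string
// ($ also tolerates one trailing newline); with it they anchor at every line.
$linePlain = preg_match('/^b$/', $text);   // 0
$lineMulti = preg_match('/^b$/m', $text);  // 1

// x: whitespace in the pattern is ignored and # starts a comment.
$commented = '/
    \d+   # one or more digits
/x';
$extended = preg_match($commented, 'abc42', $m);  // 1, and $m[0] is "42"

var_dump($dotPlain, $dotAll, $linePlain, $lineMulti, $extended, $m[0]);
```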

Ultragames
07-09-2007, 07:55 AM
Now that's the kind of info I've been needing! I've read the PHP manual on RegEx syntax, but it's too much to get all at once.

One thing I noticed though: I did a preg_match() using your RegEx, and it returns both *001* and 001. I would prefer it to only return 001, but I don't know why it's giving me both. Tips?

GJay
07-09-2007, 08:08 AM
that's just how preg_match works, the first element in the array contains the full pattern-match, and subsequent elements contain captured matches- things inside ()s. Because your pattern is relying on the context of *s, there's no way to exclude them from the whole pattern, so they will be part of the match. The bit you want will always be in $matches[1] though, so you can just ignore the 0th element.
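A quick sketch of what that looks like in practice, assuming the pattern from earlier in the thread was / [*] (\d*?) [*] /xms:

```php
<?php
$str = '99_%y_%M_*001*';

// [*] matches a literal asterisk; the parentheses capture the digits in between.
$pattern = '/ [*] (\d*?) [*] /xms';

if (preg_match($pattern, $str, $matches) === 1) {
    echo $matches[0], "\n"; // *001*  (the whole match, stars included)
    echo $matches[1], "\n"; // 001    (the first captured group only)
}
```

So in calling code, reading $matches[1] and ignoring $matches[0] is enough.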

Ultragames
07-09-2007, 08:09 AM
Good to know. You guys have been particularly helpful today. Thank you.

ralph l mayo
07-10-2007, 07:14 AM
GJay is spot-on in diagnosing preg_match's unfortunate insistence on returning the whole match as well as any capturing groups. JavaScript uses the same idiom.

In this situation it's possible in PHP to make the entire match the same as the captured match and only return one result, but the costs are that it uses regexp functionality many people aren't familiar with and that it uses functionality some regexp engines have not even bothered to implement. For example I don't think it would translate cross-browser to JavaScript. All things considered if I were writing this program I'd use the regexp I gave above because it's indicative at a glance of what you're trying to do, but there is a (very slightly) optimized form that may prove instructive since you're learning about the regexp syntax:

You can change
/ [*] \d* [*] /xms
to
/(?<= [*] )\d*(?= [*] )/xms

(?<=(exp))(exp2) means "exp2 is preceded by exp" and (exp2)(?=(exp)) means "exp2 is followed by exp". These are different from just saying (exp)(exp2) and (exp2)(exp), respectively, in that they are "zero-width assertions". Zero-width refers to the number of characters matched: none. Succeed or fail, they consume nothing; they only assert a state and return true or false depending on whether the actual string satisfies it. There are some concepts that are impossible to express in regular expressions without assertions. This isn't one of them, but assertions help a bit here in that the * on either end is never included in the match, so the return will be Array( [0] => 001 ) for the example you gave upthread.

There are some limitations imposed on the expressions inside assertions, but that's getting pretty deep in the regexp well. Generally, lookbehind expressions, if they work at all, have to be fixed-width, for performance and internal complexity reasons.
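For comparison, a minimal sketch that runs both forms against the example string. It assumes the lookaround pattern is /(?<= [*] )\d*(?= [*] )/xms and the capturing one is / [*] (\d*?) [*] /xms; the variable names are made up:

```php
<?php
$str = '99_%y_%M_*001*';

// Capturing form: the stars are consumed, so they show up in element 0.
preg_match('/ [*] (\d*?) [*] /xms', $str, $captured);

// Lookaround form: the stars are only asserted, never consumed,
// so the whole match is just the digits and there is no element 1.
preg_match('/(?<= [*] )\d*(?= [*] )/xms', $str, $asserted);

print_r($captured); // [0] => *001*, [1] => 001
print_r($asserted); // [0] => 001
```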

Also I'll add that for either expression the tail can probably be left off, i.e.,
[*] \d* [*] => [*] \d*
(?<= [*] )\d*(?= [*] ) => (?<= [*] )\d*

because \d* stops at the first non-digit anyway, and if it's always a * in your input it's redundant to say so. It's easier to parse at a glance with the end intact, though.
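A rough sketch of that trade-off. The capturing parentheses and the malformed input '*12a*' are assumptions added for illustration; the point is only that the two forms agree on clean input and differ when the closing * cannot be taken for granted:

```php
<?php
$withTail    = '/ [*] (\d*) [*] /xms';
$withoutTail = '/ [*] (\d*) /xms';

// Clean input from the thread: both forms capture the same digits.
$clean = '99_%y_%M_*001*';
preg_match($withTail, $clean, $a);     // $a[1] is "001"
preg_match($withoutTail, $clean, $b);  // $b[1] is "001" as well

// Hypothetical malformed input: a letter sneaks in before the closing *.
$dirty  = '*12a*';
$strict = preg_match($withTail, $dirty);         // 0: no run of digits is closed by a *
$loose  = preg_match($withoutTail, $dirty, $c);  // 1: $c[1] is "12"

var_dump($a[1], $b[1], $strict, $loose, $c[1]);
```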

Ultragames
07-10-2007, 09:39 AM
Holy crap, I understood most of that. :)

I have started using regex in a lot of data verification things, like removing everything but the digits and . from a string for floats, and making sure that an email address is valid, etc. So while I have someone who is obviously MUCH better at regex within my reach, do you have a solution for this:

It would be great to have a regex that in some way formats a phone number for me. I want to format all phone numbers that are entered in the same way. In order to do that, I need regex to read them and tell me what part is what.

Example:
Input -> Desired Output
9991231234 -> (999) 123-1234 = Separate and () the area code
800 123 1234 -> 1-800-123-1234 = Add a 1 to any 800 number and delimit with -
479991231234 -> +47 (999) 123-1234 = Pull out country code, add +, and format normally

That seems like a lot of work to do, and I think it would be a huge mess of regex and preg functions, so I haven't even tried. If this is in fact as hard as I think, don't worry about trying, but if you have done it, or think you can, I would love to have this at my disposal.

ralph l mayo
07-10-2007, 10:53 AM
Regexps turn out to be too blunt to do a lot of this stuff like validating email addresses without significant contortions. (See: http://codingforums.com/showpost.php?p=581693&postcount=4)

I suspect phone numbers are similar, but I don't know enough about them, particularly the international scheme, to guess which edge cases will bite a naive implementation. Some probably will. Also, I don't really understand why 800 numbers should look different, so I may be missing more of the problem. Here's a naive implementation anyway:



<?php
$tests = array(
    '9991231234',
    '19991231234',
    '800 123 1234',
    '479991231234'
);

foreach ($tests as &$test)
{
    // Nuke all non-digits, then drop a leading 1 if the result is exactly 11 digits, to normalize the starting point.
    // The \A and \z anchors keep this from eating a 1 in the middle of a longer number.
    $test = preg_replace('/\A 1 (\d{10}) \z/xms', '$1', preg_replace('/[^\d]/xms', '', $test));
    switch (strlen($test))
    {
        case 10:
            // 800 numbers get a 1- prefix and dashes; the second replace then no longer matches them
            $test = preg_replace('/\A 800 (\d{3}) (\d{4}) \z/xms', '1-800-$1-$2', $test);
            $test = preg_replace('/\A (\d{3}) (\d{3}) (\d{4}) \z/xms', '($1) $2-$3', $test);
            break;
        case 11:
        case 12:
            // Whatever precedes the last 10 digits is treated as the country code
            $test = preg_replace('/\A (\d{1,2}) (\d{3}) (\d{3}) (\d{4}) \z/xms', '+$1 ($2) $3-$4', $test);
            break;
        default:
            throw new Exception('Don\'t know how to parse a phone number of ' . strlen($test) . ' digits');
    }
}
unset($test); // break the reference left behind by the foreach
?>


after which $tests looks like


Array
(
[0] => (999) 123-1234
[1] => (999) 123-1234
[2] => 1-800-123-1234
[3] => +47 (999) 123-1234
)


edit: Oh yeah, extensions! That's only somewhat solvable with more regexp because numbers of certain lengths will be ambiguous without groupings. Also some area and country codes are invalid. Somebody has probably written a decent freely licensed parser that isn't as naive, which I would use straightaway before messing with regexps.

Ultragames
07-10-2007, 02:31 PM
I'm not looking to start a phone company or anything. I just wanted to stop people from having one number entered as 9991231234 and another as (999) 123 1234 on the same form.

This script works very nicely, and I see how I can build on it if I need to. Thanks! Most informative!


