PDA

View Full Version : MySQL REGEXP not working


jrr
05-28-2003, 05:24 PM
I have a PHP script that uses MySQL (v3.23.53-max-nt) REGEXPs,
and these seemed to be working fine until it needed to use the
condition:

msg_text REGEXP 'status: \(failed\)'

I knew there was a record containing this string. But the script
didn't return it, until I had reduced the condition to:

msg_text REGEXP '\(failed\)'

The problem seems to be caused by the colon, despite this not
being a reserved character (as far as I can tell from the MySQL
manual).

Also, the manual says that preceding special characters by '\'
causes them to be treated as normal characters. So why in
that case does MySQL complain about 'unbalanced brackets'
when I tried running:

msg_text REGEXP '\(fail'

Luckily, I was able to work round these problems by using
a LIKE clause instead. But if anyone has an explanation for
either or both these problems, I'd be very interested.


Cheers

John R Ramsden (john_ramsden@sagitta-ps.com)

mordred
05-28-2003, 10:30 PM
No, the colon should not be the source of your problem. It is rather more likely that you tripped over one (two exactly) gotchas concerning the backslash escape character. Quoted from the manual (http://www.mysql.com/doc/en/String_comparison_functions.html):


Note: Because MySQL uses the C escape syntax in strings (for example, `\n'), you must double any `\' that you use in your REGEXP strings.


But writing your query like

msg_text REGEXP 'status: \\(failed\\)'

won't help you if this query is contained in a PHP string variable, because in this case the same effect as described in the manual annotation takes place: PHP treats the first backslash to escape the second, thus effectively only leaving one backslash in the string that get's passed to the SQL interpreter.
In order to prevent this behaviour, you have to escape every backslash already in the PHP string variable:

msg_text REGEXP 'status: \\\\(failed\\\\)'

That should, at least it worked for me (using 3.23.56 on a Win2k box). I know it looks as if your keyboard were broken, but that's how it's done, unfortunately.

jrr
05-29-2003, 09:12 AM
Thanks for the reply. So let me get this straight - there must
be _two_ levels of escaping, to account for the two hoops
through which the expression must jump before it reaches
the regular expression engine?

In other words, if the PHP code contains the line:

msg_text REGEXP 'status: \\\\(failed\\\\)'

MySQL obviously sees this as:

msg_text REGEXP 'status: \\(failed\\)'

No surprise there.

But then MySQL un-escapes it _again_ before passing
the following to the regular expression engine:

msg_text REGEXP 'status: \(failed\)'

It's that second un-escaping step I find a bit surprising.


Cheers

John R Ramsden (john_ramsden@sagitta-ps.com)

mordred
05-29-2003, 01:51 PM
I'm not sure why you find this strange. If there were only one backslash in the regexp string, than it would have the effect of trying to escape the parentheses _before_ this string is passed to the regexp machine. Obviously, parentheses have no special meaning in _ordinary_ SQL strings, so they are neglected and the regexp sees a group matching pattern.

If it helps, think of everything inside the single quotes as a string variable that works analagous to PHP strings. The regexp pattern needs one backslash. A literal backslash must be escapaed in a SQL string. All backslashes appearing in the SQL string must be escaped in the PHP string. It's like a cascade, starting from the pattern:

\(failed\)

to the SQL string:

\\(failed\\)

to the PHP string:

\\\\(failed\\\\)


hth
mordred