View Full Version : MySQL REGEXP question

03-18-2009, 05:05 PM

I have a very poorly written database struct that I must query...

The particular field that I'm interested contains "special" data. The data is as follow:


= and & are reserved characters so if they are present in a value they are escaped that way:

(%3D and %26)

Every special field have an unique name. The special field order in the database field can vary. I know this isn't the best design but I have no control over it.

Now I must get all the distinct values of a particular "field". I was thinking of using MySQL REGEXP, but I'm very bad with regular expressions.

Which MySQL regular expression would retrieve the value part of a particular special field (like "year" in my previous example)? The special field can be at the beggining of the string or at the end. It must also work correctly with things like:
(only 2001 will be found here)

Anyone can help with this regular expression? Also is there a better way than REGEXP to achieve what I want to?

Many thanks!

Old Pedant
03-19-2009, 01:33 AM
So far as I know, MySQL can't *extract* data from a field according to a regular expression; it can only use a regular expression in doing a match.

Yeah, look here:

So I think the best you can do is FIND those patterns in those fields and then use a regular expression in your implementation language (PHP, ASP, whatever) to then extract the needed value.

That's the bad news. The good news is that matching is easier.

So, for your "year=nnnn", for example, you could just use

FROM table
WHERE field RLIKE '(^year|\\&year)\\=\\d{4,4}'

Untested, but looks like it should work. Will only find "year" at the start of the field or after an & character.

03-20-2009, 03:28 PM
I managed to extract the data I wanted with :

SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT('&', my_field), '&myvalue=', -1), '&', 1) AS 'myvalue'
FROM mytable
WHERE <some cond>

Old Pedant
03-21-2009, 07:18 AM
Ahhh...I misread your problem!

You wrote "The special field order in the database field can vary," and I read that to mean that the actual field *name* varied, as well! So I kind of assumed you were going to need to be finding a set of possible field names.

Yeah, your trick with substring makes eminent sense.

10-18-2010, 05:17 PM
Can someone help me out on a variation of the same problem?

I have data in the form of "@abc blah blah blah" and "blah blah @xyz blah blah"

How can I extract "abc" and "xyz" from query???

Thanks for the help...

Old Pedant
10-18-2010, 08:09 PM
Again, in MySQL, you can only use a regexp in the WHERE clause. It's kind of useless in the SELECT.

So you would have to use ordinary string manipulation in the SELECT.

Unless it is direly important that this be done in the SQL query, I'd just use WHERE to find the records with content like this and then extact the actual strings in PHP/JSP/ASP/whatever you are using with MySQL.