Regex help with search script

12-30-2003, 08:38 PM
I've got the following code in my search script...

$query = strtolower($query);
$query = preg_replace("/[^a-zà-ÿ0-9 +!-]/"," ",$query);
$query_arr_dum = preg_split("/\s+/",$query);

$query is the text entered by the user on the search form.
$query_arr_dum is an array which is created by separating the words of the query string, using whitespace (\s) as a delimiter. This array is used later on for matching words in the pages of the site.

What I would like is for a phrase that someone puts in quotes to be treated as one word.

Right now, if you enter
new red house // without quotes

you will get results from pages that contain the word "new" and the word "red" and the word "house", but not necessarily right next to each other.

What I want is that if you enter
new "red house"
you will get pages that match the word "new" and the phrase "red house"
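
To make the wanted behaviour concrete, here is a minimal sketch of the matching rule described above. It is only an illustration: the helper name pageMatches() is hypothetical, each page is assumed to be a plain string, and the code assumes PHP 7 or later. It uses a simple case-insensitive substring test, so "new" would also match inside "renew"; the real script may well match differently.

```php
<?php
// Hypothetical helper: true only if every token occurs somewhere in the page text.
// A token may be a single word or a multi-word phrase; matching is case-insensitive.
function pageMatches(string $pageText, array $tokens): bool
{
    foreach ($tokens as $token) {
        if (stripos($pageText, $token) === false) {
            return false; // one missing token is enough to reject the page
        }
    }
    return true;
}

$page = 'A red barn stands behind the new house.';

// Three separate words: they may appear anywhere on the page.
var_dump(pageMatches($page, ['new', 'red', 'house']));

// A word plus a phrase: "red house" has to appear as one contiguous string.
var_dump(pageMatches($page, ['new', 'red house']));
```

The first call accepts the page and the second rejects it, because "red" and "house" are both present but not next to each other.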

How can I make that happen?

12-30-2003, 09:33 PM
Parse the search query in two steps. First, find all occurrences of "quoted" phrases, store them temporarily, and delete them from the search query. Next, split the remaining search query as you have done before.

For the first task, preg_replace_callback() is quite handy. Here's some sample code that illustrates the required procedure:

$quoted = array();

// callback: store each quoted phrase (without its quotes) and remove it from the input
function findQuoted($matches) {
    global $quoted;
    $quoted[] = str_replace('"', '', $matches[1]);
    return '';
}

$input = 'new "red house" and "a combined string"';

// first, find quoted phrases, and delete them from input string
$input = preg_replace_callback('/"(.+?)"/', 'findQuoted', $input);

// next, split the remaining input string
$tokens = preg_split('/\s+/', $input, -1, PREG_SPLIT_NO_EMPTY);

// merge both subresults
$allTokens = array_merge($quoted, $tokens);

// debug output
print_r($allTokens);

That's just a quick hack and not very thoroughly tested, but it should help you get started. Feel free to ask for clarification if my line of thought or the code implementing it seems obscure. :)
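
One thing to watch when combining this with the original script: the character filter there replaces quote characters with spaces, so the quoted phrases have to be pulled out before that filter runs. Below is a rough sketch of the same two-step idea wrapped in a single function without the global. It assumes PHP 7 or later (closures and type declarations), the function name tokenizeQuery() is made up for illustration, and the character class is a simplified ASCII-only version of the one in the original script.

```php
<?php
// Hypothetical helper: split a raw search query into quoted phrases and single words.
function tokenizeQuery(string $query): array
{
    $query = strtolower($query);
    $phrases = [];

    // Step 1: extract "quoted phrases" before any character filtering,
    // because the filter in step 2 would otherwise remove the quotes.
    $rest = preg_replace_callback(
        '/"([^"]+)"/',
        function (array $m) use (&$phrases): string {
            // Collapse runs of whitespace inside the phrase to single spaces.
            $phrase = trim(preg_replace('/\s+/', ' ', $m[1]));
            if ($phrase !== '') {
                $phrases[] = $phrase;
            }
            return ' '; // a space, so neighbouring words do not get glued together
        },
        $query
    );

    // Step 2: replace leftover punctuation (including stray quotes) with spaces.
    // Assumption: ASCII-only class; extend it if accented letters are needed.
    $rest = preg_replace('/[^a-z0-9 +!-]/', ' ', $rest);

    // Step 3: split what remains on whitespace, dropping empty pieces.
    $words = preg_split('/\s+/', $rest, -1, PREG_SPLIT_NO_EMPTY);

    // Phrases first, then single words (same order as the sample above).
    return array_merge($phrases, $words);
}

print_r(tokenizeQuery('new "red house" and "a combined string"'));
```

For the sample input this yields the two phrases "red house" and "a combined string" followed by the words "new" and "and". Note that the phrases themselves are not run through the character filter in this sketch; whether they should be depends on how the matching code treats punctuation.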

12-31-2003, 01:27 PM
Very nice, and easier than I could've imagined...!
This forum has very buggy [ php ] [/ php ] bbcode, which strips all backslashes from the code! Yes, it's nice to see highlighted code, but when the code becomes useless, it's better to stick with the old-fashioned [ code ] [/ code ].
In your example, the line that was displayed as
$tokens = preg_split('/s+/', $input, -1, PREG_SPLIT_NO_EMPTY);
should actually be:
$tokens = preg_split('/\s+/', $input, -1, PREG_SPLIT_NO_EMPTY);
NOTE: I had to type '/\\\s+/' so that the backslash would actually show up in the reply.
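
For anyone wondering why the lost backslash matters, here is a small sketch (the variable names are only for illustration). Without the backslash the pattern no longer means "whitespace"; it means "one or more letters s", so the query is split in the wrong places.

```php
<?php
$input = 'new red house';

// Intended pattern: \s matches whitespace, so the string is split into words.
$withBackslash = preg_split('/\s+/', $input, -1, PREG_SPLIT_NO_EMPTY);

// Pattern with the backslash stripped: splits on the letter "s" instead.
$withoutBackslash = preg_split('/s+/', $input, -1, PREG_SPLIT_NO_EMPTY);

print_r($withBackslash);    // three elements: new, red, house
print_r($withoutBackslash); // two elements: "new red hou" and "e"
```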

12-31-2003, 04:49 PM
Thanks V@no for proofreading, I should have known better. In the past I've repeatedly dealt with those disappearing backslashes. With regexes, it's most often better to mark up the text with [ code ][/ code ], instead of [ php ][/ php ]. It's a bug in the forum software.