  • #1
    Mega-ultimate member
    Join Date
    Jun 2002
    Location
    Winona, MN - The land of 10,000 lakes
    Posts
    1,855
    Thanks
    1
    Thanked 45 Times in 42 Posts

    Regex help with search script

    I've got the following code in my search script...

    Code:
        $query = strtolower($query);
        $query = preg_replace("/[^a-zą-’0-9 +!-]/"," ",$query);
        $query_arr_dum = preg_split("/\s+/",$query);
    $query is the text entered by the user on the search form.
    $query_arr_dum is an array which is created by separating each word in the query string, using whitespace (\s) as the delimiter. This array is used later on for matching words in the pages of the site.
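
    For illustration, here is a minimal sketch of what those three lines do today with a quoted phrase. The function name tokenizeWithoutPhrases() is made up for this example, and the character class is a simplified ASCII-only stand-in for the one in the script, so treat it as an assumption rather than the real thing.

    ```php
    <?php
    // Hypothetical wrapper around the three lines from the post.
    // Assumption: an ASCII-only character class replaces the original one.
    function tokenizeWithoutPhrases($query)
    {
        $query = strtolower($query);
        // Every character outside the allowed set, including ", becomes a space.
        $query = preg_replace('/[^a-z0-9 +!-]/', ' ', $query);
        // Split on runs of whitespace, exactly as the script does.
        return preg_split('/\s+/', $query);
    }

    // The quotes are replaced by spaces before the split happens,
    // so "red" and "house" come back as two separate words.
    var_dump(tokenizeWithoutPhrases('new "red house"'));
    ```

    Because the double quote is not in the allowed character set, the phrase information is already gone by the time preg_split() runs. Any phrase handling therefore has to happen before the preg_replace() step.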

    What I would like is for a phrase in quotes to be treated as one word.

    Right now, if you enter
    new red house // without quotes

    you will get results from pages that contain the word "new" and the word "red" and the word "house", but not necessarily right next to each other.

    What I want is that if you enter
    new "red house"
    you will get pages that match the word "new" and the phrase "red house".

    How can I make that happen?

  • #2
    Senior Coder
    Join Date
    Jun 2002
    Location
    frankfurt, german banana republic
    Posts
    1,848
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Parse the search query in two steps. First, find all occurrences of "quoted" phrases, store them temporarily, and delete them from the search query. Next, split the rest of the search query as you have done before.

    For the first task, preg_replace_callback() is quite handy. Here's some sample code that illustrates the required procedure:

    PHP Code:
    $quoted = array();

    function findQuoted($matches) {
        global $quoted;
        $quoted[] = str_replace('"', '', $matches[1]);
        return '';
    }

    $input = 'new "red house" and "a combined string"';

    // first, find quoted phrases, and delete them from input string
    $input = preg_replace_callback('/"(.+?)"/', 'findQuoted', $input);

    // next, split the remaining input string
    $tokens = preg_split('/\s+/', $input, -1, PREG_SPLIT_NO_EMPTY);

    // merge both subresults
    $allTokens = array_merge($quoted, $tokens);

    // debug output
    var_dump($allTokens);
    That's just a quick hack and not very thoroughly tested, but it should help you get started. Feel free to ask for clarifications if my line of thought and the implementing code seem obscure. :)
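
    If the global variable is a concern, the same idea can also be sketched as a single pass with preg_match_all(). This is a different technique from the two-step approach described above, not a drop-in copy of it. The function name tokenizeQuery() is hypothetical, and lowercasing the input and trimming stray quotes are assumptions about what the search script wants.

    ```php
    <?php
    // Hypothetical helper, not part of the original search script.
    function tokenizeQuery($query)
    {
        $tokens = array();
        // Branch 1 captures the text between a pair of double quotes,
        // branch 2 captures a run of non-whitespace characters.
        preg_match_all('/"([^"]+)"|(\S+)/', strtolower($query), $matches, PREG_SET_ORDER);

        foreach ($matches as $m) {
            // With PREG_SET_ORDER, index 2 exists only when the bare-word branch matched.
            $token = isset($m[2]) ? $m[2] : $m[1];
            // Drop stray quotes left over from an unbalanced phrase.
            $token = trim($token, '"');
            if ($token !== '') {
                $tokens[] = $token;
            }
        }
        return $tokens;
    }

    var_dump(tokenizeQuery('new "red house" and "a combined string"'));
    ```

    Unlike the array_merge() version, this keeps the tokens in the order the user typed them. An unbalanced quote simply falls back to plain word splitting.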
    De gustibus non est disputandum.

  • #3
    Regular Coder
    Join Date
    Feb 2003
    Posts
    101
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Very nice, and easier than I could've imagined...!
    This forum's [ php ] [/ php ] bbcode is very buggy: it strips all backslashes from the code! Yes, it's nice to see highlighted code, but when that makes the code useless, it's better to stick with the old-fashioned [ code ] [/ code ].
    In your example, the line
    PHP Code:
    $tokens = preg_split('/s+/', $input, -1, PREG_SPLIT_NO_EMPTY);
    should actually be:
    PHP Code:
    $tokens = preg_split('/\s+/', $input, -1, PREG_SPLIT_NO_EMPTY);
    NOTE: I had to type '/\\\s+/' so that the backslash would actually show up in the reply.
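
    To show why the lost backslash matters, here is a small sketch comparing the two patterns. The sample string and variable names are made up for the example.

    ```php
    <?php
    $input = 'glass house';

    // With the backslash, \s+ matches runs of whitespace.
    $withBackslash = preg_split('/\s+/', $input, -1, PREG_SPLIT_NO_EMPTY);

    // Without it, s+ matches runs of the letter "s" instead.
    $withoutBackslash = preg_split('/s+/', $input, -1, PREG_SPLIT_NO_EMPTY);

    var_dump($withBackslash);    // two words: "glass" and "house"
    var_dump($withoutBackslash); // three fragments: "gla", " hou" and "e"
    ```

    Neither pattern raises an error, so a stripped backslash is easy to miss: the script still runs, it just splits the query in the wrong places.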

  • #4
    Senior Coder
    Join Date
    Jun 2002
    Location
    frankfurt, german banana republic
    Posts
    1,848
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Thanks V@no for proofreading, I should have known better. In the past I've repeatedly dealt with those disappearing backslashes. With regexes, it's most often better to mark up the text with [ code ][/ code ], instead of [ php ][/ php ]. It's a bug in the forum software.


