Search engine evolution: avoid partial match


Alex Piotto
08-21-2002, 08:09 PM
Hi my friends!
The code below is a reworked version of the code that murdred gave me... I modified it a bit, and it works, I mean it does the search... but it matches partial words.
I need it to match only whole words.

I tried putting \b ... \b around the $searchPattern in the preg_match function, but it doesn't work...

Here is the code:
________________________

<?
Header('Cache-Control: no-cache');
Header('Pragma: no-cache');
?>
<html><head><title>search the csv file</title></head><body>
<form method="post" action="<?php echo $HTTP_SERVER_VARS['PHP_SELF'];?>">
<input type="hidden" name="submitted" value="done">
Keywords: <input type="text" name="keyword">
<input type="submit" value="Search" name="submit"></form>
<?
$searchPattern = ''; // contains the searchPattern

function search_csv($file, $searchString){

if (empty($searchString)) {return false;} // check if the search string contains nothing

$keywords = preg_split('/[\s]+/', trim($searchString)); // split the search string on whitespace

$keywords = array_values(array_unique($keywords)); // remove duplicate entries

for ($k = 0; $k < count($keywords); $k++) {
$keywords[$k] = preg_quote($keywords[$k], '/'); // escape preg specific characters
}

$searchPattern = '/(' . implode('|', $keywords) . ')/i'; // build the search case-insensitive pattern

$fp = fopen($file, "r"); /* go through the file */

while ( ($str = fread($fp, 1)) != "\n" ) {} // advance the file pointer to the next line

while ($fields = fgetcsv ($fp, 4096, ";")) { // iterate through each line and

// search all the fields
if (preg_match ($searchPattern, $fields[1] . ' ' . $fields[2]. ' ' . $fields[3]. ' ' . $fields[4]. ' ' . $fields[5])) {
$results[] = $fields;
}}
fclose ($fp);
return $results;
}

// call the function if form has been submitted
if (isset($HTTP_POST_VARS['submitted']) && $HTTP_POST_VARS['submitted'] == 'done') {
$results = search_csv("data2.txt", $HTTP_POST_VARS['keyword']);

echo "<div style='font-family:verdana; font-size:10px; color:#000000;'>\n";

if (!$results || count($results) == 0) { echo "<b>No results found.</b>\n";}
else { echo "<b>Found results:</b><br /><br />\n";

for ($i = 0; $i < count($results); $i++) {
$one = $results[$i][1];
$two = $results[$i][2];
$three = $results[$i][3];
$four = $results[$i][4];
$five = $results[$i][5];

echo "$one - $two - $three - $four - $five<br><br>";
}}}
?>
</div>
</body>
</html>

_________________________

Any suggestion?
Alex

Alex Piotto
08-22-2002, 06:41 AM
nobody?

.... murdred, are you there?

please...

Alex

mordred
08-22-2002, 03:26 PM
Hey, my nickname is mordred, not murdred... that looks like "murdered". ;)

Anyway, you were very close and on the right track. The following code is my original search_csv function with one line changed, to surround each keyword with the word boundary special chars.


// global variables
$searchPattern = ''; // contains the searchPattern

function search_csv($file, $searchString)
{
// check if the search string contains nothing
if (empty($searchString)) {
return false;
}

global $searchPattern;
$results = array();

/* Construct the search pattern for the preg_match */

// prepare search pattern
$keywords = preg_split('/[\s]+/', trim($searchString));

// remove duplicate entries
$keywords = array_values(array_unique($keywords));

// escape preg specific characters
// NOTE: This line has been changed to find only complete
// keywords and not partial matches
for ($k = 0; $k < count($keywords); $k++) {
$keywords[$k] = '\\b' . preg_quote($keywords[$k], '/') . '\\b';
}

// build the search pattern. Observe that the "i" modifier
// stands for a case-insensitive pattern.
$searchPattern = '/(' . implode('|', $keywords) . ')/i';


/* go through the .csv file */

$fp = fopen($file, "r");

// advance the file pointer past the header line
// (fgets stops at the end of the file, so a file without a newline cannot hang here)
fgets($fp, 1000);

// iterate through each line and
while ($fields = fgetcsv ($fp, 1000, ";")) {

// search all desired fields
if (preg_match($searchPattern, $fields[1] . ' ' . $fields[2])) {
$results[] = $fields;
}
}

fclose ($fp);

return $results;
}

// call the function if form has been submitted
if (isset($_POST['submitted']) && $_POST['submitted'] == 'done') {
$results = search_csv("test.csv", $_POST['keyword']);

/* write the results */

echo "<div style='font-family:verdana; font-size:9pt; color:#000000;'>\n";

if (!$results || count($results) == 0) {
echo "<b>No results found.</b>\n";
} else {
echo "<b>Found results:</b><br />\n";

// iterate through the results array
for ($i = 0; $i < count($results); $i++) {
echo preg_replace($searchPattern, "<span style='color:red;'>\\1</span>", $results[$i][1]) . ": ";
echo preg_replace($searchPattern, "<span style='color:red;'>\\1</span>", $results[$i][2]) . "<br />\n";
}
}
echo "</div>";
}


When I use the code above and type the keywords "sun conan" and test.csv contains


id;author;title;usdprice;
0;Agatha Cristie;Evil under the sun;50;
1;Arthur Conan Doyle;A study in Scarlet;75;
2;Raymond Chandler;Farewell my Lovely;28;
3;Stefano Benny;Hearth;35;
4;Stefano Benny;Black hole sun;123;


then the printed result is

Agatha Cristie: Evil under the sun
Arthur Conan Doyle: A study in Scarlet
Stefano Benny: Black hole sun

I think that's pretty close to your expectations. I commented on the line that needs amendment. Note that you have to put your own concatenated string into the preg_match call, as you did in your code example. I just left that "as is" so that my example works properly.
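
To see what the word boundary changes, here is a minimal sketch. The helper name build_pattern and the test strings are made up for illustration and are not part of the script above; it only repeats the pattern-building steps with the \b wrapping switched on or off.

```php
<?php
// Hypothetical helper (not in the original script): builds the
// case-insensitive search pattern from a keyword string, optionally
// wrapping each keyword in \b word boundaries.
function build_pattern($searchString, $wholeWords)
{
    // split on whitespace and drop duplicate keywords
    $keywords = preg_split('/\s+/', trim($searchString));
    $keywords = array_values(array_unique($keywords));

    foreach ($keywords as $k => $word) {
        // escape characters that have a special meaning in a regex
        $quoted = preg_quote($word, '/');
        $keywords[$k] = $wholeWords ? '\\b' . $quoted . '\\b' : $quoted;
    }

    return '/(' . implode('|', $keywords) . ')/i';
}

$partial = build_pattern('sun conan', false);
$whole   = build_pattern('sun conan', true);

// "Sunday" contains "sun", but not as a whole word.
var_dump(preg_match($partial, 'Sunday morning'));     // int(1)
var_dump(preg_match($whole, 'Sunday morning'));       // int(0)
var_dump(preg_match($whole, 'Evil under the sun'));   // int(1)
var_dump(preg_match($whole, 'Arthur Conan Doyle'));   // int(1)
```

The single-quoted '\\b' ends up as the two characters \b in the pattern, which is what the regex engine needs to see.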

hth

P.S: Use the PHP tag of the vB Code, that helps reading your code.

Alex Piotto
08-22-2002, 08:09 PM
Hi mordred
...eh eh, sorry about your nickname... it was a typo!

I can't believe it, but it's not working! I copied the last code you sent me, and I used the same db you attached... I changed $_POST to $HTTP_POST_VARS (because I am using PHP 4.0.4) and... I never get the red names, just red squares! If I search for Stefano, I get the whole db with a lot of red squares...

Let me show you (this way I'll try out the vB option...)


<?
Header('Cache-Control: no-cache');
Header('Pragma: no-cache');
?>
<html><head><title>search the csv file</title></head><body>
<form method="post" action="<?php echo $HTTP_SERVER_VARS['PHP_SELF'];?>">
<input type="hidden" name="submitted" value="done">
Keywords: <input type="text" name="keyword">
<input type="submit" value="Search" name="submit"></form>
<?
// global variables
$searchPattern = ''; // contains the searchPattern

function search_csv($file, $searchString)
{
// check if the search string contains nothing
if (empty($searchString)) {
return false;
}

global $searchPattern;
$results = array();

/* Construct the search pattern for the preg_match */

// prepare search pattern
$keywords = preg_split('/[s]+/', trim($searchString));

// remove duplicate entries
$keywords = array_values(array_unique($keywords));

// escape preg specific characters
// NOTE: This line has been changed to find only complete
// keywords and not partial matches
for ($k = 0; $k < count($keywords); $k++) {
$keywords[$k] = '\b' . preg_quote($keywords[$k], '/') . '\b';
}

// build the search pattern. Observe that the "i" modifier
// stands for a case-insensitive pattern.
$searchPattern = '/(' . implode('|', $keywords) . ')/i';


/* go through the .csv file */

$fp = fopen($file, "r");

// advance the file pointer to the next line
while ( ($str = fread($fp, 1)) != "\n" ) {
// empty statement
}

// iterate through each line and
while ($fields = fgetcsv ($fp, 1000, ";")) {

// search all desired fields
if (preg_match($searchPattern, $fields[1] . ' ' . $fields[2])) {
$results[] = $fields;
}
}

fclose ($fp);

return $results;
}

// call the function if form has been submitted
if (isset($HTTP_POST_VARS['submitted']) && $HTTP_POST_VARS['submitted'] == 'done') {
$results = search_csv("databook.csv", $HTTP_POST_VARS['keyword']);

/* write the results */

echo "<div style='font-family:verdana; font-size:9pt; color:#000000;'>\n";

if (!$results || count($results) == 0) {
echo "<b>No results found.</b>\n";
} else {
echo "<b>Found results:</b><br />\n";

// iterate through the results array
for ($i = 0; $i < count($results); $i++) {
echo preg_replace($searchPattern, "<span style='color:red;'>\1</span>", $results[$i][1]) . ": ";
echo preg_replace($searchPattern, "<span style='color:red;'>\1</span>", $results[$i][2]) . "<br />\n";
}
}
echo "</div>";
}
?>
</body>
</html>


I don't understand...

If I use this line $keywords[$k] = '\b' . preg_quote($keywords[$k], '/') . '\b';
in the code I posted in the first message of this thread... it just doesn't work...
Alex :confused:

mordred
08-23-2002, 09:22 AM
*slams head against monitor*

arrggghghg, this forum drives me mad with these backslashes not appearing. And you can call me a fool for advising you to use this vB Code.

I think the confusion comes from two lines that missed some backslashes (not your fault, it's this forum).

a):

$keywords = preg_split('/[s]+/', trim($searchString));


should rather be
$keywords = preg_split('/[\s]+/', trim($searchString));


b):

echo preg_replace($searchPattern, "<span style='color:red;'>\1</span>", $results[$i][1]) . ": ";
echo preg_replace($searchPattern, "<span style='color:red;'>\1</span>", $results[$i][2]) . "<br />\n";


should read

echo preg_replace($searchPattern, "<span style='color:red;'>\\1</span>", $results[$i][1]) . ": ";
echo preg_replace($searchPattern, "<span style='color:red;'>\\1</span>", $results[$i][2]) . "<br />\n";

See the differences? Those missing backslashes messed up the regular expression syntax and produced those weird results. I was very surprised to see those red squares populating my screen; at first I thought that the regexp library was broken.
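
A small sketch of what the missing backslashes do, assuming a reasonably recent PHP. The variable names and sample strings are made up for illustration, and a plain <b> tag stands in for the red span.

```php
<?php
// In a double-quoted PHP string, "\1" is an octal escape: it yields the
// single control character chr(1), which a browser will typically show as
// a small square. "\\1" yields a backslash followed by "1", which
// preg_replace reads as a back-reference to the first captured group.
$broken = "<b>\1</b>";
$fixed  = "<b>\\1</b>";

var_dump($broken === '<b>' . chr(1) . '</b>');   // bool(true)

$pattern = '/(\\bsun\\b)/i';
echo preg_replace($pattern, $fixed, 'Black hole sun'), "\n";   // Black hole <b>sun</b>

// Without the backslash, [s]+ splits on the letter "s" instead of whitespace.
$wrongSplit = preg_split('/[s]+/', 'sun conan');    // two elements: "" and "un conan"
$rightSplit = preg_split('/[\s]+/', 'sun conan');   // two elements: "sun" and "conan"
print_r($wrongSplit);
print_r($rightSplit);
```

Writing the replacement in single quotes, as in '<b>\1</b>', would avoid the octal escape too, since single-quoted strings leave \1 untouched.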

Try to change the lines mentioned above and if you still don't get useful results, post again.

Alex Piotto
08-23-2002, 09:45 AM
You are a real code-detective, man!
We should communicate the problem of backslash evaporation to the administrator...
I'll put myself to work right now and I'll let you know. Thanks a lot for your patience.
Alex :)

mordred
08-23-2002, 09:53 AM
Originally posted by Alex Piotto
We should communicate the problem of backslash evaporation to the administrator...


I've done that already. Seems that not only backslashes are affected.
http://www.codingforums.com/showthread.php?s=&threadid=4633

Alex Piotto
08-23-2002, 05:00 PM
Hi!
Sorry I am late... today was a really busy day.

Anyway, after the correction the script works well... congratulations, mordred!

And thank you very, very much for the help.

ciao

Alex :)