...

View Full Version : Advice Needed: Best Way To Compare Strings



Bobafart
08-02-2007, 12:19 AM
I am trying to compare strings.

I have a news item with a Headline and a ~20 word description of the news item.

I pull key words out of the headline and the description and put them in an array.

I then compare the list of key words to a large database of other news items.

I then post the related news items under the initial news article as "Related News Item" links.

---

What I want to do is add an AJAX slider that adjusts the relevance threshold, to raise or lower the signal-to-noise ratio.

If you slide the bar to the right, only a few but more accurate related news items are posted. If you slide the bar all the way to the left, many more related news items are posted, but they aren't as accurate. I already have the slider made; I just need a way to code the server-side PHP.

So I need a way to compare strings with accuracy in mind.


The only way I can think of doing this is:



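similar_text(implode(' ', $headlineKeywords), $relatedHeadline, $percent); // $percent (0-100) is filled in by reference
$isRelated = ($percent >= $percentageThreshold); // keep the item only if it meets the threshold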


where $percentageThreshold changes based on the slider value.
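
A minimal sketch of how that filtering could look, assuming the slider sends a value from 0 to 100 and each candidate headline is a plain string. Note that `similar_text()` does not take a threshold: it writes the similarity percentage into its third argument by reference, so the comparison against the slider value happens afterwards. The function name `findRelatedHeadlines` and the sample headlines are hypothetical.

```php
<?php
// Hypothetical helper: keep only the headlines whose similarity to the
// keyword string meets the threshold chosen with the slider (0-100).
function findRelatedHeadlines(array $keywords, array $headlines, float $threshold): array
{
    $needle = strtolower(implode(' ', $keywords));
    $related = [];
    foreach ($headlines as $headline) {
        // similar_text() fills $percent (0-100) by reference.
        similar_text($needle, strtolower($headline), $percent);
        if ($percent >= $threshold) {
            $related[$headline] = $percent;
        }
    }
    arsort($related); // most similar first
    return $related;
}

// Made-up sample data; in practice the headlines would come from the database.
$headlines = ['Iraq election results announced', 'Local bakery wins award'];
$related = findRelatedHeadlines(['iraq', 'election'], $headlines, 40.0);
```

Raising the threshold returns fewer, closer matches; lowering it returns more, looser ones.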

Does anyone else know of another way to do this?

mwookie
08-02-2007, 11:57 PM
A very interesting concept. I think you have the right idea. I have done things in the past using a threshold concept, where I tried to find "what percentage of the words are the same."
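
A rough sketch of that "percentage of shared words" measure, assuming words are split on anything that is not a letter or digit and compared case-insensitively. The function name `wordOverlapPercent` is made up for illustration.

```php
<?php
// Hypothetical helper: percentage of the distinct words in $a that also occur in $b.
function wordOverlapPercent(string $a, string $b): float
{
    // Split on anything that is not a letter or digit and drop empty pieces.
    $wordsA = array_unique(preg_split('/[^a-z0-9]+/', strtolower($a), -1, PREG_SPLIT_NO_EMPTY));
    $wordsB = array_unique(preg_split('/[^a-z0-9]+/', strtolower($b), -1, PREG_SPLIT_NO_EMPTY));
    if (count($wordsA) === 0) {
        return 0.0;
    }
    $shared = array_intersect($wordsA, $wordsB);
    return 100.0 * count($shared) / count($wordsA);
}

$percent = wordOverlapPercent('Iraq election results', 'Election results delayed in Iraq');
```

The result can be compared against the slider threshold in the same way as the percentage from `similar_text()`.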

In addition, you should look at weighting words so that a word like "Iraq" carries more weight than news/story/the/etc.

Finally, I would stem the words (http://snowball.tartarus.org/) so that you can pick up related stories that spell things a little differently.
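
To show the idea only, here is a deliberately crude suffix stripper. It is not the Snowball algorithm and will mishandle plenty of words; a real stemmer from the Snowball project would replace it. The function name `crudeStem` is made up.

```php
<?php
// Crude stand-in for a real stemmer: strips a few common English suffixes.
// This is NOT the Snowball/Porter algorithm and will mishandle many words.
function crudeStem(string $word): string
{
    $word = strtolower($word);
    foreach (['ing', 'ed', 'es', 's'] as $suffix) {
        $stemLength = strlen($word) - strlen($suffix);
        // Only strip when at least three characters would remain.
        if ($stemLength >= 3 && substr($word, -strlen($suffix)) === $suffix) {
            return substr($word, 0, $stemLength);
        }
    }
    return $word;
}

// "voting", "voted" and "votes" all reduce to the same stem.
$stems = array_map('crudeStem', ['voting', 'voted', 'votes']);
```

Stemming both the keywords and the candidate headlines before comparing them lets "election" and "elections" count as the same word.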

I am also interested in others' responses, as this relates very well to my image search engine (http://www.imagetrail.net).


_________________________
"Insanity is hereditary - you get it from your children." Sam Levenson
Web Development Company (http://www.emblemsoftware.com) Projects (Stock Photo Search Engine (http://www.imagetrail.net) Learn how to sell your photos (http://www.microstockforum.com/forums))

Bobafart
08-03-2007, 08:47 PM
[QUOTE=mwookie;595925]look at weighting words so that Iraq carries more weight than news/story/the/etc.[/QUOTE]

How does one "weight" words? Is there a PHP library that does this?

mwookie
08-03-2007, 08:52 PM
[QUOTE=Bobafart][QUOTE=mwookie;595925]look at weighting words so that Iraq carries more weight than news/story/the/etc.[/QUOTE]

How does one "weight" words ... is there a PHP library that does this ?[/QUOTE]


Not that I know of. I weight words in two ways:

1) By popularity (words that show up in most headlines don't mean anything).
2) With a table that assigns weights. This is much harder because you have to look at all the possible words, but I believe it's more accurate. If a word is not in the list, just leave it at a default weight.
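
Both approaches can be sketched together. This is an illustration under assumptions, not a tested recipe: popularity is turned into a weight with a logarithm of (total headlines / headlines containing the word), which is one common choice and not necessarily the formula I used, and the function names and sample data are made up.

```php
<?php
// Hypothetical sketch: weight each word by how rare it is across all headlines,
// then let a hand-maintained table override the computed weight.
function buildWordWeights(array $headlines, array $manualWeights = []): array
{
    $documentCounts = [];
    foreach ($headlines as $headline) {
        $words = array_unique(preg_split('/[^a-z0-9]+/', strtolower($headline), -1, PREG_SPLIT_NO_EMPTY));
        foreach ($words as $word) {
            $documentCounts[$word] = ($documentCounts[$word] ?? 0) + 1;
        }
    }
    $total = count($headlines);
    $weights = [];
    foreach ($documentCounts as $word => $count) {
        // A word found in every headline gets log(1) = 0; rarer words score higher.
        $weights[$word] = log($total / $count);
    }
    // Manual entries take precedence over the computed ones.
    return array_replace($weights, $manualWeights);
}

// Weighted version of "what percentage of the words are the same".
function weightedOverlapPercent(array $wordsA, array $wordsB, array $weights, float $default = 1.0): float
{
    $sharedWeight = 0.0;
    $totalWeight = 0.0;
    foreach (array_unique($wordsA) as $word) {
        $weight = $weights[$word] ?? $default; // unknown words get the default weight
        $totalWeight += $weight;
        if (in_array($word, $wordsB, true)) {
            $sharedWeight += $weight;
        }
    }
    return $totalWeight > 0 ? 100.0 * $sharedWeight / $totalWeight : 0.0;
}

// Made-up sample data: "news" appears everywhere, so its computed weight is 0.
$weights = buildWordWeights(['Iraq news today', 'Sports news today', 'Weather news today']);
$score = weightedOverlapPercent(['iraq', 'news'], ['news', 'sports'], ['iraq' => 3.0, 'news' => 1.0]);
```

With weights like these, sharing a rare word such as "iraq" moves the score far more than sharing "news".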

Hope this helps




