How to count frequencies of a word appearing in a paragraph?

08-07-2007, 03:00 AM
Say I have a block of text and I want to count "important" words in the block of text -- what's the best way to do this?

By "important" I mean operative words (excluding "the", "a", "because", "for", "this", "that").

I want to count each "important" word and then record the number of times each word appears in the block of text.

Has anyone done this before?

ralph l mayo
08-07-2007, 03:24 AM
The exceptions you list are part of a group commonly called "stop words"; you can get a list here (http://www.dcs.gla.ac.uk/idom/ir_resources/linguistic_utils/stop_words). If you're doing this purely in PHP it's going to be slow regardless of how you implement it, but you might have luck splitting the target text into an array of words, loading the stop words as another array, and using array_diff to remove the stop words from the word array. From there you're an array_count_values away from the answer.
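
A rough sketch of that approach, in case it helps. The function name count_important_words and the six-word stop list are made up for illustration (in practice you'd load the full list from a file), and it assumes PHP 7 or later and plain ASCII text. It uses str_word_count to do the splitting and lowercases everything first, since array_diff compares strings case-sensitively.

```php
<?php
// Hypothetical helper: the name and the short stop-word list are illustrative only.
function count_important_words(string $text, array $stopWords): array
{
    // Lowercase first so "The" and "the" are treated as the same word.
    // str_word_count(..., 1) returns the words as an array and drops punctuation.
    $words = str_word_count(strtolower($text), 1);

    // Remove every word that appears in the stop-word list.
    $important = array_diff($words, array_map('strtolower', $stopWords));

    // Tally the remaining words: word => number of occurrences.
    $counts = array_count_values($important);

    // Sort by count, highest first, keeping the word keys.
    arsort($counts);

    return $counts;
}

// Assumed sample data, not from the original post.
$stopWords = ['the', 'a', 'because', 'for', 'this', 'that'];
$text = 'The cat chased the mouse because the mouse stole the cheese for a cat.';

print_r(count_important_words($text, $stopWords));
```

With the sample sentence, "cat" and "mouse" each come out with a count of 2 and the stop words disappear entirely. For text with accented or multibyte characters, str_word_count and strtolower may not be enough, so you'd probably want preg_split and mb_strtolower instead.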