
How do I find a 3-word-group in a text string?



jeddi
11-04-2009, 09:34 AM
I am writing a little script that will improve authors' writing skills by
finding repeated phrases in the text.

The text of a chapter will average about 10,000 words; however, I could
reduce the size of the files if it is better to do so.

So the idea is to search through a string and find repeats of any 3- or 4-word group.

So if the author has repeated the phrase "then I went" 6 times in the text,
then this would be found and highlighted.

I am not sure where to start with this :o

Maybe it is best to start by converting the string into an array of all the words?


$word_list = explode(" ", $text);

But I still don't know what the best way to find these repeated 3- or 4-word phrases is.

The other thing I want to provide is a list of all the words used (maybe I will exclude
words like "and", "the", "a", etc.) and the number of times they are used.

Any good ideas on how I should proceed?

Thanks

Phil Jackson
11-04-2009, 11:06 AM
if (preg_match_all("#\s[a-z]{3,4}\s#is", $fileContents, $matches))
{
    foreach ($matches[0] as $word)
    {
        echo $word."<br />";
    }
}


I think

kbluhm
11-04-2009, 11:11 AM
No, that will find all alphabetic words between 3 & 4 characters in length and surrounded by white space.
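
What the question asks for is counting groups of words, not matching word lengths. One possible approach, as a rough sketch only: split the text into words, slide a window of N words across it, and tally each phrase. This assumes PHP 7 or later and plain ASCII English text; the function name find_repeated_phrases and its parameters are made up for illustration.

```php
<?php
// Hypothetical helper: count every run of $n consecutive words and
// return only the phrases that occur at least $minCount times.
function find_repeated_phrases(string $text, int $n = 3, int $minCount = 2): array
{
    // Lowercase and extract words so case and punctuation don't hide repeats.
    // Assumption: ASCII letters and apostrophes only.
    preg_match_all("/[a-z']+/", strtolower($text), $m);
    $words = $m[0];

    $counts = [];
    $last = count($words) - $n;
    for ($i = 0; $i <= $last; $i++) {
        // Join the $n words starting at position $i into one phrase key.
        $phrase = implode(' ', array_slice($words, $i, $n));
        $counts[$phrase] = ($counts[$phrase] ?? 0) + 1;
    }

    // Keep only phrases that repeat, most frequent first.
    $repeated = array_filter($counts, function ($c) use ($minCount) {
        return $c >= $minCount;
    });
    arsort($repeated);
    return $repeated;
}

$text = "Then I went home. Then I went out. After that, then I went to bed.";
print_r(find_repeated_phrases($text, 3));
```

For the sample text this reports "then i went" three times. Call it again with $n = 4 for 4-word groups. Note that the window ignores sentence boundaries, so a phrase can span the end of one sentence and the start of the next; a 10,000-word chapter is small enough that a single pass like this should not be a problem.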

Phil Jackson
11-04-2009, 11:20 AM
No, that will find all alphabetic words between 3 & 4 characters in length and surrounded by white space.

My fault, I didn't read the question correctly... I will work on it some more.
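
The second part of the original question (a list of all words used and how often, minus common words) is simpler. A minimal sketch, again assuming PHP 7 or later and ASCII English text; word_frequencies and its default stop-word list are hypothetical and only mirror the examples given in the question:

```php
<?php
// Hypothetical helper: count how often each word occurs, skipping stop words.
function word_frequencies(string $text, array $stopWords = ['and', 'the', 'a']): array
{
    // Lowercase and extract words (assumption: ASCII letters and apostrophes only).
    preg_match_all("/[a-z']+/", strtolower($text), $m);

    $words  = array_diff($m[0], $stopWords);  // drop the excluded words
    $counts = array_count_values($words);     // word => number of occurrences
    arsort($counts);                          // most frequent first
    return $counts;
}

print_r(word_frequencies("The cat and the dog saw a cat."));
```

array_count_values() does the tallying, so no manual loop is needed; extend the $stopWords array with whatever words should be left out.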