...

View Full Version : How do I find a 3-word-group in a text string?



jeddi
11-04-2009, 09:34 AM
I am writing a little script that will improve authors' writing skills by
finding repeated phrases in the text.

The text of a chapter will average about 10,000 words; however, I could
reduce the size of the files if it is better to do so.

So the idea is to search through a string and find repeats of any 3- or 4-word group.

So if the author has repeated the phrase "then I went" 6 times in the text,
then this would be found and highlighted.

I am not sure where to start with this :o

Maybe it is best to start by converting the string into an array of all the words?


$word_list = explode(" ", $text);

But I still don't know the best way to find these repeated 3- or 4-word phrases.
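
One possible way to go from the word array to repeated phrases is to slide a window of 3 (or 4) words over it and count each window. The following is only a rough sketch, assuming PHP 7.4 or later; the function name find_repeated_phrases and its parameters are made up for illustration, and it splits on letters and apostrophes instead of explode(" ", ...) so that punctuation and capitals do not hide a repeat.

```php
<?php
// Hypothetical helper: count every run of $n consecutive words and
// return only the phrases that occur at least $minCount times.
function find_repeated_phrases(string $text, int $n = 3, int $minCount = 2): array
{
    // Lowercase the text and treat runs of letters/apostrophes as words,
    // so "Then I went," and "then I went" count as the same phrase.
    preg_match_all("/[a-z']+/", strtolower($text), $m);
    $words = $m[0];

    $counts = [];
    for ($i = 0, $last = count($words) - $n; $i <= $last; $i++) {
        // Join the $n words starting at position $i into one phrase key.
        $phrase = implode(' ', array_slice($words, $i, $n));
        $counts[$phrase] = ($counts[$phrase] ?? 0) + 1;
    }

    // Keep repeated phrases only, most frequent first.
    $repeated = array_filter($counts, fn($c) => $c >= $minCount);
    arsort($repeated);
    return $repeated;
}

$text = "Then I went home. Later, then I went out, and then I went to bed.";
print_r(find_repeated_phrases($text, 3)); // "then i went" appears 3 times
```

Calling it again with 4 as the second argument would cover the 4-word case. At around 10,000 words per chapter this is a few tens of thousands of array operations, so reducing the file size should not be needed for this approach.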

The other thing I want to provide is a list of all the words used (maybe I will exclude
words like "and", "the", "a", etc.) and the number of times they are used.
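
For the word list, PHP's built-in array_count_values() does most of the work. Here is a minimal sketch, assuming the same letters-and-apostrophes idea of what a word is; the function name word_frequencies and the default stop-word list are placeholders, not a recommended list.

```php
<?php
// Hypothetical helper: count how often each word occurs, skipping stop words.
function word_frequencies(string $text, array $stopWords = ['and', 'the', 'a']): array
{
    // Lowercase first so "The" and "the" are counted together.
    preg_match_all("/[a-z']+/", strtolower($text), $m);

    // Drop the stop words, then count what is left.
    $words = array_diff($m[0], $stopWords);
    $counts = array_count_values($words);

    // Most frequently used words first.
    arsort($counts);
    return $counts;
}

print_r(word_frequencies("The cat and the dog saw a cat"));
```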

Any good ideas on how I should proceed?

Thanks

Phil Jackson
11-04-2009, 11:06 AM
if (preg_match_all("#\s[a-z]{3,4}\s#is", $fileContents, $matches))
{
    foreach ($matches[0] as $word)
    {
        echo $word."<br />";
    }
}


I think

kbluhm
11-04-2009, 11:11 AM
No, that will find all alphabetic words between 3 & 4 characters in length and surrounded by white space.
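
A quick way to see this is to run the same pattern on a short sentence. The sample string below is made up for illustration. The result contains single short words, not phrases, and it also skips "went" and "and", because each match consumes the whitespace that the following word would need in front of it.

```php
<?php
// Same pattern as in the post above, run on a short sample string.
$sample = "I then went home and ate";
preg_match_all("#\s[a-z]{3,4}\s#is", $sample, $matches);

// Prints the two matches " then " and " home " (spaces included).
print_r($matches[0]);
```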

Phil Jackson
11-04-2009, 11:20 AM
kbluhm wrote: No, that will find all alphabetic words between 3 & 4 characters in length and surrounded by white space.

My fault, didn't read the question correctly... will work on it some more


