When publishing a new article to our website, we want to ensure it is unique, at least unique to our own website.
Of course, there are always similar articles around, so there are software tools that calculate the uniqueness of an article.
But how do these tools usually calculate it, and in particular, how does Google calculate it?
Here is one approach that I came up with based on what I have read.
1) From the new article, take the first four-word group (the 1st, 2nd, 3rd and 4th words) and check whether it occurs in the comparison article.
If it is found in the comparison article, record it as one hit.
2) Then take the next four-word group (the 2nd, 3rd, 4th and 5th words) and check whether it occurs in the comparison article.
If it is found, record it as another hit.
3) Continue in this way, shifting one word at a time, until you reach the end of the new article.
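The steps above can be sketched in Python roughly as follows. This is only an illustration of the approach described in this post, not how Google or any particular uniqueness checker works. Assumptions: words are lowercased and split on anything that is not a letter, digit or apostrophe; a group counts as a hit if it appears anywhere in the comparison article; and repeated groups in the new article are each counted. The function names `word_groups` and `count_hits` are hypothetical.

```python
import re


def word_groups(text: str, k: int = 4) -> list[tuple[str, ...]]:
    """Return every run of k consecutive words (a sliding window), in order."""
    # Assumption: lowercase and keep only letters, digits and apostrophes,
    # so "Google," and "google" count as the same word.
    words = re.findall(r"[a-z0-9']+", text.lower())
    # A text with fewer than k words yields an empty list.
    return [tuple(words[i:i + k]) for i in range(len(words) - k + 1)]


def count_hits(new_article: str, comparison_article: str, k: int = 4) -> tuple[int, int]:
    """Count how many k-word groups of the new article occur in the comparison article.

    Returns (hits, total_groups_in_new_article).
    """
    groups = word_groups(new_article, k)
    # A set gives fast "does this group occur anywhere?" lookups.
    comparison_groups = set(word_groups(comparison_article, k))
    hits = sum(1 for group in groups if group in comparison_groups)
    return hits, len(groups)
```

For example, `count_hits("the quick brown fox jumps over the lazy dog", "a quick brown fox jumps over a sleeping cat")` returns `(2, 6)`: only "quick brown fox jumps" and "brown fox jumps over" are shared. Passing `k=3` returns `(3, 7)` for the same pair, which is one way to try out the three-word variant.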
In a 1000-word article there will be 997 four-word groups (1000 - 4 + 1).
If there are 120 hits, the similarity percentage would be 120 / 997 × 100 ≈ 12%,
so the article uniqueness would be about 88%.
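The percentage arithmetic can be written out the same way. This is a minimal sketch assuming similarity is simply hits divided by the number of overlapping word groups, as described above; the function names are hypothetical, and treating a text too short to contain any group as 100% unique is an assumption of this sketch.

```python
def group_count(word_count: int, k: int = 4) -> int:
    """Number of overlapping k-word groups in a text of word_count words."""
    return max(word_count - k + 1, 0)


def similarity_and_uniqueness(hits: int, total_groups: int) -> tuple[float, float]:
    """Return (similarity %, uniqueness %) using hits / total_groups * 100."""
    if total_groups == 0:
        # Assumption: no groups to compare (text shorter than k words) -> fully unique.
        return 0.0, 100.0
    similarity = hits / total_groups * 100
    return similarity, 100 - similarity


total = group_count(1000)  # 997 four-word groups
sim, uniq = similarity_and_uniqueness(120, total)
print(f"{sim:.1f}% similar, {uniq:.1f}% unique")  # prints: 12.0% similar, 88.0% unique
```

The unrounded values are about 12.04% and 87.96%, which round to the 12% and 88% figures above.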
Is this similar to how Google calculates it?
Or does Google just compare every individual word in the article?
Would three-word groups work better?
Any thoughts?