View Full Version : remove wrong strings from textarea


mudoeb
02-20-2008, 01:42 PM
i have a textarea with strings in it, one per line (separated by the return key)

i want to remove the strings that are not similar (30% or less) to most of the other strings
letter case is not important, and neither is punctuation

for example

nice white mac
mac is nice and white
nice mac is white
grey pc is pretty good
white mac is nice
"grey pc is pretty good" must be removed
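
One way to pin down "30% or less similar to most of the other strings" is word overlap. The sketch below is only an assumption about what is meant: it uses Jaccard similarity (shared words divided by all distinct words) and keeps a line when it is more than 30% similar to at least half of the other lines. The names wordSet, similarity and removeOutliers are made up for illustration.

```javascript
// Hypothetical helper: unique lowercase words of one line.
function wordSet(line) {
  var words = line.toLowerCase().split(/\W+/);
  var seen = {};
  var out = [];
  for (var i = 0; i < words.length; i++) {
    var key = "w_" + words[i]; // prefix avoids clashes with built-in property names
    if (words[i] !== "" && !seen[key]) {
      seen[key] = true;
      out.push(words[i]);
    }
  }
  return out;
}

// Jaccard similarity of two word lists: shared words / all distinct words (0 to 1).
function similarity(a, b) {
  var shared = 0;
  for (var i = 0; i < a.length; i++) {
    if (b.indexOf(a[i]) > -1) { shared++; }
  }
  var total = a.length + b.length - shared;
  return total === 0 ? 0 : shared / total;
}

// Keep a line only if it is more than `threshold` similar
// to at least half of the other lines (an assumed reading of "most").
function removeOutliers(text, threshold) {
  var lines = text.split(/\r?\n/);
  var sets = [];
  for (var i = 0; i < lines.length; i++) { sets.push(wordSet(lines[i])); }
  var kept = [];
  for (var j = 0; j < lines.length; j++) {
    var similar = 0;
    for (var k = 0; k < lines.length; k++) {
      if (k !== j && similarity(sets[j], sets[k]) > threshold) { similar++; }
    }
    if (similar * 2 >= lines.length - 1) { kept.push(lines[j]); }
  }
  return kept.join("\n");
}
```

Called as removeOutliers(textarea.value, 0.3), where textarea is whatever reference you have to the element. Every line is compared with every other line, so the work grows with the square of the number of lines.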

A1ien51
02-20-2008, 06:25 PM
Well you should look into split on \n and \s

You should loop through that.

Put the values into an array and increment number of times they are there.

Calculate which ones are bad

and use replace to get rid of them.

Show us some code and where you get stuck. You will not learn anything by not coding it.

Eric

mudoeb
02-21-2008, 04:31 AM
"Show us some code and where you get stuck"
i don't have any code because i started learning javascript a couple of days ago

but i need a script like this very much, so please help if you can

rnd me
02-21-2008, 01:27 PM
how do you define "strings which are not similar (30% or less) to most of other strings" ?

the string replacement code is easy, but how do you propose to test similarity?

i guess one way would be to gather up all the words, and make sure that each word of each line appear twice in the collection?

i would like to help, but i am not sure i understand the specifics...
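
The word-collection idea above could be sketched roughly like this. It is a relaxed variant, not a definitive answer: instead of requiring every word of a line to appear twice, it keeps a line when more than a given share of its words occurs at least twice in the whole text. The function names and the "w_" key prefix are assumptions made for the example.

```javascript
// Count how often each word occurs in the whole text.
function wordCounts(text) {
  var words = text.toLowerCase().split(/\W+/);
  var counts = {};
  for (var i = 0; i < words.length; i++) {
    if (words[i] === "") { continue; }
    var key = "w_" + words[i]; // prefix avoids clashes with built-in property names
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}

// Keep lines where more than `minShare` of the words occur
// at least twice in the whole text.
function keepLinesWithRepeatedWords(text, minShare) {
  var counts = wordCounts(text);
  var lines = text.split(/\r?\n/);
  var kept = [];
  for (var i = 0; i < lines.length; i++) {
    var words = lines[i].toLowerCase().split(/\W+/);
    var total = 0;
    var repeated = 0;
    for (var j = 0; j < words.length; j++) {
      if (words[j] === "") { continue; }
      total++;
      if (counts["w_" + words[j]] >= 2) { repeated++; }
    }
    if (total > 0 && repeated / total > minShare) { kept.push(lines[i]); }
  }
  return kept.join("\n");
}
```

With the five sample lines and minShare set to 0.3, only "grey pc is pretty good" is dropped, because just one of its five words ("is") occurs anywhere else.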

mudoeb
02-21-2008, 03:18 PM
here is my algo:
- split all words into a 1D array
- find the 5 most popular words
- check every string from the textarea; if it contains 2 or fewer of these 5 words, remove it
- make a new array from what is left

ps - i think i'll try it in php, because javascript will not be very fast, especially without enough cpu resources

rnd me
02-21-2008, 03:43 PM
assuming you get the textarea value to variable textValue:

- split all words into a 1D array

var tr=textValue.toLowerCase().split(/\W/g)



- find 5 most popular words



var top5=tr.counts().map().sort(function(a , b){ return a[1] - b[1] }).slice(-5)


- check every string from the textarea; if it contains 2 or fewer of these 5 words, remove it
- make a new array


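var lines= textValue.split(/\n/g);

var goodLines=lines.filter(function(a){
var tl=a.toLowerCase().split(/\W/g);
var hit=0;
for(var i=0; i<tl.length;i++){
// top5 holds [word, count] pairs, so compare against the word in each pair
for(var j=0; j<top5.length;j++){
if( top5[j][0] == tl[i] ){ hit++; break; }
}
}
return hit > 2 // lines with 2 or fewer of the top 5 words are removed
})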


alert (goodLines.join("\n"))


required code:

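// count how often each value occurs; returns an object {value: count}
// note: sorts the array in place
Array.prototype.counts = function () {
  var len = this.length;
  var Ray = {};
  this.sort();
  var i = this[0];
  for (var z = 0; z < len; z++) {
    if (this[z] != this[z - 1]) { i = this[z]; Ray[i] = 1; } else { Ray[i]++; }
  }
  return Ray;
};
// turn an object into an array of [key, value] pairs
function obMap(ob) {
  var r = [];
  var i = 0;
  for (var z in ob) {
    if (ob.hasOwnProperty(z)) { r[i++] = [z, ob[z]]; }
  }
  return r;
}
Object.prototype.map = function () { return obMap(this); };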



i don't see how php could do it faster than js, they are both scripting languages...

this uses array methods (map, filter) that firefox has built in; for ie compatibility, use a library that adds them.

mudoeb
02-21-2008, 09:11 PM
i get errors in IE7

rnd me
02-22-2008, 09:30 AM
i should have tested better, it was getting late.

this works in ff+ie7 for me:





<script>

textValue="nice white mac\n\
mac is nice and white\n\
nice mac is white\n\
grey pc is pretty good\n\
white mac is nice"


Array.prototype.counts=(function () {len = this.length;var Ray = {};this.sort();var i = this[0];for (z = 0; z < len; z++) {if (this[z] != this[z - 1]) {i = this[z];Ray[i] = 1;} else {Ray[i]++;}}return Ray;})
function obMap(ob) {var r = [];var i = 0;for (var z in ob) {if (ob.hasOwnProperty(z)) {r[i++] = [z, ob[z]];}}return r;}
Object.prototype.map =function () { return obMap(this); }

if (!Array.prototype.map) {// from http://developer.mozilla.org/en/docs/Core_JavaScript_1.5_Reference:Objects:Array:map
Array.prototype.map = function (fun) {var len = this.length;if (typeof fun != "function") {throw new TypeError;}var res = new Array(len);var thisp = arguments[1];for (var i = 0; i < len; i++) {if (i in this) {res[i] = fun.call(thisp, this[i], i, this);}}return res;};}

if (!Array.prototype.filter) { //from http://developer.mozilla.org/en/docs/Core_JavaScript_1.5_Reference:Objects:Array:filter
Array.prototype.filter = function (fun) {var len = this.length;if (typeof fun != "function") {throw new TypeError;}var res = new Array;var thisp = arguments[1];for (var i = 0; i < len; i++) {if (i in this) {var val = this[i];if (fun.call(thisp, val, i, this)) {res.push(val);}}}return res;};}


var lc=textValue.toLowerCase()
var tr=lc.split(/\W/g)
var lines= lc.split(/\n/g);
var top5=tr.counts().map().sort(function(a , b){ return a[1] - b[1] }).slice(-5).map(function(a){return a[0]}).join(" ")

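var goodLines=lines.filter(function(a){
var tl=a.split(/\W/g);
var hit=0;
for(var i=0; i<tl.length;i++){
// pad with spaces so only whole words match, and skip empty pieces
if( tl[i] && (" " + top5 + " ").indexOf(" " + tl[i] + " ") > -1 ){ hit++ ; }
}
return hit > 2
})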

alert (goodLines.join("\n"))

</script>


cheers

mudoeb
02-22-2008, 10:46 AM
tnx a lot

mudoeb
03-16-2008, 05:07 PM
why does this script give an error on this line

var top5=tr.counts().map().sort(function(a , b){ return a[1] - b[1] }).slice(-5).map(function(a){return a[0]}).join(" ")

if, for example, i try to use these strings

007 from russia with love
james bond from russia with love
jobs in russia
kazan russia
leader of russia
life in russia
little russia
map of russia
maps of russia
miss russia
miss russia 2006
moscow russia
mother russia
mp3 russia
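
A possible explanation, not confirmed anywhere in the thread: this input contains the word "map" ("map of russia"). counts() stores every word as a property of a plain object, so the result gets its own property named map holding a number, and that hides the map function added to Object.prototype. Calling .map() on it then fails, because a number is not a function. Below is a minimal sketch that sidesteps this by prefixing the keys and leaving Object.prototype alone. topWords and the "w_" prefix are illustrative names, not part of the script above.

```javascript
// Return the n most frequent words of a text, most frequent first.
function topWords(text, n) {
  var words = text.toLowerCase().split(/\W+/);
  var counts = {};
  for (var i = 0; i < words.length; i++) {
    if (words[i] === "") { continue; }
    // the prefix keeps words such as "map" or "constructor"
    // from colliding with built-in or added property names
    var key = "w_" + words[i];
    counts[key] = (counts[key] || 0) + 1;
  }
  // turn the counts into [word, count] pairs
  var pairs = [];
  for (var k in counts) {
    if (Object.prototype.hasOwnProperty.call(counts, k)) {
      pairs.push([k.slice(2), counts[k]]);
    }
  }
  // sort by count, highest first
  pairs.sort(function (a, b) { return b[1] - a[1]; });
  var top = [];
  for (var j = 0; j < pairs.length && j < n; j++) { top.push(pairs[j][0]); }
  return top;
}
```

topWords(textValue, 5) returns a plain array of words, so a whole-word check like top.indexOf(word) > -1 can stand in for the joined string used in the script above.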