remove wrong strings from textarea
mudoeb
02-20-2008, 01:42 PM
i have a textarea with strings in it, one per line (separated by the return key)
i want to remove the strings which are not similar (30% or less) to most of the other strings
letter case is not important, and neither is punctuation
for example
nice white mac
mac is nice and white
nice mac is white
grey pc is pretty good
white mac is nice
here "grey pc is pretty good" must be removed
A1ien51
02-20-2008, 06:25 PM
Well you should look into split on \n and \s
You should loop through that.
Put the values into an array and increment number of times they are there.
Calculate which ones are bad
and use replace to get rid of them.
Show us some code and where you get stuck. You will not learn anything by not coding it.
Eric
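A minimal sketch of the first steps described above (split on line breaks and whitespace, loop, count), assuming the textarea content is already in a string. The name `countWords` is illustrative and does not come from the thread.

```javascript
// Hypothetical helper: count how many times each word occurs in a block of text.
function countWords(text) {
  var counts = Object.create(null); // no inherited keys, so any word is a safe key
  var lines = text.toLowerCase().split(/\n/);
  for (var i = 0; i < lines.length; i++) {
    var words = lines[i].split(/\s+/);
    for (var j = 0; j < words.length; j++) {
      if (words[j] === "") { continue; } // skip blanks produced by empty lines
      counts[words[j]] = (counts[words[j]] || 0) + 1;
    }
  }
  return counts;
}

var counts = countWords("nice white mac\nmac is nice and white\ngrey pc is pretty good");
// counts.mac is 2, counts.grey is 1
```

Deciding which lines are "bad" from these counts is a separate step and depends on how similarity is defined.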
mudoeb
02-21-2008, 04:31 AM
"Show us some code and where you get stuck"
i don't have any code because i started learning javascript a couple of days ago
but i need this script very much, so please help if you can
rnd me
02-21-2008, 01:27 PM
how do you define "strings which are not similar (30% or less) to most of other strings" ?
the string replacement code is easy, but how do you propose to test similarity?
i guess one way would be to gather up all the words, and make sure that each word of each line appear twice in the collection?
i would like to help, but i am not sure i understand the specifics...
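One possible way to pin down "similar", offered only as an assumption since the thread never settles on a definition: treat each line as a set of words, measure the overlap between two lines as shared words divided by all distinct words (Jaccard similarity), and keep a line only if it is more than 30% similar to over half of the other lines. The function names are illustrative, and the sketch assumes a modern engine with `Set`.

```javascript
// Split a line into lower-case words, ignoring punctuation and empty tokens.
function wordsOf(line) {
  return line.toLowerCase().split(/\W+/).filter(function (w) { return w !== ""; });
}

// Jaccard similarity: shared distinct words / all distinct words, from 0 to 1.
function similarity(a, b) {
  var setA = new Set(wordsOf(a));
  var setB = new Set(wordsOf(b));
  var shared = 0;
  setA.forEach(function (w) { if (setB.has(w)) { shared++; } });
  var total = setA.size + setB.size - shared;
  return total === 0 ? 0 : shared / total;
}

// Keep a line only if it is more than `threshold` similar to over half of the other lines.
function removeOutliers(lines, threshold) {
  return lines.filter(function (line, i) {
    var similarCount = 0;
    for (var j = 0; j < lines.length; j++) {
      if (j !== i && similarity(line, lines[j]) > threshold) { similarCount++; }
    }
    return similarCount > (lines.length - 1) / 2;
  });
}

var sample = [
  "nice white mac",
  "mac is nice and white",
  "nice mac is white",
  "grey pc is pretty good",
  "white mac is nice"
];
var kept = removeOutliers(sample, 0.3); // drops "grey pc is pretty good"
```

This compares every line with every other line, so the work grows with the square of the number of lines; for a textarea-sized input that should not matter.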
mudoeb
02-21-2008, 03:18 PM
here is my algo:
- split all words into a 1D array
- find the 5 most popular words
- analyze every string from the textarea: if it contains 2 or fewer of these 5 words > remove it
- make a new array from the strings that are left
ps - i think i'll try it in php, because javascript will not be very fast, especially without enough cpu resources
rnd me
02-21-2008, 03:43 PM
assuming you get the textarea value into a variable textValue:
- split all words into a 1D array
var tr=textValue.toLowerCase().split(/\W/g)
- find the 5 most popular words
var top5=tr.counts().map().sort(function(a , b){ return a[1] - b[1] }).slice(-5).map(function(a){ return a[0] })
- analyze every string from the textarea: if it contains 2 or fewer of these 5 words > remove it
- make a new array
var lines= textValue.split(/\n/g);
var goodLines=lines.filter(function(a){
var tl=a.toLowerCase().split(/\W/g);
var hit=0;
for(var i=0; i<tl.length;i++){
if( top5.indexOf(tl[i]) > -1 ){ hit++ }
}
return hit > 2
})
alert (goodLines.join("\n"))
required code:
Array.prototype.counts=(function () {var len = this.length;var Ray = {};this.sort();var i = this[0];for (var z = 0; z < len; z++) {if (this[z] != this[z - 1]) {i = this[z];Ray[i] = 1;} else {Ray[i]++;}}return Ray;})
function obMap(ob) {var r = [];var i = 0;for (var z in ob) {if (ob.hasOwnProperty(z)) {r[i++] = [z, ob[z]];}}return r;}
Object.prototype.map =function () { return obMap(this); }
i don't see how php could do it faster than js, they are both scripts...
this uses firefox's array methods (filter, map, indexOf); for ie compatibility, use a library that adds them.
mudoeb
02-21-2008, 09:11 PM
i get errors in IE7
rnd me
02-22-2008, 09:30 AM
i should have tested better, it was getting late.
this works in ff+ie7 for me:
<script>
var textValue="nice white mac\n\
mac is nice and white\n\
nice mac is white\n\
grey pc is pretty good\n\
white mac is nice"
Array.prototype.counts=(function () {var len = this.length;var Ray = {};this.sort();var i = this[0];for (var z = 0; z < len; z++) {if (this[z] != this[z - 1]) {i = this[z];Ray[i] = 1;} else {Ray[i]++;}}return Ray;})
function obMap(ob) {var r = [];var i = 0;for (var z in ob) {if (ob.hasOwnProperty(z)) {r[i++] = [z, ob[z]];}}return r;}
Object.prototype.map =function () { return obMap(this); }
if (!Array.prototype.map) {// from http://developer.mozilla.org/en/docs/Core_JavaScript_1.5_Reference:Objects:Array:map
Array.prototype.map = function (fun) {var len = this.length;if (typeof fun != "function") {throw new TypeError;}var res = new Array(len);var thisp = arguments[1];for (var i = 0; i < len; i++) {if (i in this) {res[i] = fun.call(thisp, this[i], i, this);}}return res;};}
if (!Array.prototype.filter) { //from http://developer.mozilla.org/en/docs/Core_JavaScript_1.5_Reference:Objects:Array:filter
Array.prototype.filter = function (fun) {var len = this.length;if (typeof fun != "function") {throw new TypeError;}var res = new Array;var thisp = arguments[1];for (var i = 0; i < len; i++) {if (i in this) {var val = this[i];if (fun.call(thisp, val, i, this)) {res.push(val);}}}return res;};}
var lc=textValue.toLowerCase()
var tr=lc.split(/\W/g)
var lines= lc.split(/\n/g);
var top5=tr.counts().map().sort(function(a , b){ return a[1] - b[1] }).slice(-5).map(function(a){return a[0]}).join(" ")
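var goodLines=lines.filter(function(a){
var tl=a.split(/\W/g);
var hit=0;
for(var i=0; i<tl.length;i++){
// skip empty tokens and match whole words only, not substrings of the top5 string
if( tl[i] && (" "+top5+" ").indexOf(" "+tl[i]+" ") > -1 ){ hit++ ; }
}
return hit > 2 // keep lines containing at least 3 of the top 5 words
})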
alert (goodLines.join("\n"))
</script>
cheers
mudoeb
02-22-2008, 10:46 AM
tnx a lot
mudoeb
03-16-2008, 05:07 PM
why does this script return an error on the line
var top5=tr.counts().map().sort(function(a , b){ return a[1] - b[1] }).slice(-5).map(function(a){return a[0]}).join(" ")
if, for example, i try to use these strings
007 from russia with love
james bond from russia with love
jobs in russia
kazan russia
leader of russia
life in russia
little russia
map of russia
maps of russia
miss russia
miss russia 2006
moscow russia
mother russia
mp3 russia
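A likely cause, judging only from the code posted in the thread: this list contains the word "map" ("map of russia"). counts() stores every word as a key on a plain object, so the result gets its own property named map with the value 1. That property hides the map function added to Object.prototype, and the following .map() call fails because a number is not a function. The sketch below avoids the collision by keeping the counts in an object without a prototype and by not extending Object.prototype at all. The names `topWords` and `filterLines` are illustrative, and unlike the script above it counts each top word at most once per line.

```javascript
// Return the n most frequent words in text (lower-cased, punctuation ignored).
function topWords(text, n) {
  var counts = Object.create(null); // no prototype, so "map" or "constructor" are ordinary keys
  text.toLowerCase().split(/\W+/).forEach(function (w) {
    if (w !== "") { counts[w] = (counts[w] || 0) + 1; }
  });
  return Object.keys(counts)
    .sort(function (a, b) { return counts[b] - counts[a]; }) // most frequent first
    .slice(0, n);
}

// Keep the lines that contain more than minHits of the n most frequent words.
function filterLines(text, n, minHits) {
  var top = topWords(text, n);
  return text.split(/\n/).filter(function (line) {
    var words = line.toLowerCase().split(/\W+/);
    var hits = 0;
    for (var i = 0; i < top.length; i++) {
      if (words.indexOf(top[i]) > -1) { hits++; } // each top word counts once per line
    }
    return hits > minHits;
  });
}

var russia = "map of russia\nmaps of russia\nmiss russia\nmother russia";
var top = topWords(russia, 2); // no error, even though "map" is one of the words
```

With the five lines from the first post, `filterLines(textValue, 5, 2)` keeps the four mac lines and drops "grey pc is pretty good". For short lines like the russia list, a threshold of more than 2 hits may remove almost everything, so `minHits` would need tuning.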