View Full Version : Intersection of sets (JS arrays)
Alex Vincent
01-20-2005, 10:46 PM
My boss asked me about a project, and this code spun out of my first attempt.
Basically, if a value is a member of all the arrays passed to the function, this function will add that value to a response array. The function returns the response array.
/**
* Return an array of members existing in all argument arrays.
*
* @param aSet(n) JSArray.
*/
function getSetIntersection(aSet1, aSet2) {
var temp = [];
var offset = arguments.length - 1;
var response = [];
for (var i = 0; i < arguments.length; i++) {
temp = temp.concat(arguments[i]);
}
temp.sort();
/*
* The goal is to find an ordered sequence of n identical members of temp,
* where n = arguments.length. For this, we just need to check the start
* and end points of the sequence, and if they match, push them on to the
* response array.
*/
for (var i = 0; i < temp.length - offset; i++) {
if (temp[i] == temp[i + offset]) {
response.push(temp[i]);
i += offset;
}
}
return response;
}
The assumption is that no member of the original arrays is repeated. I'd welcome a sanity check for that.
A recursive solution:
Array.prototype.contains = function(datum) {
if (this.length == 0) return false;
return (datum == this[0]) || this.slice(1).contains(datum);
}
Array.prototype.foldLeft = function(func, cumal) {
if (this.length == 0)
return cumal;
return this.slice(1).foldLeft(func, func(this[0], cumal));
}
function getSetIntersection(sets) {
return sets.slice(1).foldLeft(function (s1, s2) {
if (s1.length == 0 || s2.length == 0)
return [];
if (s1.contains(s2[0]))
return [s2[0]].concat(arguments.callee(s1, s2.slice(1)));
else
return arguments.callee(s1, s2.slice(1));
}, sets[0]);
}
Using list-processing techniques on Javascript arrays (cdr -> slice(1)) is certain to incur some cost, and but using contains() instead of sorting will at least allow some early outs in calculating intersections.
Alex Vincent
01-21-2005, 07:25 AM
I tried Jason's code (three variations of it, all provided by him), and each time, my code ran roughly about twice as fast.
Because Alex wants speed and not elegance:
See his reply for code
Modifying arrays in place, icky. Alex has some nice stats to show the zip of it.
Alex Vincent
01-21-2005, 08:03 AM
Fortunately, I can concede defeat just as easily. Jason revised his code again, and trounced me:
function intersect(s1, s2, s1break) {
var l1 = s1break, l2 = s2.length, offset = 0;
for (var i = 0; i < l1; i++) {
for (var j = 0; j < l2; j++) {
if (s2[j] == s1[i]) {
s1[offset++] = s1[i];
break;
}
}
}
return offset;
}
function getSetIntersection() {
var offset = arguments[0].length, c = arguments[0].slice(0, offset);
for (var i = 1, l = arguments.length; i < l; i++)
offset = intersect(c, arguments[i], offset);
return c.slice(0, offset);
}
And the proof?
pass1: 1.686, pass2: 0.385, diff: 1.301, faster: 77.16%
pass1: 1.703, pass2: 0.445, diff: 1.258, faster: 73.87%
pass1: 1.807, pass2: 0.802, diff: 1.005, faster: 55.62%
pass1: 2.317, pass2: 0.693, diff: 1.624, faster: 70.09%
pass1: 1.406, pass2: 0.561, diff: 0.845, faster: 60.1%
pass1: 1.971, pass2: 0.429, diff: 1.542, faster: 78.23%
pass1: 1.527, pass2: 0.544, diff: 0.983, faster: 64.37%
pass1: 1.444, pass2: 0.401, diff: 1.043, faster: 72.23%
pass1: 1.988, pass2: 0.566, diff: 1.422, faster: 71.53%
pass1: 1.45, pass2: 0.396, diff: 1.054, faster: 72.69%
codegoboom
01-22-2005, 03:56 PM
Interesting; please describe a context in which this may be useful, along with an example detailing the contents of two input arrays, and the resulting output expected.
It's a computer science exercise, really. Given n sets, find out which elements are common to all of them. It's something you do in the first week of any decent computer science course, and really not that interesting. There are many different ways you can do it, however, depending on if the sets are represented by sorted arrays, or linked lists, etc etc.
codegoboom
01-22-2005, 06:56 PM
Alright, I found a context (Two routes to intersection in JavaScript (http://www.developersdex.com/gurus/articles/276.asp)); thanks anyway.
Alex Vincent
01-23-2005, 08:02 AM
In my particular case, the company I work for needs an algorithm like this for one of its applications.
jkd, if I am to use your code, I'll need a written release of ownership from you, maybe some other things. avincent (no j, I didn't pick it) at cenzic dot com.
I found a context - Two routes to intersection in JavaScript (http://www.developersdex.com/gurus/articles/276.asp)
yeah, i was wondering, too. cheers for that :)
HiWorld
04-29-2008, 02:58 PM
Hi,
here is an application: I want to compute similarity between two sparse array.
for example, the cosine similarity will use the dot product:
sum(a[i]*b[i]) = a1 * b1 + a2 * b2 + ....
if either a[i] or b[i] are 0, the product is zero, so I can forget it. I already stocked only the non zero values of a and b, now I need to know the intersection of the non zero values of these two vectors, in an optimized way (binary search).
If anyone has a reference for that, please let me know!
Thanks a lot.
Alex Vincent
04-30-2008, 12:10 AM
Hm, I'd forgotten about this thread. Some of the new JS1.6+ stuff in Firefox might make this leaner and faster yet.
oesxyl
04-30-2008, 12:29 AM
Hm, I'd forgotten about this thread. Some of the new JS1.6+ stuff in Firefox might make this leaner and faster yet.
how this code look now? did you solve the issue with repeted member in array?
I'm interested in a package of containers, set, hash, bag, :)
regards
fside
04-30-2008, 08:43 AM
I think the setIntersection call to another routine has been known for some time. It is quick for an unordered bunch of lists. But there are other ways to come close. A lot of different ways to achieve the same thing in programming.
But it can only go so quickly. You can try to use the shortest list as the standard. But the better way is to take time to sort. That, too, could be problematic if a long list sort time actually become significant against the slower unordered algorithms. But if it's not 'too' bad, then sorting provides the opportunity to keep splitting the difference in your comparisons, the 'binary' search (never mind how the sort was accomplished). And you can apply the binary search to the standard, to the shortest list, as well. So you keep splitting the difference there, and weeding out those 'not possibles', and the same for the target/match. It's just a lot fewer match you have to perform. I would think that would be the fastest way, all else being . . etc. I'm sure it's been suggested, elsewhere. I've never really needed such code, myself. If I want to join some lists, there are SQL commands for that.
I suspect the difference, though, would be seen pretty quickly. But I don't know where it would start separating from the unordered approach. The unordered approach, too, can be much more dependent on the particular records, and where the matches are found. Ironically, on ordered lists, the previous approach might be among the best of the unordered algorithms. But if the entries are a certain way, other close/similar approaches might go as fast or faster.
Trinithis
04-30-2008, 09:06 AM
@oesxyl
Both solutions take advantage of built-in firefox methods.
Note: I just wrote these on the elegance level, not the speed level. (Though they might be fast.)
Does not eliminate repeated elements:
// Needs JavaScript 1.6
function intersect(xs, ys) {
return xs.filter(function(x) {
return ys.some(function(y) {
return x === y;
});
});
}
Eliminates repeated elements:
// Needs JavaScript 1.8
function intersect(xs, ys) {
return xs.filter(function(x) {
return ys.some(function(y) {
return x === y;
});
}).nub();
}
Array.prototype.nub = function() {
return this.reduce(
function(zs, x) {
return zs.contains(x)
? zs
: zs.concat([x]);
}, []
);
};
Array.prototype.contains = function(x) {
return this.indexOf(x) != -1;
};
fside
05-27-2008, 02:27 AM
how this code look now? did you solve the issue with repeted member in array?
As the problem is stated, repetition is only an issue with the array used as the standard for comparison - the shortest array. An element must be found in all arrays, or it doesn't count. The obvious trick used isn't binary search but that Javascript allows the comparison operators to essentially sort strings, and determine which is 'lesser' and 'greater'.
I mentioned using sort and a straightforward binary search, previously. I thought I'd put a simple version together just to show that the method can far outweigh any tweaks done to interpreted code.
This version is 400-500% faster than the one posted by Vincent (which is found various place on the web).
It would still be nothing compared to a built-in function. The new getElementsByClassName which is coming (FF3) runs rings around any interpreted workaround, which everyone uses. Even the best of these was clocked as 1800% slower than the built-in. If you want fast, you have to compile and link it into the browser as the viewer clicks in (maybe in future?).
I also included some test arrays as an attachment. You can see they are from a census listing. You could grab more from that page, there are thousands. And I wouldn't want to say that this is as fast as it can go (fwiw, the .sort() at the start of the function doesn't seem to affect the running time at all). Because of the method used, one is not penalized by adding more statements, more code. But I think that there are some unnecessary comparisons made in this routine.
function matchAllFromSorted(){
var arRtn=[ ], idxStnd=0, testLen=100000, i, j, stnd, prevstnd="", upperB, lowerB, midPt;
/* Find the shortest sorted array, then use it as the standard for comparison. */
for (i=0; i <arguments.length; i +=1){
arguments[i].sort();
if (arguments[i].length <testLen){
testLen = arguments[i].length;
idxStnd = i;
}
}
for (i=0; i <arguments[idxStnd].length; i +=1){
stnd = arguments[idxStnd][i];
/* Crude test for duplicates - because all arrays are sorted. */
if (stnd !==prevstnd){
for (j=0; j <arguments.length; j +=1){
if (j===stnd){ j +=1; }
/* upper bound, lower bound, mid point. Couldn't be simpler. */
lowerB = 0;
upperB = arguments[j].length;
/* could add counter if this bothers you, while(p<20) or something? */
while(1){
midPt = Math.floor( ((upperB -lowerB)/2) +lowerB);
if ( lowerB===midPt ){ j=1000; break; }
comp =arguments[j][midPt];
if (stnd <comp){ upperB = midPt; }
if (stnd >comp){ lowerB = midPt; }
if (stnd ===comp){
if (j===arguments.length-1){ arRtn.push(stnd); }
break;
}
}
}
}
prevstnd =stnd;
}
return arRtn;
}
(function testOther(){
var gt1, gt2, gt3=0;
for (i=0; i<10; i+=1){
/* see attached file */
copythese();
gt1 = new Date().getTime();
/* binary search on sorted arrays */
rtn = matchAllFromSorted(a1,a2,a3,a4);
/* orig version */
/* rtn = getSetIntersection(a1,a2,a3,a4); */
gt2 = new Date().getTime();
gt3 += gt2 -gt1;
}
alert(rtn +":" +(gt3));
})();
vBulletin® v3.8.2, Copyright ©2000-2012, Jelsoft Enterprises Ltd.