Here are two scripts that do the same thing, namely count the number of words in a string.
Code:
<script type = "text/javascript">
var text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Etiam ipsum leo, scelerisque at dapibus ac, consectetur vel ipsum. Morbi et metus ut diam molestie ullamcorper. Suspendisse rutrum semper semper. Donec volutpat neque in lorem tempus scelerisque. Curabitur dignissim rhoncus quam ac suscipit. Donec viverra quam lobortis neque porta a sagittis urna tristique.";
var countWords = function(which) {
var numw = which.match(/\S+/g).length;
alert ("The number of words is:- " + numw);
}
countWords(text);
</script>
<script type = "text/javascript">
var text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Etiam ipsum leo, scelerisque at dapibus ac, consectetur vel ipsum. Morbi et metus ut diam molestie ullamcorper. Suspendisse rutrum semper semper. Donec volutpat neque in lorem tempus scelerisque. Curabitur dignissim rhoncus quam ac suscipit. Donec viverra quam lobortis neque porta a sagittis urna tristique.";
String.prototype.countWords = function() {return this.match(/\S+/g).length;};
alert ("The number of words is:- " + text.countWords());
</script>
I would appreciate it if someone would explain to me which of these two is to be preferred, or to put it another way, what is the advantage of using the prototype? Please don't say it is a couple of milliseconds faster!
In theory, there isn't any difference between the theory and practice. In practice, there is.
__________________
All the code given in this post has been tested and is intended to address the question asked.
Unless stated otherwise it is not just a demonstration.
I think you may be getting into programming philosophy here, Philip.
My *personal* take on this: If you are going to be using countWords( ) a lot, in several different pages, then use the String.prototype and put it into a ".js" library that you load in each page.
If it's something you will use in only one page, especially only in one or two places, then (quite frankly) who cares?
Clearly, the prototype is the more OO approach, and from some (again) philosophical viewpoint it is more "correct". But OO can be taken to extremes, in my opinion. There is little reason to worry about it for "one offs".
But be prepared for a firestorm that will disagree with my ad hoc viewpoint.
__________________
An optimist sees the glass as half full.
A pessimist sees the glass as half empty.
A realist drinks it no matter how much there is.
If you're gonna use it a lot, I like the prototype approach because I like it more than having the function float around in the middle of nowhere. I'd just like to have some kind of hierarchy telling me that this is a function for strings.
If you won't use it that often, maybe a compromise solution would be putting it into a StringUtils class.
Also, from a point of view that does not consider time performance, I wouldn't like creating a big array just to measure its length (but not actually use it) if I call that function on big strings. Not that I'd have a better alternative right now.
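For illustration, the StringUtils compromise mentioned above could look something like the sketch below. The name StringUtils comes from the post; the object layout and the null guard are assumptions, not code anyone in the thread posted.

```javascript
// Hypothetical namespace object: only the name StringUtils comes from the thread,
// everything inside it is an illustrative assumption.
var StringUtils = {
    // Returns the number of whitespace-separated words in str.
    countWords: function (str) {
        var words = String(str).match(/\S+/g); // null when there are no words
        return words ? words.length : 0;
    }
};

console.log(StringUtils.countWords("Lorem ipsum dolor sit amet")); // 5
```

This keeps a single global name (StringUtils) while still signalling that the function is meant for strings.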
There are some people who argue against adding methods to the built-in objects because theoretically that can break expected functionality. Those people would argue that you should create a new object that inherits from the built-in one, add your new methods to that, and then use that as the base for your objects in place of the built-in one.
Doing so would certainly complicate things considerably with this particular situation.
With the new version of JavaScript (ECMAScript 5) giving greater control over whether methods are enumerable, through Object.defineProperty, I would expect any theoretical problems could be resolved.
In practice I have never come across a situation where adding methods has caused any issues.
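As a rough sketch of what that control looks like, the method can be added with Object.defineProperty so that it does not show up in for...in loops. This assumes an ES5-capable engine; the null guard and the variable names are additions for the example.

```javascript
// Assumes an ES5-capable engine (Object.defineProperty is missing in very old browsers).
Object.defineProperty(String.prototype, "countWords", {
    value: function () {
        var words = this.match(/\S+/g); // null when the string has no words
        return words ? words.length : 0;
    },
    enumerable: false,   // hidden from for...in loops
    writable: true,
    configurable: true
});

// Collect every key a for...in loop sees on a String object.
var visibleKeys = [];
for (var key in new String("ab")) {
    visibleKeys.push(key);
}

console.log("one two three".countWords()); // 3
console.log(visibleKeys);                  // ["0", "1"]: only the character indexes
```

A plain assignment (String.prototype.countWords = ...) would create an enumerable property instead, and "countWords" would then appear in the loop above.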
Airblader: I dunno if it's better or not, but one sneaky approach:
Code:
function countWords( txt )
{
return txt.replace(/\S+/g,"*").replace(/\s+/g,"").length;
}
Depends on how replace does its work, internally. One implementation I saw did a split and then a join, so you'd still end up with an array for a moment. But if it were to work incrementally...
As I said, for short strings this shouldn't be an issue. For longer strings, it can still count thousands of times per second, but nevertheless, even a quick-and-dirty implementation that avoids the big array turns out to be twice as fast (jsperf test).
It's the same old thing: It's faster, but in either case it's still fast enough for virtually any normal application.
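The code behind that jsperf test is not reproduced in the thread, so the following is only a guess at what an array-free counter might look like: it steps through the matches one at a time with exec() instead of collecting them all with match(). The function name is made up for the example, and no speed claim is attached to it.

```javascript
// Hypothetical array-free counter (not the code from the jsperf test).
function countWordsIncrementally(text) {
    var wordPattern = /\S+/g; // the g flag makes exec() resume after the previous match
    var count = 0;
    while (wordPattern.exec(text) !== null) {
        count++;
    }
    return count;
}

console.log(countWordsIncrementally("Lorem ipsum dolor sit amet")); // 5
console.log(countWordsIncrementally(""));                            // 0
```

Each exec() call still allocates a small result array, but never one holding every word at once.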
I agree with the previous posters. If you need it once or twice, the stand-alone function is OK, but I would prefer the String prototype, because 1) counting words only makes sense on strings (and not on numbers or arrays or any other data type you pass to the function) and 2) you keep the global namespace clean.
__________________
please post your code wrapped in [CODE] [/CODE] tags
Quote:
I agree with the previous posters. If you need it once or twice, the stand-alone function is OK, but I would prefer the String prototype, because 1) counting words only makes sense on strings (and not on numbers or arrays or any other data type you pass to the function) and 2) you keep the global namespace clean.
1. i can see the need to count words in an array. minor point i know.
2. protos ARE globals, so there is still pollution. I like to think of protos as "to the right of the dot globals", if that makes sense. regular globals go "to the left of the dot".
.countWords probably won't clash, but simply .count might.
one suggestion is that you don't recycle names, even between types.
you might think it would be nice to have new Array().match(), but you never know when someone will duck-type an argument by asking x.match ? [x] : x ; to ensure an array. if your array quacks like a string, it can be easily mistaken.
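A small sketch of the duck-typing trap described above. ensureArray is a hypothetical helper, and the Array.prototype.match added here exists only for the demonstration.

```javascript
// Hypothetical helper that duck-types its argument: anything with a .match
// method is assumed to be a string and gets wrapped in an array.
function ensureArray(x) {
    return x.match ? [x] : x;
}

var words = ["a", "b"];
var before = ensureArray(words);   // the same array, returned untouched

// Illustrative only: give arrays a string-like method name...
Array.prototype.match = function (pattern) {
    return this.join(" ").match(pattern);
};

// ...and the array now "quacks like a string", so it gets wrapped by mistake.
var after = ensureArray(words);    // [["a", "b"]]

delete Array.prototype.match;      // clean up the demonstration

console.log(before.length, after.length); // 2 1
```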
lastly, i would say that if you are only going to use it a couple times, why even bother with the function?
".split(/\W+/g).length" isn't that much to type a few times...
lastly lastly, one should use split() instead of match() because match() can return null, which doesn't have a length, which means it will throw and halt your app.
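To make the difference concrete, here is what the two calls return for input that contains no words. Note that split() does not throw, but it does not report zero either; the variable names are only for the example.

```javascript
// What each approach returns for input that contains no words.
var emptyMatch = "".match(/\S+/g);                // null: match() found nothing
var emptySplitCount = "".split(/\W+/).length;     // 1: the result is [""]
var blankSplitCount = "   ".split(/\W+/).length;  // 2: the result is ["", ""]

var threwTypeError = false;
try {
    "".match(/\S+/g).length;                      // reading .length of null
} catch (e) {
    threwTypeError = e instanceof TypeError;
}

console.log(emptyMatch, emptySplitCount, blankSplitCount, threwTypeError);
```

So either approach needs some handling for empty input if an exact count of zero matters.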
__________________ my site (updated 5/13) STATS (2013/5) HTML5:90.2% MOB:14% IE7:0.5% IE8:8.6% IE9:9.8% IE10:10%
Quote:
1. i can see the need to count words in an array. minor point i know.
But it is a different use case and calls for a different implementation (actually, I don't even think it's intuitively obvious what this function should return on an array*). So I wouldn't see anything wrong with having StringUtils.countWords() and ArrayUtils.countWords() as separate methods.
Optionally read the above paragraph with String.prototype.countWords() and Array.prototype.countWords().
The only approach where actual "overloading" would be okay to me is the global function approach:
Code:
// countWordsInString and countWordsInArray are placeholders to be defined elsewhere
function countWords( data ) {
    if( typeof data === 'string' ) {
        return countWordsInString( data );
    } else if( data instanceof Array ) {
        // note: there are better ways to check for an array; this is just an example
        return countWordsInArray( data );
    } else {
        throw new TypeError( 'Parameter is not a string or array.' );
    }
}
because it isn't attached to a certain type. But that's the one method I don't like anyway. But that's just the OOP-thinking speaking.
*) By that I mean: Should it return the number of words in all array elements added up? Or an array of the number of words in each element? If the first one, does it concatenate the array before counting or count each element separately and add up (different results!)?
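A quick sketch of how those three readings differ. The body of countWordsInString is an assumption (it is only named, not defined, in the thread), and the sample data is made up.

```javascript
// Assumed implementation of the string counter named earlier in the thread.
function countWordsInString(str) {
    var words = str.match(/\S+/g);
    return words ? words.length : 0;
}

var lines = ["foo bar", "baz qux"];

// (a) one count per element
var perElement = lines.map(function (line) {
    return countWordsInString(line);
});                                                     // [2, 2]

// (b) count each element separately, then add up
var summed = perElement.reduce(function (total, n) {
    return total + n;
}, 0);                                                  // 4

// (c) concatenate first, then count: "foo barbaz qux"
var concatenated = countWordsInString(lines.join(""));  // 3

console.log(perElement, summed, concatenated);
```

Joining with a space instead of an empty string would make (c) agree with (b) for this data.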
Quote:
i would say that if you are only going to use it a couple times, why even bother with the function?
That is by all means bad advice in my opinion. If at all, the only appropriate situation to do this would be if you use it exactly once and no more. But what if you decide to use it more later on? What if you suddenly want to change the implementation (e.g. move from match to split)? You're setting yourself up for a big refactoring session that could've been as easy as changing one method.
Last edited by Airblader; 02-16-2013 at 09:38 AM..
Quote:
lastly lastly, one should use split() instead of match() because match() can return null, which doesn't have a length, which means it will throw and halt your app.
Good point!
I would have thought that in the real world the only likely use for this is to count the number of words entered by the user into a textarea.
In which case rnd me's suggestion seems fine to me.
But as we so often say, there are several different ways to skin a cat.
Note: The two methods can give different results:-
Code:
var countWords = function(which) {
var numw = which.split(/\W+/g).length; // \W+ splits at an apostrophe as well - "We're" counts as two
//var numw = which.match(/\S+/g).length; // \S+ matches non-spaces
alert ("The number of words is:- " + numw);
}
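For example, with a short sample sentence (chosen just for this illustration) the two expressions disagree not only because of the apostrophe but also because split() leaves an empty string after the final full stop:

```javascript
var sample = "We're done.";

var bySplit = sample.split(/\W+/);   // ["We", "re", "done", ""]
var byMatch = sample.match(/\S+/g);  // ["We're", "done."]

console.log(bySplit.length); // 4
console.log(byMatch.length); // 2
```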
The way to overcome the null issue with match is
Code:
var countWords = function(which) {
    var numw = which.match(/\S+/g); // null when there are no words
    if (numw != null) {
        numw = numw.length;
    } else {
        numw = 0;
    }
    alert("The number of words is " + numw);
};
Quote:
That is by all means bad advice in my opinion. If at all, the only appropriate situation to do this would be if you use it exactly once and no more. But what if you decide to use it more later on? What if you suddenly want to change the implementation (e.g. move from match to split)? You're setting yourself up for a big refactoring session that could've been as easy as changing one method.
that's some heavy-handed dogma for a scripting language! careful with words like ever and never around here...
i see what you're saying, but not everything needs to be over-built or infinitely expandable. in-lining has performance advantages as well: eliminating function call overhead, lowering RAM usage and reducing the pressure on the garbage collector by bundling symbols into a single activation envelope.
along those lines, you can procedural-ize the chunks that use the counting code, so it's not a guarantee that you will need broad re-factoring to upgrade.
Quote:
Originally Posted by Airblader
Should it return the number of words in all array elements added up? Or an array of the number of words in each element? If the first one, does it concatenate the array before counting or count each element separately and add up (different results!)?
in any scenario, we can easily enough use the OP code to perform the counting in an array with minimal overhead, certainly far-less than a function/proto rewrite or additional methods:
on each elm:
Code:
arrWords.map(countWords)
or using the OP's proto:
Code:
arrWords.map( Function.prototype.call.bind("".countWords) ) // eval.call is the same function as Function.prototype.call
if you want a whole array result:
Code:
arrWords.join(" ").countWords()
Quote:
that's some heavy-handed dogma for a scripting language!
How does it matter that it's a scripting language? There are a few general programming standards. One of which is: Don't duplicate code. We're not really gonna disagree on code duplication being a bad thing, are we?
Counting the words is a feature the language doesn't offer, hence we implement it. And functions are exactly the structure meant to wrap code we want to use multiple times.
Inlining such methods is something I would guess the compiler will do for us (although I don't know that for certain). For a change I'll be the one to say that the cost of potential refactoring work is higher than the fractions of milliseconds you'll lose calling a function.
Quote:
in any scenario, we can easily enough use the OP code to perform the counting in an array with minimal overhead, certainly far-less than a function/proto rewrite or additional methods:
Quote:
Inlining such methods is something I would guess the compiler will do for us
That's where JavaScript being a scripting language makes a difference: there is no compiler to do such optimisations for us, just an interpreter that interprets statement by statement as they are run. That is why minor changes to the way a script is written can change its run time by a small amount, whereas similar changes in a compiled language have no effect, because the compiler produces the same executable by applying the optimisation itself if you don't.
Of course unless the code is running millions of times you will still not notice any significant difference in the run time unless you have the script recording it.
But that is only true "in theory". In reality, engines like V8 do compile JavaScript and they do a lot of optimization, simply because JavaScript is an old language that gained so much popularity that it needs to run faster and faster.
So maybe from a theoretical point of view we wouldn't have a compiler to do that, but even then the performance difference is so subtle that inlining is not worth the potential problems it will bring (let alone my favorite argument "clean code"…). And thinking more practically these things will probably be optimized anyway.