  1. #16
    mrhoo
    Regular Coder
    Join Date
    Mar 2006
    Posts
    726
    Thanks
    35
    Thanked 132 Times in 123 Posts
    Code:
    (function(){
    	window.Words={
    		collect: function(str, ret){
    			if(!str) str= document.body;
    			if(str.nodeType== 1) str= Words.from(str).join(' ');
    			str= str.match(/(^| )[a-zA-Z]+\b/g) || []; // match returns null when no words are found
    			var w, indx, A= [], B=[], glossary= {};
    			B[0]= str.length+' words found';
    			while(str.length){
    				w= str.shift().replace(/^ +/,'');
    				if(w.length== 1 && /[^aAI]/.test(w)) continue;
    				w= w.charAt(0).toUpperCase()+ w.substring(1);
    				if(!glossary[w]) glossary[w]= 0;
    				++glossary[w];
    			}
    			for(var p in glossary){
    				indx= glossary[p];
    				if(!A[indx]) A[indx]= [];
    				A[indx].push(p);
    			}
    			if(!ret) return A;
    			return Words.count(A, B);
    		},
    		count: function(A, B){
    			var n= 0;
    			for(var i= 0, L= A.length; i<L; i++){
    				if(A[i]){
    					B.push(i+' instances: '+A[i].sort().join(', '));
    					n+= A[i].length;
    				}
    			}
    			B[0]= n+ ' unique words from '+B[0];
    			if(B[1]) B[1]= B[1].replace('1 instances: ','1 instance: ');
    			B.push(B.shift());
    			return B.reverse().join('\n\n');
    		},
    		from: function(hoo){
    			var A= [], tem;
    			if(hoo){
    				hoo= hoo.firstChild;
    				while(hoo!= null){
    					if(hoo.nodeType== 3){
    						if(hoo.data) A[A.length]= hoo.data;
    					}
    					else A= A.concat(Words.from(hoo)); // recurse by name; arguments.callee is not allowed in strict mode
    					hoo= hoo.nextSibling;
    				}
    			}
    			return A;
    		}
    	}
    })()
    //test
    alert(Words.collect(document.body,true));

    /* first parameter: (optional, defaults to document.body) -
    any text string or an element in the current document.

    second parameter: (optional, defaults to false)-
    is a boolean, true returns a string listing the words in groups of frequency, most frequent first.
    false (or no second parameter) returns an array whose members are arrays of words.
    A[3] would be an array of words found 3 times in the source.
    */
    //test
    var A= Words.collect(document.body);
    // A[n] is an array of the words found n times (undefined if there are none)
    alert(A[3]? A[3].sort().join(', '): 'No words were found 3 times');
    Last edited by mrhoo; 08-05-2009 at 03:30 PM.
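
    For anyone who wants to try the grouping idea outside a page, here is a minimal sketch that works on a plain string. It is a simplification rather than a copy of the code above: it lowercases every word and keeps single letters, and groupByFrequency is a made-up name, not part of the Words object.

```javascript
// Sketch: group words by how often they occur, without touching the DOM.
// groupByFrequency is a hypothetical helper, not part of the Words object above.
function groupByFrequency(text) {
    var found = text.match(/[a-zA-Z]+/g) || []; // match() returns null when nothing matches
    var counts = {};
    for (var i = 0; i < found.length; i++) {
        var w = found[i].toLowerCase();
        counts[w] = Object.prototype.hasOwnProperty.call(counts, w) ? counts[w] + 1 : 1;
    }
    var groups = []; // groups[n] holds the words seen exactly n times
    for (var word in counts) {
        if (!Object.prototype.hasOwnProperty.call(counts, word)) continue;
        var n = counts[word];
        if (!groups[n]) groups[n] = [];
        groups[n].push(word);
    }
    return groups;
}

var groups = groupByFrequency("the cat and the hat and the bat");
// groups[3] is ["the"], groups[2] is ["and"], groups[1] is ["cat", "hat", "bat"]
```

    The result is a sparse array, so check that groups[n] exists before using it.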

  2. #17
    Old Pedant
    Supreme Master coder!
    Join Date
    Feb 2009
    Posts
    25,028
    Thanks
    75
    Thanked 4,325 Times in 4,291 Posts
    Oh, what the heck. If everybody else wants to contribute answers without knowing the full parameters of the question, here's my version.

    I know, it's a pretty old-fashioned way of coding. Very simplistic. Very straightforward. Even has (shudder!) comments in it.

    What can I say? I'm a Luddite at heart.
    Code:
    <html>
    <head>
    <script type="text/javascript">
    function processText( )
    {
        var text = document.body.innerHTML;
        // strip out HTML tags, replace with space
        var reHTML = /\<\/?[a-z][^\>]*\>/ig;
        text = text.replace(reHTML," ");
        // strip out non-letters, replace with space
        var reZap = /[^a-z]/ig;
        text = text.replace(reZap," ");
        // convert all multiple non-letters (including newlines) to single space:
        var reSp = /[^a-z]+/ig;
        text = text.replace(reSp," ");
        // now get all the words:
        var all = text.split(/ /g);
        // and process them:
        // we put the words into an object, keyed by the actual word
        // (a plain object, so a word such as "length" or "sort" cannot
        // collide with the built-in properties of an Array)
        var words = {};
        for ( var w = 0; w < all.length; ++w )
        {
            var word = all[w].toLowerCase(); // lower case is optional of course
            if ( word == "" ) continue; // split leaves empty strings at the edges
            if ( !words.hasOwnProperty(word) )
            {
                words[word] = 1; // if word not already seen, record it w/ count of 1
            } else {
                ++words[word]; // otherwise, bump the count for that word
            }
        }
        // now create output that we'll stuff in the page display:
        var list = ""; // list of all words...unsophisticated output
        for ( word in words )
        {
            list += word + " [" + words[word] + "]<br/>";
        }
        // and K.I.S.S.:
        document.getElementById("putItHere").innerHTML = list;
        
    }
    </script>
    </head>
    <body onload="processText();">
    <div style="color: red;">This is a sample that <b>uses tags</b> and has 
    abbreviations such as U.S.A. and Dr. Jones.<br/><br/>
    It also has sentences that end in a period.<br/>
    And it has numbers such as $4.98 and 365 days.
    </div>
    <hr>
    <div id="putItHere"></div>
    </body>
    </html>
    p.s.: Did I mention that it's also not very much code??
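
    The same pipeline can be tried on a plain string in a console, without a page. This is only a sketch under a couple of assumptions: countWords is a made-up name, and it keeps the counts in a plain object so that a word such as "length" or "sort" cannot collide with an Array's built-in properties.

```javascript
// Sketch of the strip-tags-and-count pipeline on a plain string.
// countWords is a hypothetical name, not something from the page above.
function countWords(html) {
    var text = html.replace(/<\/?[a-z][^>]*>/ig, " "); // tags -> space
    text = text.replace(/[^a-z]+/ig, " ");             // runs of non-letters -> one space
    var all = text.split(" ");
    var words = {};                                    // plain object keyed by word
    for (var i = 0; i < all.length; i++) {
        var word = all[i].toLowerCase();
        if (word === "") continue;                     // split leaves empty strings at the edges
        if (Object.prototype.hasOwnProperty.call(words, word)) {
            ++words[word];
        } else {
            words[word] = 1;
        }
    }
    return words;
}

var counts = countWords("<b>Dr. Jones</b> paid $4.98 in the U.S.A.");
// counts.dr is 1, and u, s and a are counted once each; the digits in $4.98 are discarded
```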

  3. #18
    Philip M
    Supreme Master coder!
    Join Date
    Jun 2002
    Location
    London, England
    Posts
    17,898
    Thanks
    203
    Thanked 2,530 Times in 2,508 Posts
    But numbers 365 days and $4.98 are ignored, and U.S.A is counted as u[1] s[1] and a[1].

    You can improve with:-

    var reZap = /[^a-z0-9\.]/ig;
    text = text.replace(reZap," ");
    var reSp = /[^a-z0-9\.]+/ig;
    text = text.replace(reSp," ");
    text= text.replace(/([a-z])\./g,"$1"); // remove periods after lowercase a-z
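
    To see what these patterns keep, here is a small sketch that applies them to a plain string. The name tokenize is made up for the illustration, and the two non-letter replacements are folded into one because the "+" version covers both.

```javascript
// Sketch applying the revised patterns to a plain string.
// tokenize is a hypothetical name used only for this illustration.
function tokenize(text) {
    text = text.replace(/[^a-z0-9.]+/ig, " ");   // keep letters, digits and periods
    text = text.replace(/([a-z])\./g, "$1");     // drop a period that follows a lowercase letter
    return text.split(" ").filter(function (t) { return t !== ""; });
}

var tokens = tokenize("Dr. Jones paid $4.98 for 365 days in the U.S.A.");
// tokens: ["Dr", "Jones", "paid", "4.98", "for", "365", "days", "in", "the", "U.S.A."]
```

    One limitation of this approach: a period that follows a digit or a capital letter is kept, so a sentence ending in "365." leaves the token "365." behind.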


    What is the point of this anyway? What useful purpose is there in knowing how frequently each word is to be found on an HTML page?
    Last edited by Philip M; 08-05-2009 at 07:48 AM.

  4. #19
    Old Pedant
    Supreme Master coder!
    Join Date
    Feb 2009
    Posts
    25,028
    Thanks
    75
    Thanked 4,325 Times in 4,291 Posts
    "But numbers 365 days and $4.98 are ignored, and U.S.A is counted as u[1] s[1] and a[1]."

    Philip: That was my point *EXACTLY*!

    Everyone has been giving answers to this question and the original poster *STILL* hasn't responded with *EXACT* specifications!!!

    I'd happily handle U.S.A. as a word, with or without the periods. I'd happily include numbers or even currency as words. BUT HE WON'T TELL US what he wants!

    So since everybody else was jumping in with answers that worked in whatever way they fancied, I figured I could do the same.

    Oh...and as for the purpose: Who knows? Want to bet it's just for a homework exercise??? In fact, that might explain why the original poster *CAN'T* explain exactly what he wants, since he doesn't know.

    Hey, at least my code was simple and easy to implement. And small.
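
    One way to handle the missing specification is to make each open question an explicit option. The sketch below is only an illustration of that idea; extractWords and its numbers and abbreviations options are made-up names, and the patterns are one reasonable guess at what "a word" might mean.

```javascript
// Sketch: turn the unanswered questions into options.
// extractWords and its option names are hypothetical.
function extractWords(text, options) {
    options = options || {};
    var pattern = "[a-z]+";
    if (options.abbreviations) pattern = "(?:[a-z]\\.){2,}|" + pattern; // U.S.A. as one word
    if (options.numbers) pattern = "\\$?\\d+(?:\\.\\d+)?|" + pattern;   // 365 and $4.98
    return text.match(new RegExp(pattern, "ig")) || [];                 // null -> empty list
}

var sample = "Dr. Jones paid $4.98 for 365 days in the U.S.A.";
extractWords(sample);
// ["Dr", "Jones", "paid", "for", "days", "in", "the", "U", "S", "A"]
extractWords(sample, { numbers: true, abbreviations: true });
// ["Dr", "Jones", "paid", "$4.98", "for", "365", "days", "in", "the", "U.S.A."]
```

    Whichever defaults are right depends on an answer the original poster has not given.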

  5. #20
    Philip M
    Supreme Master coder!
    Join Date
    Jun 2002
    Location
    London, England
    Posts
    17,898
    Thanks
    203
    Thanked 2,530 Times in 2,508 Posts
    This thread is a lovely example to illustrate the quote:-

    “Fools act on imagination without knowledge, pedants act on knowledge without imagination” Alfred North Whitehead (British Mathematician and Philosopher, 1861-1947)

    You are right, and I am indeed a fool, but you are indeed a pedant.

  6. #21
    Old Pedant
    Supreme Master coder!
    Join Date
    Feb 2009
    Posts
    25,028
    Thanks
    75
    Thanked 4,325 Times in 4,291 Posts
    ROTFLMAO! I *LOVE* that quote!

    I hope it's only *partly* true about myself, but...


 