  1. #1

    Index of Words by a tiny stand-alone js file.

    jayakodiu@yahoo.com

    An index of the words in a large document is useful for quick reference and meaningful search, because the search utilities in MS Word, browsers and text editors require the word to be known or guessed in advance. Compiling such an index by hand is a tedious job of sorting, filtering and removing duplicates.

    The code relies on the fact that a word in the index is plain text: save the document (MS Word or HTML) as a plain text file and process that file. A RegExp isolates the words, and only words longer than 4 characters are indexed, which filters out most common words.

    For example, an MS Word document of 135 pages, 49669 words and 493 KB becomes, when saved as plain text, 306 KB, 92 pages and 48839 words. Removing words of 4 characters or fewer leaves 23010 words, and removing duplicates leaves a final index of 2513 words. The processing time is less than a second.
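
The three stages behind those counts (all words, words longer than 4 characters, unique words) can be sketched in plain JavaScript as below. This is only an illustration, not the Indexor.js script itself: the names indexStages and minLen are made up for the example, and it uses Array filter, so it is meant for a browser console or Node rather than Windows Script Host.

```javascript
// Illustrative sketch; indexStages and minLen are hypothetical names.
function indexStages(text, minLen) {
  // Stage 1: isolate runs of letters, lower-cased so "Word" and "word" match.
  var all = text.toLowerCase().match(/[a-z]+/g) || [];
  // Stage 2: keep only words longer than minLen characters.
  var longWords = all.filter(function (w) { return w.length > minLen; });
  // Stage 3: sort so duplicates become adjacent, then keep a word
  // only if it differs from the one before it.
  longWords.sort();
  var unique = longWords.filter(function (w, i, a) {
    return i === 0 || w !== a[i - 1];
  });
  return { total: all.length, longCount: longWords.length, index: unique };
}

var r = indexStages(
  "Index of words: the Index lists words, sorted words, filtered words.", 4);
console.log(r.total, r.longCount, r.index.join(" "));
```

For the sample sentence this gives 11 words in total, 9 words longer than 4 characters, and an index of 5 words: filtered, index, lists, sorted, words.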

    Indexor.js - How to use:
    Save the following code as a .js file after setting, in the first two lines, the name of the file to process and the name of the index file to save; do not alter the rest of the code. Run the .js file from Windows Explorer (it runs under Windows Script Host). The saved index file can then be put into a SELECT tag or TEXTAREA for use in an HTML page. An HTML file using such an index may be seen at:
    Index_and_3waySearch
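
As a rough idea of the SELECT step, the sketch below turns the text of a saved index (one word per line) into SELECT markup. The function name indexToSelect and the id "wordIndex" are hypothetical, and the index text is assumed to be already available as a string; how it is loaded into the page is left open. Because the index contains only the letters a-z, no HTML escaping is done here.

```javascript
// Hypothetical helper: build SELECT markup from index text (one word per line).
function indexToSelect(indexText, selectId) {
  // Split on CRLF or LF and drop the empty entry after the last line break.
  var words = indexText.split(/\r?\n/).filter(function (w) {
    return w.length > 0;
  });
  var options = words.map(function (w) {
    return '  <option value="' + w + '">' + w + '</option>';
  });
  return '<select id="' + selectId + '">\n' + options.join('\n') + '\n</select>';
}

var html = indexToSelect("filtered\r\nindex\r\nwords\r\n", "wordIndex");
console.log(html);
```

The returned string can be written into the page, or the same word list can be joined with line breaks and placed in a TEXTAREA instead.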

    <code>
    var fln="c:/fca.txt"
    var ifn="c:/MyIndex.txt"

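    var zz="",jk=0,zb="",zc="\x0D\x0A",f,zt,wrds,wn,i
    var fso=WScript.CreateObject("Scripting.FileSystemObject")
    // Read the whole file, lower-cased so sorting and duplicate checks ignore case.
    f=fso.OpenTextFile(fln,1);zt=f.ReadAll().toLowerCase();f.Close()
    // Isolate runs of letters; match() returns null when there are none.
    wrds=zt.match(/[a-z]+/g)||[];wn=wrds.length;wrds.sort()
    // After sorting, duplicates are adjacent: keep a word only if it is > 4 chars and differs from the last word kept.
    for(i=0;i<wn;i++){zt=wrds[i];if(zt.length>4 && zt!=zb){jk++;zz=zz+zt+zc;zb=zt}}
    // CreateTextFile(name, overwrite, unicode): overwrite any existing index and write it as Unicode.
    f=fso.CreateTextFile(ifn,true,true);f.Write(zz);f.Close()
    WScript.Echo("Total words: "+wn+"; indexed words > 4 chars: "+jk+"\nIndex file saved as "+ifn)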
    </code>

  • #2
    Very nice! You should use the BB "code" tags to format your code though ;-)

