Hello and welcome to our community! Is this your first visit?
Register
Enjoy an ad free experience by logging in. Not a member yet? Register.
Results 1 to 3 of 3
  1. #1
    Regular Coder
    Join Date
    Sep 2007
    Location
    AZ, USA
    Posts
    685
    Thanks
    6
    Thanked 46 Times in 46 Posts

    Tokenizer Recursion Problem

    I am interested in writing a tokenizer (and then a parser) for JavaScript, in JavaScript. I must admit, when I started (on a whim) I didn't know anything about parsers; I didn't even know what a tokenizer was at the time. I just winged it, learned all I could from various sites, and came up with a 500 line tokenizer. It recognizes strings, numbers (including hexadecimal, octal, floating point, and integer), comments, punctuators, keywords, identifyers, and also undefined and null. I didn't get around to distinguishing regex literals from division/division assignment, because I ran into a problem.

    My tokenizer loops through every character in the string, and, depending on the character, and a few following it, narrows down the choices of token-types and then writes the token type and the value to an array. This works perfectly - until you get up to thousands of characters. I took the 500 lines of the tokenizer code itself and put into the tokenizer, and discovered that in Firefox it froze up and then stopped shortly after 7000 characters (there were 20,000 characters in the document). In Netscape, it stopped after 2000 chars. In Opera, it stopped at 18,000 chars. IE didn't work at all (can't say I'm surprised).

    The only success was Google Chrome. It really showed the power of the V8 engine; the moment I told it to tokenize, all 20,000 characters were immediately tokenized. But even Chrome could only handle up to 37,000 chars.

    My question is this: How could I cut down on the recursion? My problem was that I had way to much recursion - browsers will automatically stop after a certain amount. Is there any way to do the same thing with less? Basically, how are tokenizers usually made?

    Any link, article, comment, insight, or help of any kind is much appreciated!
    Last edited by binaryWeapon; 10-12-2008 at 12:37 AM.

  • #2
    Senior Coder jmrker's Avatar
    Join Date
    Aug 2006
    Location
    FL
    Posts
    3,026
    Thanks
    36
    Thanked 494 Times in 488 Posts

    Smile Willing to share information I have ...

    I tried something similar and posted a question to another forum.
    Never got any action on it, but you are welcome to see if it would help you.
    See: http://www.webdeveloper.com/forum/sh...ight=tokenizer

    I found a great explanation of compilers, parsers and tokenizers at: http://home.iae.nl/users/mhx/crenshaw/tiny.html

    I also had some test code, but it may take awhile to find it again.
    I was trying to create a programable RPN calculator with Javascript.
    Went pretty well until I came to 'if...then...else...elseif' and other
    conditional logic statements.

    Other sites of interest I have found are:
    http://loveandtheft.org/2008/03/20/w...kenizerparser/
    http://www.bluishcoder.co.nz/2008/06...th-factor.html
    http://flesler.blogspot.com/2008/03/...avascript.html
    http://tclab.kaist.ac.kr/~otfried/cs...calculator.pdf
    http://www.bradrodriguez.com/papers/bnfparse.htm

    While looking for more information I 'googled' using terms such as
    tiny, compiler, assembler, tokens, tokenizer, parser
    in various combinations.

  • #3
    Regular Coder
    Join Date
    Sep 2007
    Location
    AZ, USA
    Posts
    685
    Thanks
    6
    Thanked 46 Times in 46 Posts
    Turns out I posted too soon. After a night of rest I immediately realized that I was using tail-recursion - basically just recursion that does the same thing as a loop. I don't know why I didn't do this before - I just used a loop instead of recursion. I was also able to use named continue statements to get around the fact that I need to continue an outer loop and not an inner loop.

    That being said, I really appreciate the post, jmrk - the sites were very useful.

    Cheers!


  •  

    Tags for this Thread

    Posting Permissions

    • You may not post new threads
    • You may not post replies
    • You may not post attachments
    • You may not edit your posts
    •