- From: Tab Atkins Jr. <jackalmage@gmail.com>
- Date: Wed, 11 Apr 2012 22:36:53 -0700
- To: www-style list <www-style@w3.org>
Some additional technical details about the tokenizer that may be of interest:

The tokenizer uses 3 characters of lookahead.

It is *almost* stateless: if you implement it as a scanner that emits one token per invocation, the only state it has to keep between invocations is a single "reconsume" character and its current index into the bytestream. It always returns to the "data state" after emitting a token, so each invocation starts the algorithm afresh. (If you're okay with it sometimes returning multiple tokens per invocation, you can even drop the "reconsume" character.)

~TJ
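The following is a minimal Python sketch of the one-token-per-call scanner described above. It rests on several assumptions. The names `Scanner`, `next_token`, `Token`, and `tokenize`, the five token kinds, and the simplified identifier and number rules are all hypothetical. They cover far less than the real CSS tokenizer: there are no strings, escapes, comments, or dimension tokens. The sketch also reads an already-decoded string rather than a bytestream. What it illustrates is narrow. Only `pos` and `reconsume` survive between calls, every call begins in the data state, and the "would start a number" check looks at no more than three characters.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    kind: str        # hypothetical kinds: "whitespace", "ident", "number", "delim", "eof"
    value: str = ""


def _is_digit(c: str) -> bool:
    return c != "" and c in "0123456789"


def _is_name_start(c: str) -> bool:
    return c != "" and (c.isalpha() or c == "_")


def _is_name_char(c: str) -> bool:
    return _is_name_start(c) or _is_digit(c) or c == "-"


class Scanner:
    """Hypothetical scanner that returns exactly one token per next_token() call.

    State kept between calls: the index into the input and at most one
    pending "reconsume" character. Every call starts in the data state.
    """

    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0                          # index of the next unread character
        self.reconsume: Optional[str] = None  # character to process again, if any

    def _consume(self) -> str:
        # A pending reconsume character is handed out before reading further.
        if self.reconsume is not None:
            ch, self.reconsume = self.reconsume, None
            return ch
        if self.pos >= len(self.text):
            return ""                         # "" stands for end of input
        ch = self.text[self.pos]
        self.pos += 1
        return ch

    def _peek(self, k: int = 0) -> str:
        # Return the next unread character (k == 0) or one further ahead,
        # without consuming. Only called right after _consume().
        i = self.pos + k
        return self.text[i] if i < len(self.text) else ""

    def _starts_number(self, c: str) -> bool:
        # Inspects at most three characters: c plus the next two unread ones.
        if _is_digit(c):
            return True
        if c == ".":
            return _is_digit(self._peek())
        if c in ("+", "-"):
            return _is_digit(self._peek()) or (
                self._peek() == "." and _is_digit(self._peek(1))
            )
        return False

    def _starts_ident(self, c: str) -> bool:
        return _is_name_start(c) or (c == "-" and _is_name_start(self._peek()))

    def next_token(self) -> Token:
        c = self._consume()                   # data state: every call begins here
        if c == "":
            return Token("eof")
        if c.isspace():
            while c.isspace():
                c = self._consume()
            self.reconsume = c                # hand the terminator to the next call
            return Token("whitespace", " ")
        if self._starts_number(c):
            num, seen_dot = c, c == "."
            c = self._consume()
            while _is_digit(c) or (c == "." and not seen_dot and _is_digit(self._peek())):
                seen_dot = seen_dot or c == "."
                num += c
                c = self._consume()
            self.reconsume = c
            return Token("number", num)
        if self._starts_ident(c):
            name = c
            c = self._consume()
            while _is_name_char(c):
                name += c
                c = self._consume()
            self.reconsume = c
            return Token("ident", name)
        return Token("delim", c)


def tokenize(source: str) -> List[Token]:
    # Caller-side loop: invoke the scanner until it reports end of input.
    scanner = Scanner(source)
    tokens = [scanner.next_token()]
    while tokens[-1].kind != "eof":
        tokens.append(scanner.next_token())
    return tokens
```

For example, `tokenize("width: -12.5px")` yields ident `width`, delim `:`, whitespace, number `-12.5`, ident `px`, then eof. A real CSS tokenizer would emit a single dimension token for `-12.5px`, so this sketch oversimplifies that case. Each token ends by storing its terminating character in `reconsume` rather than stepping `pos` back. That choice mirrors the "reconsume" wording in the message.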
Received on Thursday, 12 April 2012 05:37:48 UTC