Re: [css3-syntax] Reviving the spec, starting with the parser

Some additional technical details about the tokenizer that may be of interest.

The tokenizer uses 3 characters of lookahead.

It is *almost* stateless: if you implement it as a scanner that emits
one token per invocation, the only state it has to keep track of is a
single "reconsume" character and its current index into the
bytestream.  It always returns to the "data state" after emitting a
token, so the parsing algorithm can begin anew.  (If you're okay with
it sometimes returning multiple tokens, you can even drop the
"reconsume" character.)
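
To make the shape of that state concrete, here is a minimal sketch in
Python. It is not the spec's tokenizer: the token kinds (whitespace,
ident, number, delim) are a toy grammar, and the names Scanner,
next_token and tokenize are made up for illustration. The only thing
it tries to show is that a one-token-per-call scanner can get by with
an index plus a single "reconsume" slot, and that every call starts
over from the same initial state.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (kind, text); a toy representation


class Scanner:
    """Toy scanner with the same state shape: an index and one reconsume slot."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0                          # current index into the input
        self.reconsume: Optional[str] = None  # at most one pushed-back character

    def _consume(self) -> Optional[str]:
        # Hand back the pushed-back character first, if there is one.
        if self.reconsume is not None:
            ch, self.reconsume = self.reconsume, None
            return ch
        if self.pos >= len(self.text):
            return None  # end of input
        ch = self.text[self.pos]
        self.pos += 1
        return ch

    def next_token(self) -> Optional[Token]:
        # Every call begins in the same initial state; nothing else carries over.
        ch = self._consume()
        if ch is None:
            return None
        if ch.isspace():
            kind, accept = "whitespace", str.isspace
        elif ch.isalpha():
            kind, accept = "ident", str.isalnum
        elif ch.isdigit():
            kind, accept = "number", str.isdigit
        else:
            return ("delim", ch)
        chars = [ch]
        while True:
            ch = self._consume()
            if ch is None:
                break
            if not accept(ch):
                self.reconsume = ch  # this character belongs to the next token
                break
            chars.append(ch)
        return (kind, "".join(chars))


def tokenize(text: str) -> List[Token]:
    """Drive the scanner until it runs out of input."""
    scanner = Scanner(text)
    tokens: List[Token] = []
    while True:
        tok = scanner.next_token()
        if tok is None:
            return tokens
        tokens.append(tok)
```

In this sketch the character that ends an ident or number is stashed
in the reconsume slot and picked up by the next call. A variant that
may return two tokens from one call could emit the following token
immediately instead, which is the trade-off mentioned in the
parenthetical above.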

~TJ

Received on Thursday, 12 April 2012 05:37:48 UTC