Re: [css3-syntax] Reviving the spec, starting with the parser from Tab Atkins Jr. on 2012-04-12 (www-style@w3.org from April 2012)

From: Tab Atkins Jr. <jackalmage@gmail.com>
Date: Wed, 11 Apr 2012 22:36:53 -0700
To: www-style list <www-style@w3.org>
Message-ID: <CAAWBYDDOFp1O+dUSYsFSqmHBrBa5AYNSEudk3yOXdJKYEJhe4Q@mail.gmail.com>

Some additional technical details about the tokenizer that may be of interest.

The tokenizer uses 3 characters of lookahead.

It is *almost* stateless - if you implement it as a scanner that emits
one token per invocation, the only state it has to keep track of is a
single "reconsume" character and its current index into the
bytestream.  It always returns to the "data state" after emitting a
token, so the parsing algorithm can begin anew.  (If you're okay with
it sometimes returning multiple tokens, you can even drop the
"reconsume" character.)
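
To make the "almost stateless" point concrete, here is a minimal sketch of a scanner whose only state is an index and a single "reconsume" slot. It is not the css3-syntax tokenizer: the token set (IDENT, WHITESPACE, DELIM, EOF), the `Scanner` class, and the `next_token` method are all hypothetical names chosen for illustration, and it indexes into a Python string rather than a real bytestream.

```python
from typing import Optional, Tuple

Token = Tuple[str, str]  # (type, text); hypothetical token shape


class Scanner:
    """Toy scanner: one token per call; state is an index plus one reconsume slot."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0                          # current index into the input
        self.reconsume: Optional[str] = None  # single pushed-back character

    def _consume(self) -> str:
        # Hand back the pushed-back character first, if any; "" signals EOF.
        if self.reconsume is not None:
            ch, self.reconsume = self.reconsume, None
            return ch
        if self.pos >= len(self.text):
            return ""
        ch = self.text[self.pos]
        self.pos += 1
        return ch

    def next_token(self) -> Token:
        # Every call starts over in the "data state"; nothing else carries over.
        ch = self._consume()
        if ch == "":
            return ("EOF", "")
        if ch.isspace():
            # Collapse a run of whitespace into one token.
            while ch.isspace():
                ch = self._consume()
            if ch != "":
                self.reconsume = ch  # read one character too far; push it back
            return ("WHITESPACE", " ")
        if ch.isalpha():
            buf = ""
            while ch != "" and (ch.isalnum() or ch == "-"):
                buf += ch
                ch = self._consume()
            if ch != "":
                self.reconsume = ch  # the terminator belongs to the next token
            return ("IDENT", buf)
        return ("DELIM", ch)


if __name__ == "__main__":
    scanner = Scanner("a1  b;")
    while True:
        token = scanner.next_token()
        print(token)
        if token[0] == "EOF":
            break
```

In this sketch the character that ends an identifier or a whitespace run is stashed in `reconsume` and picked up by the next call. A variant that is allowed to return several tokens per call could tokenize that terminating character right away instead, which is why the slot becomes unnecessary in that case.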

~TJ

Received on Thursday, 12 April 2012 05:37:48 UTC