Re: [css3-syntax] Preference for parser speccing?

On Thu, May 31, 2012 at 12:43 PM, Kang-Hao (Kenny) Lu
<kennyluck@csail.mit.edu> wrote:
> (12/05/26 6:08), Tab Atkins Jr. wrote:
>> This is just a general preference question for implementors.
>>
>> When I pick up Syntax again, would it be better for me to write the
>> parsing section as if the tokenizing were already completely done,
>> or interleaved with the tokenizing like the HTML parser is?  What
>> would be more useful?  I can do either, but I'd rather not have to
>> switch partway through, or even after I'm totally done.
>
> Mind sharing an example about the choices here? Or is this a general
> survey about how we should write the parser section?
>
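To make the first option concrete, here is a minimal sketch of "tokenize everything first, then parse". It assumes a drastically simplified toy grammar (rules shaped like selector{name:value;...}, single-token values, no error recovery). The tokenize and parse functions are hypothetical illustrations, not the css3-syntax algorithms. The point is that the parser only ever walks a finished token list and never talks back to the tokenizer.

```python
from dataclasses import dataclass


@dataclass
class Token:
    kind: str   # "ident", "delim", or "ws" (toy token kinds, hypothetical)
    value: str


def tokenize(text: str) -> list[Token]:
    """Phase 1: turn the whole input into tokens before any parsing happens."""
    tokens = []
    i = 0
    while i < len(text):
        c = text[i]
        if c.isspace():
            j = i
            while j < len(text) and text[j].isspace():
                j += 1
            tokens.append(Token("ws", text[i:j]))
            i = j
        elif c.isalnum() or c in "-_#.":
            j = i
            while j < len(text) and (text[j].isalnum() or text[j] in "-_#."):
                j += 1
            tokens.append(Token("ident", text[i:j]))
            i = j
        else:
            tokens.append(Token("delim", c))
            i += 1
    return tokens


def parse(tokens: list[Token]) -> list[tuple[str, dict[str, str]]]:
    """Phase 2: walk the finished token list; the tokenizer is already done."""
    toks = [t for t in tokens if t.kind != "ws"]
    rules = []
    i = 0
    while i < len(toks):
        selector = []
        while toks[i].value != "{":
            selector.append(toks[i].value)
            i += 1
        i += 1  # skip "{"
        decls = {}
        while toks[i].value != "}":
            # toy declaration: ident ":" ident, optionally followed by ";"
            decls[toks[i].value] = toks[i + 2].value
            i += 3
            if toks[i].value == ";":
                i += 1
        i += 1  # skip "}"
        rules.append((" ".join(selector), decls))
    return rules
```

For example, parse(tokenize("a{color:red; margin:0}")) yields one rule for "a" with color and margin declarations. This style is easy to specify and test in isolation, but it leaves no hook for the parser to change how later input is tokenized.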
> (12/05/26 6:23), Tab Atkins Jr. wrote:
>> HTML, for example, is defined with the tokenizing interleaved with the
>> parsing.  This is necessary for HTML, because some tags change the
>> tokenizing rules - if you see a <script>, you stop parsing like HTML
>> and instead just parse everything as text until you see the </script>,
>
> Note that when you say "interleaved" here, the tokenizing section of the
> HTML spec uses phrases like "Emit the XXX token" instead of "call parser
> routine YYY" so it's still quite clean to me (though whether that will
> make people think that the parser wouldn't change the state of the
> tokenizer is another issue).

Note that HTML explicitly defines that the parser must be reentered
after every emitted token.  It doesn't need any explicit callbacks,
because it's a general rule.
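The "reenter the parser after every emitted token" rule can be sketched as a push model: the tokenizer hands each token to the parser immediately, and the parser may flip the tokenizer's state before tokenizing resumes. This is a toy model loosely shaped like HTML's <script> handling; the class names and the two states are hypothetical, and real HTML uses many more states and an "appropriate end tag" check rather than a fixed string.

```python
class TreeBuilder:
    """Toy parser: records tokens and switches the tokenizer after <script>."""

    def __init__(self):
        self.tokens = []

    def process(self, tokenizer, token):
        self.tokens.append(token)
        if token == ("start", "script"):
            tokenizer.state = "rawtext"  # parser changes the tokenizing rules


class PushTokenizer:
    def __init__(self, text, parser):
        self.text = text
        self.pos = 0
        self.state = "data"  # "data" or "rawtext" (toy states)
        self.parser = parser

    def emit(self, kind, value):
        # The parser is reentered for every emitted token; it may change
        # self.state before the tokenizer continues.
        self.parser.process(self, (kind, value))

    def run(self):
        while self.pos < len(self.text):
            if self.state == "rawtext":
                # Everything up to the literal "</script>" is plain text.
                end = self.text.find("</script>", self.pos)
                end = len(self.text) if end == -1 else end
                if end > self.pos:
                    self.emit("text", self.text[self.pos:end])
                self.pos = end
                self.state = "data"
            elif self.text[self.pos] == "<":
                end = self.text.index(">", self.pos)
                tag = self.text[self.pos + 1:end]
                self.pos = end + 1
                if tag.startswith("/"):
                    self.emit("end", tag[1:])
                else:
                    self.emit("start", tag)
            else:
                end = self.text.find("<", self.pos)
                end = len(self.text) if end == -1 else end
                self.emit("text", self.text[self.pos:end])
                self.pos = end
```

Running it on "<p><script>if (a<b) x()</script></p>" keeps "if (a<b) x()" as a single text token, because the parser switched the tokenizer into rawtext as soon as it saw the script start tag. No explicit callback per token type is needed; emit() is the one general hook.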

> It is true that the HTML parser changes the state of the tokenizer from
> time to time during tree construction, but given that the CSS parser
> mostly (except :nth-*) doesn't change the state of the tokenizer, I
> can't quite imagine how you could write this in an interleaved way
> without a concrete example...

Same as HTML, basically, where the parser explicitly says "hey,
tokenizer, give me one more token using the XXX tokenizing mode".
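A pull-style version of "give me one more token using the XXX tokenizing mode" might look like the sketch below, using :nth-* as the case where CSS wants different tokenizing. The "anb" mode, the ModalTokenizer class, and parse_selector are hypothetical; the regex-based "normal" mode is a toy, not CSS tokenization. In normal mode this toy splits "2n" into a number followed by an ident, which is the kind of awkwardness a special mode could avoid.

```python
import re

# Toy normal-mode token pattern: identifier (optionally a function "name("),
# a run of digits, or any single character.
NORMAL = re.compile(r"[A-Za-z_-][A-Za-z0-9_-]*\(?|\d+|.")
IDENT = re.compile(r"[A-Za-z_-][A-Za-z0-9_-]*")


class ModalTokenizer:
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def next_token(self, mode="normal"):
        """Return one token, tokenized in whichever mode the parser asked for."""
        if self.pos >= len(self.text):
            return ("eof", "")
        if mode == "anb":
            # Hypothetical mode: read a whole An+B expression up to ")" as one token.
            end = self.text.index(")", self.pos)
            value = self.text[self.pos:end].replace(" ", "")
            self.pos = end
            return ("anb", value)
        m = NORMAL.match(self.text, self.pos)
        self.pos = m.end()
        value = m.group()
        if len(value) > 1 and value.endswith("("):
            return ("function", value)
        if value.isdigit():
            return ("number", value)
        if IDENT.fullmatch(value):
            return ("ident", value)
        return ("delim", value)


def parse_selector(text):
    tok = ModalTokenizer(text)
    parts = []
    while True:
        token = tok.next_token()
        if token[0] == "eof":
            return parts
        parts.append(token)
        if token[0] == "function" and token[1].startswith("nth-"):
            # The parser drives the tokenizer: ask for the next token in "anb" mode.
            parts.append(tok.next_token(mode="anb"))
```

With this design the tokenizer stays a simple on-demand function, and only the parser knows when a different mode is needed; everywhere else it just calls next_token() with the default.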

~TJ

Received on Thursday, 31 May 2012 20:29:35 UTC