Re: [css3-syntax] Preference for parser speccing?

On Fri, May 25, 2012 at 3:14 PM, Sylvain Galineau
<sylvaing@microsoft.com> wrote:
> [Tab Atkins Jr.:]
>> This is just a general preference question for implementors.
>>
>> When I pick up Syntax again, would it be better for me to write the
>> parsing section as if the tokenizing was already completely done, or
>> interleaved with the tokenizing like the HTML parser is?  What would be
>> more useful?  I can do either, but I'd rather not have to switch partway
>> through, or even after I'm totally done.
>
> Not sure what you're asking. I think I'd rather implement a parsing algorithm
> based on a clear and unambiguous definition of what the tokens actually are.
> Recent threads suggest the latter may need some polishing?

Tokens are easy - the tokenizer is already done in Syntax.  The
question is what to do with the parsing step, which turns tokens into
actual CSS structures and values.

HTML, for example, is defined with the tokenizing interleaved with the
parsing.  This is necessary for HTML, because some tags change the
tokenizing rules - if you see a <script>, you stop parsing like HTML
and instead just parse everything as text until you see the </script>,
because what's between those tags simply isn't HTML, and you don't
want to risk screwing it up by parsing as HTML.
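
To make the <script> case concrete, here's a toy sketch in Python. It
is not the HTML spec's algorithm; the Tokenizer class, its raw_end
attribute, and the parse() loop are names I made up for illustration.
The only point is that the parser has to reach back into the tokenizer
and change its rules mid-stream:

```python
import re

# Very rough tag matcher; real HTML tokenizing is far more involved.
TAG = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>")


class Tokenizer:
    """Hypothetical toy tokenizer producing (kind, value) tuples."""

    def __init__(self, source):
        self.source = source
        self.pos = 0
        self.raw_end = None  # set by the parser to request raw-text mode

    def next_token(self):
        src, pos = self.source, self.pos
        if pos >= len(src):
            return None
        if self.raw_end is not None:
            # Raw-text mode: everything up to the closing tag is plain text.
            end = src.lower().find(self.raw_end, pos)
            if end == -1:
                end = len(src)
            self.raw_end = None
            if end > pos:
                self.pos = end
                return ("text", src[pos:end])
            # Empty raw text: fall through to normal tokenizing.
        m = TAG.match(src, pos)
        if m:
            self.pos = m.end()
            return ("end" if m.group(1) else "start", m.group(2).lower())
        nxt = src.find("<", pos + 1)
        if nxt == -1:
            nxt = len(src)
        self.pos = nxt
        return ("text", src[pos:nxt])


def parse(source):
    """Toy 'parser': collects tokens and switches tokenizer rules on <script>."""
    tok = Tokenizer(source)
    out = []
    while True:
        t = tok.next_token()
        if t is None:
            return out
        out.append(t)
        if t == ("start", "script"):
            tok.raw_end = "</script"  # the interleaving step
```

Without the raw_end switch, the '<b>' inside the script text below would
come out as a start tag instead of staying part of the text:

    parse("<p>hi</p><script>if (a < b) x = '<b>';</script>")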

CSS *does* technically have this in one circumstance - the an+b syntax
of all the :nth-*() pseudos doesn't directly correspond to the tokens
of CSS.  However, it's possible to solve this by "reversing" the
tokens into something more meaningful, so you don't technically have
to switch contexts.
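
Here's one way to read the "reversing" idea, as a rough Python sketch.
It assumes each token remembers the source text it came from; the Token
type, the kind names, and parse_an_plus_b() are all made up for
illustration and aren't from the Syntax draft. As I understand the
tokenizer, "2n+1" comes out as a dimension "2n" followed by a number
"+1", while "2n-1" is a single dimension whose unit is "n-1", so the
tokens don't line up with the an+b grammar directly. Gluing the token
text back together and re-reading it sidesteps that:

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    kind: str  # hypothetical kinds: "dimension", "number", "ident", ...
    text: str  # the source text this token was produced from


# Either "<a>n" with an optional signed "<b>", or a bare integer "<b>".
AN_PLUS_B = re.compile(
    r"^(?:(?P<a>[+-]?\d*)n(?:(?P<sign>[+-])(?P<b>\d+))?|(?P<only_b>[+-]?\d+))$"
)


def parse_an_plus_b(tokens):
    """Return (a, b) for an an+b expression, or None if it doesn't match.

    Simplification: all whitespace tokens are dropped, which accepts more
    inputs than the real grammar does.
    """
    text = "".join(t.text for t in tokens if t.kind != "whitespace").lower()
    if text == "even":
        return (2, 0)
    if text == "odd":
        return (2, 1)
    m = AN_PLUS_B.match(text)
    if m is None:
        return None
    if m.group("only_b") is not None:
        return (0, int(m.group("only_b")))
    a_text = m.group("a")
    if a_text in ("", "+"):
        a = 1
    elif a_text == "-":
        a = -1
    else:
        a = int(a_text)
    b = int(m.group("sign") + m.group("b")) if m.group("b") else 0
    return (a, b)
```

The caller never has to switch tokenizer modes; it only needs the
tokens found between the parentheses of the :nth-*() function, for
example:

    parse_an_plus_b([Token("dimension", "2n"), Token("number", "+1")])
    parse_an_plus_b([Token("dimension", "2n-1")])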

So the question is simply, as someone implementing or maintaining a
parser, which style is more useful to read?

~TJ

Received on Friday, 25 May 2012 22:24:23 UTC