Re: [css3-syntax] Parser "entry points" from Tab Atkins Jr. on 2013-01-28 (www-style@w3.org from January 2013)

From: Tab Atkins Jr. <jackalmage@gmail.com>
Date: Mon, 28 Jan 2013 08:18:29 -0800
To: Simon Sapin <simon.sapin@kozea.fr>
Cc: www-style list <www-style@w3.org>
Message-ID: <CAAWBYDBK3gmt0E64dDbF6N40c6pEYERSZny2ENjpN4LJioNoGw@mail.gmail.com>
On Mon, Jan 28, 2013 at 4:20 AM, Simon Sapin <simon.sapin@kozea.fr> wrote:
> On 28/01/2013 07:38, Tab Atkins Jr. wrote:
>> On Sun, Jan 27, 2013 at 8:36 AM, Simon Sapin <simon.sapin@kozea.fr> wrote:
>>>
>>> A style attribute is like a declaration block where a top-level } token
>>> does not terminate the block but is treated like a ] token (that is: make
>>> the current declaration invalid and make the parser skip until the next ;
>>> token or EOF.) Start in declaration-block mode, and these sections need to
>>> be adjusted:
>>>
>>> * 3.5.7. Declaration-block mode
>>> * 3.5.9. Declaration-value mode
>>> * 3.5.11. Declaration-end mode
>>
>>
>> Hrm, okay.  My plan was to just start the parser at a particular
>> state, and end it when the rule got popped or the current declaration
>> was appended or unset, but I guess I do have to explicitly change the
>> parser for a few states.
>>
>> I wonder if impls would be okay with changing behavior to end the rule
>> after } and ignore anything that comes after?  That would simplify the
>> parser slightly. ^_^
>
>
> Oh well. No interop, again.
>
> data:text/html,<p style="color:red;};color:green">test
>
> Green in Firefox and IE, red in Chrome and Opera.
>
> http://www.w3.org/TR/css-style-attr/ has an informative note:
>>
>> Note that because there is no open brace delimiting the declaration
>> list in the CSS style attribute syntax, a close brace (}) in the
>> style attribute's value does not terminate the style data: it is
>> merely an invalid token.
>
> But if you only look at normative text, I think this is undefined because
> the above style attr does not match the CSS 2.1 core grammar at all.

Right.  WebKit parses it by just wrapping it in, iirc, "@-webkit-rule
{}" and parsing it as a stylesheet, then extracting the resulting
declarations.  That's why we stop on the } - it looks like it's
closing the at-rule.

I know we just got two impls agreeing on this, which let us advance
the Style Attr spec, but still. :/  It's not a hard change to the
parser, it's just the only thing I know of so far that varies based on
entry point. (But see below, I guess.)
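
A minimal sketch of the two behaviors discussed above, assuming a
deliberately naive character-level parser (no strings, comments, or
nested blocks). The names parse_style_attr and brace_ends_block are
hypothetical and do not come from any browser or from the spec draft:

```python
def _commit(declarations, chars, invalid):
    # Store the pending declaration only if it is well-formed.
    name, sep, value = "".join(chars).partition(":")
    name, value = name.strip(), value.strip()
    if not invalid and sep and name and value:
        declarations[name] = value


def parse_style_attr(text, brace_ends_block=False):
    """Parse a style attribute into {property: value}.

    brace_ends_block=False: a top-level '}' only invalidates the current
    declaration; parsing resumes after the next ';'.
    brace_ends_block=True: a top-level '}' ends parsing, as if it closed
    an enclosing rule.
    """
    declarations, chars, invalid = {}, [], False
    for ch in text:
        if ch == ";":
            _commit(declarations, chars, invalid)
            chars, invalid = [], False
        elif ch == "}":
            if brace_ends_block:
                break
            invalid = True
        else:
            chars.append(ch)
    _commit(declarations, chars, invalid)  # EOF also ends a declaration
    return declarations


test_case = "color:red;};color:green"
print(parse_style_attr(test_case))                         # '}' is an invalid token
print(parse_style_attr(test_case, brace_ends_block=True))  # '}' ends the block
```

With the test case from this thread, the first call keeps only the
green declaration and the second keeps only the red one.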

>>> Similarly, for a single declaration, a ; token does not end the
>>> declaration.
>>
>> What do you mean by "does not end the declaration"?  It looks like
>> top-level ; tokens aren't allowed in @supports conditions, and I don't
>> see how they'd be allowed anywhere else that wants to take a single
>> decl in the future.  I'd prefer to just say that it's a syntax error
>> if the decl is appended or unset before the token stream is fully
>> consumed.
>
> Ok, that would work too. But it’s still different from "append the
> declaration to the current rule and switch …" etc, so the state machine
> still has to be adapted.

Yeah, you're right, it would need a parser change to work well.  Darn,
that's two instances, which makes it worthwhile to add the change.

>>> Finally, anything that does not have error recovery is the same with
>>> respect to the Syntax module. It needs a new mode that repeatedly
>>> "consumes a primitive" until EOF, and maybe fails at the first
>>> non-preserved token. Parsing these primitives further is up to other
>>> modules.
>>
>> Can you explain what you mean by this in more detail?
>
> Let’s take selectors for example. In the output of Tree Construction for a
> whole stylesheet, selectors are represented as lists of primitives.
> Hopefully, the syntax in selectors4 will be defined in terms of such
> primitives rather than have its own tokenizer.

It already is.  No spec has ever tried to redo tokenization.

> So, to parse a stand-alone selector from querySelectorAll() you would need
> to use the css3-syntax tokenizer, then something like this:
>
>> This section describes how to consume a list of primitives.
>>
>> Create a list that is initially empty.
>>
>> Repeatedly consume the next input token and process it as follows:
>>
>> EOF token
>>     Return the list.
>> anything else
>>     Consume a primitive and append the returned value to the list.
>
>
> … and then pass the result to the selector parser.
>
> If "consume a primitive" is changed to return a failure on non-preserved
> tokens (see [1] and [2]) then that needs to be handled in "consume a list of
> primitives" somehow.
>
> [1] http://lists.w3.org/Archives/Public/www-style/2012Nov/0401.html
> [2] http://lists.w3.org/Archives/Public/www-style/2013Jan/0263.html

Ah, okay, that all makes sense.
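
A rough sketch of the quoted "consume a list of primitives" algorithm,
assuming tokens are plain Python strings and that a primitive is either
a single token or a bracketed block. Treating an unmatched closing
bracket as the failing "non-preserved token" is an assumption made for
illustration; consume_primitive, consume_primitive_list, and
PrimitiveError are hypothetical names:

```python
CLOSERS = {"{": "}", "[": "]", "(": ")"}


class PrimitiveError(ValueError):
    """Raised when a token cannot start a primitive."""


def consume_primitive(tokens, pos):
    """Consume one primitive starting at pos; return (primitive, next_pos)."""
    tok = tokens[pos]
    if tok in CLOSERS:
        end = CLOSERS[tok]
        children = []
        pos += 1
        # Collect nested primitives until the matching closer or EOF.
        while pos < len(tokens) and tokens[pos] != end:
            child, pos = consume_primitive(tokens, pos)
            children.append(child)
        return (tok, children), pos + 1  # step past the closer (or EOF)
    if tok in CLOSERS.values():
        raise PrimitiveError(f"unmatched {tok!r}")
    return tok, pos + 1


def consume_primitive_list(tokens):
    """Consume primitives until the token list is exhausted (EOF)."""
    result, pos = [], 0
    while pos < len(tokens):
        primitive, pos = consume_primitive(tokens, pos)
        result.append(primitive)
    return result


print(consume_primitive_list(["a", "(", "b", ",", "c", ")", "d"]))
```

The result would then be handed to a module-specific parser, such as
the selector parser.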

> Error handling in selectors is easy: the whole selector list is invalid. I’m
> not sure about media queries…
>
> data:text/html,<style>@media ], all{body{background:green
>
> (Green in Firefox, Opera and IE, not in Chrome.)

Chrome's wrong here - a syntax error in a MQ list just falsifies the
MQ, but leaves the rest of the list alone.

I suppose I can do another parsing function, in addition to the "list
of primitives" one you outline above, which is more similar to
function parsing: break the list by top-level commas, and the value of
each entry is either a list of primitives or a syntax error.
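
The comma-splitting idea above could be sketched as follows, assuming
tokens are plain strings, entries are kept as flat token lists for
brevity, and a syntax error is represented by None. The function name
parse_comma_separated and the choice of an unmatched closing bracket as
the only error are assumptions, not spec text:

```python
OPENERS = {"(": ")", "[": "]", "{": "}"}


def parse_comma_separated(tokens):
    """Split tokens on top-level commas.

    Each entry is a list of tokens, or None if that entry contained a
    syntax error. An error only affects the entry it occurs in.
    """
    entries = []
    current, stack, bad = [], [], False
    for tok in tokens:
        if tok == "," and not stack:
            # A top-level comma closes the current entry.
            entries.append(None if bad else current)
            current, bad = [], False
        elif tok in OPENERS:
            stack.append(OPENERS[tok])
            current.append(tok)
        elif stack and tok == stack[-1]:
            stack.pop()
            current.append(tok)
        elif tok in OPENERS.values():
            bad = True  # closer that matches no open bracket
        else:
            current.append(tok)
    entries.append(None if bad else current)
    return entries


# Token list standing in for the prelude of "@media ], all"
print(parse_comma_separated(["]", ",", "all"]))
```

For the media query example from this thread, the first entry is an
error and the second entry ("all") survives, which is the behavior
described for Firefox, Opera and IE.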

~TJ
Received on Monday, 28 January 2013 16:19:20 UTC