Re: [css3-syntax] Parser "entry points"

Le 28/01/2013 17:18, Tab Atkins Jr. a écrit :
> On Mon, Jan 28, 2013 at 4:20 AM, Simon Sapin <simon.sapin@kozea.fr> wrote:
>> data:text/html,<p style="color:red;};color:green">test
>>
>> Green in Firefox and IE, red in Chrome and Opera.
>>
>> [...]
>
> Right.  WebKit parses it by just wrapping it in, iirc, "@-webkit-rule
> {}" and parsing it as a stylesheet, then extracting the resulting
> declarations.  That's why we stop on the } - it looks like it's
> closing the at-rule.
>
> I know we just got two impls agreeing on this, which let us advance
> the Style Attr spec, but still. :/  It's not a hard change to the
> parser, it's just the only thing I know of so far that varies based on
> entry point. (But see below, I guess.)

I don’t think stopping or not stopping at } in a style attr is a compat 
problem either way. Not stopping makes more sense IMO (there is no 
matching { token), but I would not object to stopping.
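To make the two behaviors concrete, here is a rough Python sketch. It is not how any engine is actually written. It splits a pre-tokenized style attribute at top-level `;`, and then it either stops at a top-level `}` (what the wrapping-at-rule approach described above effectively does) or skips over it. The token format, `parse_style_attr` and its `stop_at_close_brace` flag are hypothetical. The validity check is also much looser than real CSS.

```python
from typing import Dict, List, Tuple

# Hypothetical pre-tokenized input: (token type, text). Whitespace is omitted
# to keep the example short.
Token = Tuple[str, str]

OPENERS = {"{": "}", "(": ")", "[": "]"}


def parse_style_attr(tokens: List[Token], stop_at_close_brace: bool) -> Dict[str, str]:
    """Return the last valid value seen for each property.

    stop_at_close_brace=True models parsing the attribute as the body of a
    wrapping at-rule: a top-level '}' closes that block and ends parsing.
    stop_at_close_brace=False keeps going; the stray '}' only invalidates
    the declaration it appears in.
    """
    decls: List[List[Token]] = [[]]
    closers: List[str] = []  # closing tokens expected for currently open blocks
    for ttype, text in tokens:
        if not closers and ttype == ";":
            decls.append([])  # top-level ';' starts a new declaration
            continue
        if not closers and ttype == "}" and stop_at_close_brace:
            break  # looks like the end of the wrapping block
        if ttype in OPENERS:
            closers.append(OPENERS[ttype])
        elif closers and ttype == closers[-1]:
            closers.pop()
        decls[-1].append((ttype, text))

    result: Dict[str, str] = {}
    for decl in decls:
        # Valid here means: ident ':' followed by at least one value token.
        if len(decl) >= 3 and decl[0][0] == "ident" and decl[1][0] == ":":
            result[decl[0][1]] = "".join(text for _, text in decl[2:])
    return result


# style="color:red;};color:green"
tokens = [
    ("ident", "color"), (":", ":"), ("ident", "red"), (";", ";"),
    ("}", "}"), (";", ";"),
    ("ident", "color"), (":", ":"), ("ident", "green"),
]
print(parse_style_attr(tokens, stop_at_close_brace=True))   # {'color': 'red'}
print(parse_style_attr(tokens, stop_at_close_brace=False))  # {'color': 'green'}
```

The only difference between the two results is whether the top-level `}` ends the whole attribute or just poisons one declaration.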


>>>> Similarly, for a single declaration, a ; token does not end the
>>>> declaration.
>>>
>>> What do you mean by "does not end the declaration"?  It looks like
>>> top-level ; tokens aren't allowed in @supports conditions, and I don't
>>> see how they'd be allowed anywhere else that wants to take a single
>>> decl in the future.  I'd prefer to just say that it's a syntax error
>>> if the decl is appended or unset before the token stream is fully
>>> consumed.
>>
>> Ok, that would work too. But it’s still different from "append the
>> declaration to the current rule and switch …" etc, so the state machine
>> still has to be adapted.
>
> Yeah, you're right, it would need a parser change to work well.  Darn,
> that's two instances, which makes it worthwhile to add the change.

Or, as I said in another branch of the thread, just redefine "a single 
declaration that can not be made important" separately. It’s pretty simple:

     ws* ident ws* ':' (any primitive except delim(!))*

Or maybe:

     ws* ident ws* ':' (any primitive except delim(!))+

A grammar mismatch means an invalid declaration; no error recovery is needed. 
(It could be written as a state machine if you prefer that to a grammar.)
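Here is a minimal sketch of matching that grammar over an already-built list of primitives. It assumes a hypothetical `(type, text)` representation in which any block or function has already been collapsed into one primitive. `match_single_declaration` and its `require_value` flag, which chooses between the `*` and `+` variants, are illustrative names only.

```python
from typing import List, Optional, Tuple

# Hypothetical primitive: (type, text). A whole block or function is
# assumed to have been collapsed into a single primitive already.
Primitive = Tuple[str, str]


def match_single_declaration(
    prims: List[Primitive], require_value: bool = False
) -> Optional[Tuple[str, List[Primitive]]]:
    """Match  ws* ident ws* ':' (any primitive except delim(!))*

    With require_value=True the trailing repetition is '+' instead of '*'.
    Returns (name, value primitives), or None on any mismatch: there is
    no error recovery, the declaration is simply invalid.
    """
    i, n = 0, len(prims)
    while i < n and prims[i][0] == "ws":  # ws*
        i += 1
    if i == n or prims[i][0] != "ident":  # ident
        return None
    name = prims[i][1]
    i += 1
    while i < n and prims[i][0] == "ws":  # ws*
        i += 1
    if i == n or prims[i][0] != "colon":  # ':'
        return None
    value = prims[i + 1:]
    if any(t == "delim" and v == "!" for t, v in value):  # no delim(!) allowed
        return None
    if require_value and not value:  # '+' needs at least one primitive
        return None
    return name, value


decl = [("ident", "color"), ("colon", ":"), ("ws", " "), ("ident", "green")]
print(match_single_declaration(decl))
# ('color', [('ws', ' '), ('ident', 'green')])
important = decl + [("ws", " "), ("delim", "!"), ("ident", "important")]
print(match_single_declaration(important))  # None
```

As written, the `+` variant is satisfied by whitespace alone, because whitespace is itself a primitive.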


>> Hopefully, the syntax in selectors4 will be defined in terms of such
>> primitives rather than have its own tokenizer.
>
> It already is.  No spec has ever tried to redo tokenization.

http://www.w3.org/TR/css3-selectors/#lex defines a tokenizer that is not 
quite the same as CSS 2.1. It has no delim or unicode-range tokens but 
has PREFIXMATCH, COMMA, and a few others.


>> Error handling in selectors is easy: the whole selector list is invalid. I’m
>> not sure about media queries…
>>
>> data:text/html,<style>@media ], all{body{background:green
>>
>> (Green in Firefox, Opera and IE, not in Chrome.)
>
> Chrome's wrong here - a syntax error in a MQ list just falsifies the
> MQ, but leaves the rest of the list alone.
>
> I suppose I can do another parsing function, in addition to the "list
> of primitives" one you outline above, which is more similar to
> function parsing: break the list by top-level commas, and the value of
> each entry is either a list of primitives or a syntax error.
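The comma-splitting function described above might look roughly like this. The sketch rests on some assumptions. Tokens are hypothetical `(type, text)` pairs. An entry counts as a syntax error if it contains an unmatched top-level `)`, `]` or `}`, or a bad-string or bad-url token. Each erroneous entry becomes `None`, and a media query list could treat that as a falsified query while keeping the other entries.

```python
from typing import List, Optional, Tuple

# Hypothetical flat token: (type, text).
Token = Tuple[str, str]

# Tokens that open a block, and the token that closes each one.
OPENERS = {"(": ")", "[": "]", "{": "}", "function": ")"}
CLOSERS = {")", "]", "}"}
# Assumption: these tokens make an entry a syntax error wherever they appear.
ERROR_TOKENS = {"bad-string", "bad-url"}


def split_comma_list(tokens: List[Token]) -> List[Optional[List[Token]]]:
    """Split at top-level commas. Each entry is its list of tokens, or None
    if that entry contained a syntax error. An error only affects its own
    entry; parsing resumes at the next top-level comma."""
    entries: List[Optional[List[Token]]] = []
    current: List[Token] = []
    failed = False
    expected: List[str] = []  # stack of closers for currently open blocks
    for tok in tokens:
        ttype = tok[0]
        if not expected and ttype == "comma":
            entries.append(None if failed else current)
            current, failed = [], False
            continue
        current.append(tok)
        if ttype in OPENERS:
            expected.append(OPENERS[ttype])
        elif expected and ttype == expected[-1]:
            expected.pop()
        elif ttype in CLOSERS and not expected:
            failed = True  # unmatched closer at top level, e.g. the ']' below
        elif ttype in ERROR_TOKENS:
            failed = True
    entries.append(None if failed else current)  # unclosed blocks at EOF are not errors
    return entries


# @media ], all { ... }   (the prelude only)
prelude = [("ws", " "), ("]", "]"), ("comma", ","), ("ws", " "), ("ident", "all")]
print(split_comma_list(prelude))
# [None, [('ws', ' '), ('ident', 'all')]]
```

Commas nested inside a block or function do not split the list, because splitting only happens when the stack of open blocks is empty.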

I don’t really like it, but another option is to make bad-string and 
bad-url preserved tokens so that ( { [ and function are the only 
non-preserved tokens, and "consume a primitive" never fails.

It’s up to selectors, MQs, etc. to define their error handling for 
tokens such as bad-string, ] or cdo.

Maybe it’s not such a bad idea after all.
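Here is a rough sketch of that alternative. With bad-string and bad-url preserved, only `(`, `[`, `{` and function tokens start a block. Consuming a primitive therefore cannot fail, and every token ends up in the tree. The token format, the function names and the `REJECTED` set are hypothetical. The set only illustrates that the consuming spec (selectors, media queries, etc.), not the parser, decides what counts as an error.

```python
from typing import List, Tuple, Union

# Hypothetical flat token: (type, text).
Token = Tuple[str, str]
# A primitive is either a preserved token or a block: (opening token, children).
Primitive = Union[Token, Tuple[Token, list]]

# Under this proposal, only these tokens are non-preserved: they start a block.
OPENERS = {"(": ")", "[": "]", "{": "}", "function": ")"}


def is_block(p: Primitive) -> bool:
    return isinstance(p[1], list)


def consume_primitive(tokens: List[Token], i: int) -> Tuple[Primitive, int]:
    """Consume one primitive at tokens[i] and return it with the next index.

    Never fails: bad-string, bad-url, ']', ')', '}' and cdo are all just
    preserved tokens here. EOF closes any block that is still open.
    """
    tok = tokens[i]
    if tok[0] not in OPENERS:
        return tok, i + 1
    closer = OPENERS[tok[0]]
    children: List[Primitive] = []
    i += 1
    while i < len(tokens) and tokens[i][0] != closer:
        child, i = consume_primitive(tokens, i)
        children.append(child)
    return (tok, children), min(i + 1, len(tokens))  # skip the closer, if any


def consume_all(tokens: List[Token]) -> List[Primitive]:
    prims: List[Primitive] = []
    i = 0
    while i < len(tokens):
        prim, i = consume_primitive(tokens, i)
        prims.append(prim)
    return prims


# Hypothetical consumer-side policy, e.g. for one media query.
REJECTED = {"bad-string", "bad-url", "]", ")", "}", "cdo"}


def contains_rejected(prims: List[Primitive]) -> bool:
    for p in prims:
        if is_block(p):
            if contains_rejected(p[1]):
                return True
        elif p[0] in REJECTED:
            return True
    return False


prims = consume_all([("]", "]"), ("ws", " "), ("ident", "all")])
print(prims)                     # [(']', ']'), ('ws', ' '), ('ident', 'all')]
print(contains_rejected(prims))  # True
```

The parser output is the same for every consumer. Only the `REJECTED`-style policy would differ between selectors, media queries and other users of the primitives.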

-- 
Simon Sapin

Received on Monday, 28 January 2013 16:56:43 UTC