Re: [css3-syntax] First draft of parser section completed from Tab Atkins Jr. on 2012-06-13 (www-style@w3.org from June 2012)

From: Tab Atkins Jr. <jackalmage@gmail.com>
Date: Tue, 12 Jun 2012 17:57:26 -0700
To: "Kang-Hao (Kenny) Lu" <kennyluck@csail.mit.edu>
Cc: WWW Style <www-style@w3.org>
Message-ID: <CAAWBYDCA=1drVwDnZuyM=is8dhPeK3_QgHZZmSyJCrkQMw6LWQ@mail.gmail.com>
On Mon, Jun 11, 2012 at 11:15 PM, Kang-Hao (Kenny) Lu
<kennyluck@csail.mit.edu> wrote:
> (12/06/12 9:12), Tab Atkins Jr. wrote:
>> On Fri, Jun 8, 2012 at 9:01 PM, Kang-Hao (Kenny) Lu
>>> 3. You seem to assume that bad-url doesn't open a "block". CSS 2.1 is a
>>> bit vague on this (it doesn't say whether a bad-url contributes to an unbalanced
>>> '(' or not), but since at least IE and Firefox implement this, this
>>> should be marked as an issue.
>>
>> [snip]
>>
>> A quick search of our bugzilla revealed zero bugs about the
>> block-parsing thing, so for now I'm going to assume that it's okay to
>> do the simple thing and just handle this in the tokenizer, ignoring
>> any blocks that get opened in the meantime.
>
> That's quite interesting. So if I understand your change correctly,
> equivalently you are appending something like {baduritail}, which is
>
>  ([^\)\\]|{escape}|\\{nl})*(\)|\\)?
>
> to each of {baduri1}, {baduri2} and {baduri3}, in the flex grammar in
> CSS2.1 right? (Also, the position of BAD_URI and URI have to be switched
> so that a valid URI won't be caught in this regexp.)
>
> I don't have an opinion about this for now, but again, it would be nice
> if css3-syntax has a list of things that are changed since CSS 2.1 so
> that for people who have read the flex grammar, we don't have to read all
> the states to see what's changed. This growing list, as far as I can tell, has
>
>  * DASHMATCH and INCLUDES are gone.
>  * BAD_URI is changed to be self-contained (including the ')').

I've started a list with these.  Given that several of the changes you
noticed were accidental, the list is definitely necessary. ^_^
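
For anyone who wants to experiment with the {baduritail} pattern quoted
above, here is a rough Python transcription. It is only a sketch: the
NL, UNICODE and ESCAPE definitions are my reading of the CSS 2.1 flex
macros, and the name bad_uri_tail_end is made up for this example.

```python
import re

# Assumed transcriptions of the CSS 2.1 flex macros {nl}, {unicode}, {escape}.
NL = r"(?:\r\n|\n|\r|\f)"
UNICODE = r"(?:\\[0-9a-fA-F]{1,6}(?:\r\n|[ \t\r\n\f])?)"
ESCAPE = r"(?:" + UNICODE + r"|\\[^\r\n\f0-9a-fA-F])"

# ([^\)\\]|{escape}|\\{nl})*(\)|\\)?
BADURI_TAIL = re.compile(
    r"(?:[^)\\]|" + ESCAPE + r"|\\" + NL + r")*(?:\)|\\)?"
)


def bad_uri_tail_end(text, pos):
    """Hypothetical helper: index just past the bad-uri tail starting at pos."""
    # The pattern can match the empty string, so match() never returns None.
    return BADURI_TAIL.match(text, pos).end()
```

Called at the position where a {baduri} prefix stopped matching, this
swallows everything up to and including the first unescaped ')'.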

>  * a string ending with an EOF following backslash is now a STRING
>    instead of a BAD_STRING.

This wasn't intentional; I just didn't test this case.  There's no
real interop, but Chrome and FF both seem to treat it as a bad-string,
so I've gone ahead and done so.

>  * BAD_STRING now contains the trailing newline character that makes
>    it a BAD_STRING.

This isn't an intentional difference. Fixed.
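
To make the two string cases above concrete, here is a minimal Python
sketch of a string consumer. It is not the spec text; the function name
consume_string is hypothetical, escapes are only skipped rather than
decoded, and treating a plain EOF as an implicitly closed STRING is an
assumption based on the CSS 2.1 end-of-stylesheet rule.

```python
NEWLINES = "\n\r\f"


def consume_string(text, pos):
    """text[pos] is the opening quote. Returns (token_type, end_index)."""
    quote = text[pos]
    i = pos + 1
    while i < len(text):
        ch = text[i]
        if ch == quote:
            return ("STRING", i + 1)
        if ch in NEWLINES:
            # Unescaped newline: bad-string, and the newline itself is
            # NOT part of the token (it is left for the next token).
            return ("BAD_STRING", i)
        if ch == "\\":
            if i + 1 == len(text):
                # Backslash directly followed by EOF: bad-string.
                return ("BAD_STRING", i + 1)
            # Skip the escaped character (CRLF counts as one newline).
            i += 3 if text[i + 1:i + 3] == "\r\n" else 2
        else:
            i += 1
    # Assumption: EOF without a trailing backslash closes the string.
    return ("STRING", i)
```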

>  * a new line following a backslash changed from a DELIM S to just a
>    DELIM.

I'm not sure what this is supposed to mean.  As far as I can tell, a
backslash followed by a newline emits a DELIM WHITESPACE in Syntax.
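
Roughly, the behavior I mean looks like the following Python sketch. It
is a toy, not the spec's state machine: the tokenize function and the
OTHER token type are invented for illustration, and valid escapes are
not modeled at all.

```python
WHITESPACE = " \t\n\r\f"
NEWLINES = "\n\r\f"


def tokenize(text):
    """Toy tokenizer covering only whitespace and backslash-newline."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in WHITESPACE:
            # Collapse a run of whitespace into one WHITESPACE token.
            j = i
            while j < len(text) and text[j] in WHITESPACE:
                j += 1
            tokens.append(("WHITESPACE", text[i:j]))
            i = j
        elif ch == "\\" and i + 1 < len(text) and text[i + 1] in NEWLINES:
            # Not a valid escape: the backslash becomes a DELIM and the
            # newline is left in the input to be tokenized as whitespace.
            tokens.append(("DELIM", "\\"))
            i += 1
        else:
            # Everything else is lumped together (hypothetical OTHER type).
            tokens.append(("OTHER", ch))
            i += 1
    return tokens
```

So an input of 'a', backslash, newline, 'b' comes out as OTHER, DELIM,
WHITESPACE, OTHER.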


> Some error picking with regard to BAD_URI states:
>
> 1. In "URL-unquoted state", a newline following the backslash should
> switch the state to "Bad-URL state".

Fixed.

> 2. In "URL-end state",
>
>  # anything else
>  #
>  # This is a parse error. Switch to the bad-url state.
>
> should be
>
>  | This is a parse error. Switch to the bad-url state. Reconsume the
>  | current input character.
>
> for a case like "url(a \) )"

Fixed.
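
To show why the reconsume matters for "url(a \) )", here is a small
Python sketch of just the url-end and bad-url states. The function name,
the reconsume flag, and the EOF handling are all assumptions made for
this example, and the bad-url state is simplified to "skip escapes, stop
at the first unescaped ')'".

```python
WHITESPACE = " \t\n\r\f"
NEWLINES = "\n\r\f"


def consume_from_url_end(text, pos, reconsume=True):
    """Start in the url-end state at pos. Returns (token_type, end_index)."""
    state = "url-end"
    i = pos
    while i < len(text):
        ch = text[i]
        if state == "url-end":
            if ch in WHITESPACE:
                i += 1
            elif ch == ")":
                return ("URL", i + 1)
            else:
                # Parse error: switch to the bad-url state.
                state = "bad-url"
                if not reconsume:
                    i += 1  # old behavior: the offending character is lost
        else:  # bad-url state
            if ch == ")":
                return ("BAD_URL", i + 1)
            if ch == "\\" and i + 1 < len(text) and text[i + 1] not in NEWLINES:
                i += 2  # escaped character, including an escaped ')'
            else:
                i += 1
    # Assumption: at EOF, emit whatever token the current state implies.
    return ("BAD_URL" if state == "bad-url" else "URL", i)
```

With reconsume, the bad-url state sees the backslash, treats "\)" as an
escape, and the token runs to the final ')'. Without it, the backslash
is dropped and the escaped ')' ends the token early, leaving " )" behind.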


>>> 7*. According to CSS 2.1, the '}' token triggers a "parse error" if it
>>> is the first token in the Declaration-value mode
>>
>> I'm not sure precisely how to decipher what CSS 2.1 wants us to do in
>> this case (and with semicolon as first token in declaration-value), but
>> browsers interoperably just drop the declaration.  It's currently
>> undetectable whether this is because it's considered an overall
>> violation of the Core Grammar, or because the empty value doesn't
>> match any property's grammar.  I'm going with the latter for now,
>> because it leaves the door open for the empty value for Variables.  I
>> can change it if anyone feels strongly about it.
>
> I only feel strongly that we should document the difference between
> "Parse Error" and the CSS 2.1 "Core Grammar", so for whoever implements
> this grammar (e.g. tinycss) this is still trackable. Or otherwise, the
> simplest thing, as I've been saying, is to drop the "Parse Error"
> concept altogether (and say the difference is that "Core Grammar" is just
> obsoleted).

The intention of this is that the Core Grammar is obsoleted.  As I've
expressed before, the "parse error" concept is solely a hint for
validators to give feedback to authors, just like in HTML.


>>> == non-technical feedback ==
>>>
>>> If we choose to drop the "parse error", a lot of branches in the
>>> state machine can just merge into "anything else" and make some parts a
>>> lot more readable.
>>>
>>> [1] http://lists.w3.org/Archives/Public/www-style/2010Aug/0435
>>
>> Overall I'm fine with loosening some of the restrictions, such as the
>> "unused" production cited in that email.  But I'd like to start by
>> just transforming the current spec and fixing it to match reality when
>> necessary.
>
> Just to make it clear, "unused" is allowed in a block in CSS 2.1. I am
> not asking to make it loose.
>
> The reality is that browsers don't implement the "Parse Error" thing. It
> is just going to be quite confusing if a Web Console, when encountering
> "width: <!--", says "violation of the core grammar in the value of
> 'width'" instead of just "unrecognized value in 'width'".

It's still perfectly fine for a validator to say that.

~TJ
Received on Wednesday, 13 June 2012 00:58:15 UTC