Re: [css3-syntax] First draft of parser section completed from Tab Atkins Jr. on 2012-06-12 (www-style@w3.org from June 2012)

From: Tab Atkins Jr. <jackalmage@gmail.com>
Date: Mon, 11 Jun 2012 18:12:57 -0700
To: "Kang-Hao (Kenny) Lu" <kennyluck@csail.mit.edu>
Cc: WWW Style <www-style@w3.org>
Message-ID: <CAAWBYDD1ZWjFj_C_LLYya_HE4QG+kqZLCLWHg=_+ivakhOgx5g@mail.gmail.com>
On Fri, Jun 8, 2012 at 9:01 PM, Kang-Hao (Kenny) Lu
<kennyluck@csail.mit.edu> wrote:
> (12/06/09 9:06), Tab Atkins Jr. wrote:
>> Please critique and tell me about any errors you find.
>
> == technical feedback ==
>
> 1. In "3.5.4 At-rule mode"
>
>  # open-bracket token
>  # open-brace token
>  #
>  # Consume a block with..
>
> I think you meant to say open-paren here.

Yup, fixed.

> 3. You seem to assume that bad-url doesn't open a "block". CSS 2.1 is a
> bit vague on this (it doesn't say a bad-url contributes to an unbalanced
> '(' or not), but since at least IE and Firefox implement this, this
> should be marked as an issue.

Ooh, I didn't think of that.  I had assumed that, like bad-string,
bad-url was self-contained.

Some testing shows that all browsers require the bad-url to be closed
by a ) like normal.  As well, everyone but WebKit pays attention to
blocks inside an unquoted url, even if they're in the "invalid"
section.

A quick search of our bugzilla revealed zero bugs about the
block-parsing thing, so for now I'm going to assume that it's okay to
do the simple thing and just handle this in the tokenizer, ignoring
any blocks that get opened in the meantime.
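
A minimal Python sketch of that tokenizer-only approach, for
illustration.  The function name and the assumption that `pos` points
just past the character that made the url invalid are hypothetical,
and escape handling is left out to keep it short:

```python
def consume_bad_url_remnants(text, pos):
    """Return the index just past the ')' that closes a bad url.

    Assumption: `pos` points just after the character that made the
    url invalid. Braces, brackets, and parens seen on the way are
    treated as plain characters, so they never open a block.
    """
    while pos < len(text):
        if text[pos] == ")":
            return pos + 1   # the bad-url token ends at the first ')'
        pos += 1             # '{', '[', '(' are skipped like anything else
    return pos               # end of input also ends the token
```

With the input `url(foo{bar) color: red` and `pos` at the `{`, the
token ends right after `bar)`, and the `{` never starts a block.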


> 4. In "3.5.13. Consume a block"
>
>  # whitespace token
>  #
>  # Do nothing.
>
> If you do this, UA can't tell if "calc(1+1)" is different from "calc(1 +
> 1)", while the former is non-conforming. (Even if we end up allowing
> optional spaces in calc(), there's still "attr(ns|name)").

Ah, yeah, you're right. :/ It's not strictly required from a parsing
standpoint (if you didn't include some necessary whitespace, it would
have tokenized differently), but simpler rules for humans translate
into slightly more complication on my side.  I'll preserve whitespace
tokens, then.
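
To illustrate the difference, here is a toy Python tokenizer.  Its
token grammar (whitespace runs, identifiers, single-character delims)
is an assumption made for this example and is much smaller than the
real one; the `keep_whitespace` flag is likewise hypothetical:

```python
import re

# Toy grammar, for illustration only: whitespace runs, identifiers,
# and single-character delimiters.
TOKEN_RE = re.compile(
    r"(?P<whitespace>\s+)|(?P<ident>[A-Za-z_][\w-]*)|(?P<delim>.)"
)

def tokenize(text, keep_whitespace=True):
    """Return a list of (kind, value) pairs for `text`."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        kind = match.lastgroup
        if kind == "whitespace":
            if keep_whitespace:
                # Collapse any whitespace run into one token.
                tokens.append(("whitespace", " "))
            continue
        tokens.append((kind, match.group()))
    return tokens
```

With `keep_whitespace=False`, `ns|name` and `ns | name` yield the same
token list, so a later stage cannot tell them apart.  Keeping the
whitespace tokens makes the two lists differ.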


> 5. In "3.5.11. Next-statement error mode", the '}' token should do the
> same thing as it does in Declaration-value mode.

Thanks, I think I've handled that now.  Forced the error mode to
recognize it, switch to the right mode, then reprocess, and then added
rules I was missing in at-rule-block and declaration modes to handle a
close-brace token.
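
The "switch modes, then reprocess" pattern can be sketched in Python
roughly as follows.  This is a tiny stand-in machine, not the spec's
state machine: the `!bad` token, the choice of which mode counts as
"the right mode", and the trace output are all assumptions made for
the example:

```python
def run(tokens):
    """Return a trace of (mode, token) pairs for a tiny mode machine."""
    mode = "declaration"
    trace = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        trace.append((mode, tok))
        if mode == "next-statement-error":
            if tok == "}":
                mode = "declaration"  # switch to the mode that handles '}'
                continue              # reprocess the same token there
            if tok == ";":
                mode = "declaration"  # recovery ends at the semicolon
        elif mode == "declaration":
            if tok == "}":
                mode = "top-level"    # the close-brace ends the block
            elif tok == "!bad":       # hypothetical token that triggers the error mode
                mode = "next-statement-error"
        i += 1
    return trace
```

In the trace for `["!bad", "x", "}", "y"]` the `}` shows up twice:
once in the error mode, and once more in declaration mode after the
switch.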


> 6*. According to CSS 2.1, the ';' token triggers a "parse error" in the
> Top-level mode, Style-rule mode, Declaration-value mode if it is the
> first token.

Fixed.


> 7*. According to CSS 2.1, the '}' token triggers a "parse error" if it
> is the first token in the Declaration-value mode

I'm not sure precisely how to decipher what CSS 2.1 wants us to do in
this case (and with a semicolon as the first token in declaration-value),
but browsers interoperably just drop the declaration.  It's currently
undetectable whether this is because it's considered an overall
violation of the Core Grammar, or because the empty value doesn't
match any property's grammar.  I'm going with the latter for now,
because it leaves the door open for the empty value for Variables.  I
can change it if anyone feels strongly about it.


> == non-technical feedback ==
>
> 1. Fundamentally, I don't see a value of the "parse error" idea. It's
> not like this error class has been stable throughout the history of CSS.
> (e.g. "({})" went from a valid input to an "any"[1]), and css3-syntax
> seems to want to change this again (the parts marked * above). Also,
> it'll be very limited in terms of its usefulness since parsing of
> specific parts like the selector would have a stricter syntax.

I haven't yet filled in what it means to encounter a parse error, but
the short answer is "nothing".  Just like in HTML, it exists to let
validators know where they should flag errors.  The actual effect of
such an error is explicitly defined by the rest of the instructions
for a particular mode.
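
As a rough Python sketch of that separation, assuming a hypothetical
`parse_declarations` helper that works on a flat list of token
strings: the parse error is only appended to a report, while the
recovery (dropping the empty declaration) is spelled out by the code
path itself.

```python
def parse_declarations(tokens):
    """Split tokens on ';' and return (declarations, parse_errors).

    `parse_errors` holds token indexes for a validator to flag. It is
    never consulted by the parsing logic.
    """
    declarations, parse_errors, current = [], [], []
    for index, tok in enumerate(tokens):
        if tok == ";":
            if current:
                declarations.append(current)
            else:
                parse_errors.append(index)  # report only; nothing else changes
            current = []
        else:
            current.append(tok)
    if current:
        declarations.append(current)
    return declarations, parse_errors
```

A conforming UA could ignore the second return value entirely and
still produce the same declarations.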


> If we choose to drop the "parse error", a lot of branches in the
> state machine can just merge into "anything else" and make some parts a
> lot more readable.
>
> [1] http://lists.w3.org/Archives/Public/www-style/2010Aug/0435

Overall I'm fine with loosening some of the restrictions, such as the
"unused" production cited in that email.  But I'd like to start by
just transforming the current spec and fixing it to match reality when
necessary.


> 2. Instead of describing open-* (or close-*) token, it might be more
> readable just to use the character literally like '(', '{' and so on.
> Also, it might be more readable to fold, say, open-* in three lines in
> to a single line: '(', '{', '['.
>
> Note that I use "might" here and I am not sure.

Hm, maybe.  I've gone ahead and committed a change using the literal
characters.  It's easy to change them back if we decide they're worse.
Let me know what you think.


>> I'm also interested in feedback about Issue 4, regarding how to
>> specify the parser around at-rule block bodies.  What's the most
>> useful way for me to specify that section, for someone implementing a
>> CSS parser?
>
> I think in general at-rule-block parsing would just be Top-level parsing
> with a special flag that says '}' ends the Top-level mode. It's likely
> to be what's going on with a non-machine generated parser (similarly,
> @style value parsing will be "Declaration mode" parsing *without* such a
> mode).

This is impossible - some at-rules allow declarations inside of them.
Generally, we have to split at-rules into two camps - those whose
insides are like top-level mode, and those whose insides are like
declaration mode.
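
One way to picture the two camps, as a Python sketch.  The table is
an assumed, partial classification for illustration, and the names
`AT_RULE_BODY_MODE` and `body_mode` are hypothetical:

```python
# Partial, illustrative classification; a real parser needs the
# complete list of at-rules it supports.
AT_RULE_BODY_MODE = {
    "media": "top-level",        # body holds style rules and other at-rules
    "font-face": "declaration",  # body holds declarations
    "page": "declaration",       # body holds declarations
}

def body_mode(at_rule_name):
    """Return the mode for an at-rule's block, or None if unknown."""
    return AT_RULE_BODY_MODE.get(at_rule_name.lower())
```

An unknown at-rule returns None here, which leaves open what the
parser should do with it.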

~TJ
Received on Tuesday, 12 June 2012 01:13:47 UTC