Re: [css3-syntax] Thoughts on proposed Syntax module from Tab Atkins Jr. on 2012-08-29 (www-style@w3.org from August 2012)

From: Tab Atkins Jr. <jackalmage@gmail.com>
Date: Tue, 28 Aug 2012 23:24:36 -0700
To: "L. David Baron" <dbaron@dbaron.org>
Cc: "Kang-Hao (Kenny) Lu" <kanghaol@oupeng.com>, WWW Style <www-style@w3.org>
Message-ID: <CAAWBYDAo-dzLdV83e4OhJ-mZRwzgmGKuTsWCtOX+Ogzr8E-rdQ@mail.gmail.com>

On Tue, Aug 28, 2012 at 6:03 PM, L. David Baron <dbaron@dbaron.org> wrote:
> On Wednesday 2012-08-29 07:59 +0800, Kang-Hao (Kenny) Lu wrote:
>> (12/08/29 5:28), L. David Baron wrote:
>> > As I said at the face-to-face meeting, I think the approach that the
>> > specification takes to CSS's (), [], and {} matching rules is going
>> > in the wrong direction.  I think the normative specification text
>> > for these should be the general statements about how the processing
>> > works, and not the code-like form of the current specification,
>> > since I really *don't* want bugs in the specification that break the
>> > general rules to end up being codified in the specification.  I
>> > worry that ending up with exceptions to these rules could prevent us
>> > from making general improvements to parsing technology that would
>> > otherwise (without exceptions) be possible.  (For example, we might
>> > at some point in the future have generated parsers based on two
>> > different but parallel state machines, one describing the correct
>> > syntax and another describing the error handling behavior (for when
>> > the first state machine goes into a failure state) -- done in a way
>> > that a state in the error handling state machine can be determined
>> > at parser generation time from the state in the correct-input state
>> > machine.)
>>
>> It's a bit difficult for me to understand this concern without specific
>> examples of "bugs in the specification that break the general rules". Do
>> you mean, for example, error handling for BAD_URI should be undefined?
>
> No, I mean that it's very easy to forget to write that it's
> time to look for a matching closing ), or closing }, when handling
> error cases.  (Example below, after your next quote.)
>
> Or, in Tab's spec that accepts almost anything without going into an
> error mode, it's easy to forget to put in special handling for a
> closing } or ).  (Note that this model of accepting almost anything is
> the opposite of the way parsing works in Gecko.  It also means that
> Tab's spec so far really only describes the Chapter 4 grammar, and not
> how parsing actually works for the different types of rules and the
> different types of values.  It doesn't unify the Chapter 4 grammar with
> how things actually get parsed, which I thought was one of the major
> goals of rewriting how we specify parsing.)

I haven't decided quite how to spec the handling of actual
property/rule grammars.  I'm thinking it should be handled at two
places:

1. When spotting an at-rule token or the name of a property, if it's
unrecognized, enter the appropriate error mode.
2. When completing a declaration or at-rule, check its value against
its grammar.  If it's invalid, throw it out.

Reasonable?  Or should we do something else?
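
A minimal sketch of those two checkpoints, in Python. Everything named here (KNOWN_AT_RULES, PROPERTY_GRAMMARS, check_name, finish_declaration) is hypothetical and is not taken from the Syntax draft; the regexes stand in for real property grammars, which are far richer.

```python
import re

# Hypothetical registries, for illustration only.
KNOWN_AT_RULES = {"media", "import"}
PROPERTY_GRAMMARS = {
    "color": re.compile(r"^(red|green|blue|#[0-9a-fA-F]{3,6})$"),
    "width": re.compile(r"^\d+(px|em|%)$"),
}


def check_name(kind, name):
    """Checkpoint 1: on seeing the name, pick normal parsing or an error mode."""
    if kind == "at-rule":
        return "normal" if name in KNOWN_AT_RULES else "error"
    return "normal" if name in PROPERTY_GRAMMARS else "error"


def finish_declaration(name, value):
    """Checkpoint 2: on completion, keep the declaration only if it is valid."""
    if check_name("property", name) == "error":
        return None  # unknown property: thrown out
    grammar = PROPERTY_GRAMMARS[name]
    if grammar.match(value.strip()):
        return (name, value)
    return None  # known property, invalid value: thrown out
```

With these stand-in grammars, finish_declaration("color", "green") is kept, while finish_declaration("color", "12px") and finish_declaration("bogus", "1") are both dropped.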

> For example, Tab's spec produces incorrect results for the
> following:
>   @media screen { @foo bar } p { color: green }
> since it produces the data structure shown by the indentation:
>   @media screen {
>     @foo bar } p { color: green } /* an invalid @media rule */
>   } /* implied by EOF */
> instead of the data structure shown by the indentation:
>   @media screen {
>     @foo bar /* an invalid @media rule */
>   }
>   p { color: green }
> as required by CSS 2.1 (although I admit CSS 2.1 is only clear for
> the case of an unknown at-keyword, but not for a known at-keyword
> with invalid syntax following it; e.g., s/@foo bar/@media print/),
> since http://dev.w3.org/csswg/css3-syntax/#at-rule-mode0 doesn't
> have any special handling for the } token conditional on whether the
> at-rule being parsed is nested inside another rule.  (At the top
> level of a style sheet, on the other hand,
>   @foo bar } p { color: green }
> would be a single at-rule.)

Note that the specced behavior is what WebKit does.  ^_^

There are a few ways I could fix it.  If I add the above
error-handling rules for encountering an unknown at-rule, I can
immediately switch into an error-handling mode that handles that
properly.  (I had the sketch of this mode up before, but I ended up
not needing it.  I can add it back.)
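
One possible shape for such an error-handling mode, sketched in Python under loose assumptions: the tokenizer is a crude stand-in, and tokenize, skip_at_rule, and the nested flag are hypothetical names, not the draft's algorithm. The point it illustrates is that an invalid at-rule nested inside another rule stops before an unmatched } and leaves that token for the enclosing block, whereas at the top level the same } is swallowed.

```python
import re

CLOSERS = {"{": "}", "(": ")", "[": "]"}


def tokenize(css):
    # Crude stand-in: brackets and ';' are single tokens; everything else
    # is split on whitespace.
    return re.findall(r"[{}()\[\];]|[^\s{}()\[\];]+", css)


def skip_at_rule(tokens, i, nested):
    """Return the index just past the at-rule that starts at tokens[i].

    The rule ends at a top-level ';', at the end of its own {} block, or,
    when nested, just before an unmatched '}' (which is not consumed).
    """
    stack = []
    while i < len(tokens):
        tok = tokens[i]
        if tok in CLOSERS:
            stack.append(CLOSERS[tok])
        elif stack and tok == stack[-1]:
            stack.pop()
            if not stack and tok == "}":
                return i + 1  # closed the at-rule's own block
        elif not stack and tok == ";":
            return i + 1
        elif not stack and tok == "}" and nested:
            return i  # leave the '}' for the enclosing rule to reconsume
        i += 1
    return i  # EOF ends the rule
```

For the example quoted above, tokenizing "@media screen { @foo bar } p { color: green }" and calling skip_at_rule(tokens, 3, nested=True) stops at index 5, the } that closes @media. Calling it with nested=False on "@foo bar } p { color: green }" consumes all eight tokens as a single at-rule.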

>> > That said, I think the problems with this approach don't show up
>> > much in the material currently specified; I think most of the
>> > problems appear when describing how to parse the syntax of all the
>> > property values, which is where the bulk of CSS parsing logic lives.
>>
>> Hmm... I am intrigued. Do you have specific examples?
>
> For example, the entirety of parsing CSS media queries in @media
> rules happens inside the
>   # anything else
>   #   consume a primitive and append the returned value to the
>   #   prelude of the current rule.  Remain in this mode.
> in http://dev.w3.org/csswg/css3-syntax/#at-rule-mode0 .  So does the
> entirety of parsing media queries in @import rules.  So does the
> entirety of parsing the argument to an @supports rule.
>
> If the parsing of all of these things were actually defined in this
> specification, then it would be very easy to forget to mention that,
> when encountering an unexpected token (say, finding an opening '('
> where there should have been a ':') in the middle of a media query,
> one needs to reprocess the current token and then enter an
> error mode that involves matching a ).  Or perhaps, if it's the
> containing mode that needs to do the () matching rather than the
> current mode, then the spec needs to say to return an error, return
> to the content mode, *and reconsume the current token* (which
> matters quite a bit if the unexpected token is a ')').

I'm also wondering if it might not be better to write the parser
solely as functions, a la the way "parse a simple block" or "parse a
function" are written.  That might make some of the error-handling
easier to specify, and would almost certainly make it simpler to
specify the parsing of cssText on various OM rules.
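
As a rough illustration of the function-based style, here is a Python sketch of a recursive "parse a simple block" routine. It assumes a pre-tokenized list of strings; parse_simple_block and the dict shape it returns are hypothetical and are not the draft's definitions. Because each call owns exactly one opener and its matching closer, the bracket-matching rule lives in one place instead of being repeated in every mode.

```python
CLOSERS = {"{": "}", "(": ")", "[": "]"}


def parse_simple_block(tokens, i):
    """Parse the block whose opener is tokens[i].

    Returns (block, next_index), where next_index is just past the
    matching closer, or len(tokens) if EOF closed the block.
    """
    opener = tokens[i]
    closer = CLOSERS[opener]
    contents = []
    i += 1
    while i < len(tokens):
        tok = tokens[i]
        if tok == closer:
            return {"open": opener, "contents": contents}, i + 1
        if tok in CLOSERS:
            # Nested block: recurse, then resume after its closer.
            child, i = parse_simple_block(tokens, i)
            contents.append(child)
        else:
            contents.append(tok)
            i += 1
    # EOF implicitly closes any block that is still open.
    return {"open": opener, "contents": contents}, i
```

For example, parse_simple_block(["{", "a", "(", "b", ")", "}", "c"], 0) returns the outer {} block with the () block nested inside it, plus index 6, so the caller resumes at "c".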

~TJ
Received on Wednesday, 29 August 2012 06:28:14 UTC