- From: L. David Baron <dbaron@dbaron.org>
- Date: Tue, 28 Aug 2012 18:03:25 -0700
- To: "Kang-Hao (Kenny) Lu" <kanghaol@oupeng.com>
- Cc: WWW Style <www-style@w3.org>
On Wednesday 2012-08-29 07:59 +0800, Kang-Hao (Kenny) Lu wrote:
> (12/08/29 5:28), L. David Baron wrote:
> > As I said at the face-to-face meeting, I think the approach that the
> > specification takes to CSS's (), [], and {} matching rules is going
> > in the wrong direction. I think the normative specification text
> > for these should be the general statements about how the processing
> > works, and not the code-like form of the current specification,
> > since I really *don't* want bugs in the specification that break the
> > general rules to end up being codified in the specification. I
> > worry that ending up with exceptions to these rules could prevent us
> > from making general improvements to parsing technology that would
> > otherwise (without exceptions) be possible. (For example, we might
> > at some point in the future have generated parsers based on two
> > different but parallel state machines, one describing the correct
> > syntax and another describing the error handling behavior (for when
> > the first state machine goes into a failure state) -- done in a way
> > that a state in the error handling state machine can be determined
> > at parser generation time from the state in the correct-input state
> > machine.)
>
> It's a bit difficult to me to understand this concern without specific
> examples of "bugs in the specification that break the general rules". Do
> you mean, for example, error handling for BAD_URI should be undefined?

No, I mean that it's very easy to forget to write that it's time to look for a matching closing ), or closing }, when handling error cases. (Example below, after your next quote.)

Or, in Tab's spec, which accepts almost anything without going into an error mode, it's easy to forget to put in special handling for a closing } or ). (Note that this model of accepting almost anything is the opposite of the way parsing works in Gecko.
It also means that Tab's spec so far really only describes the Chapter 4 grammar, and not how parsing actually works for the different types of rules and the different types of values. It doesn't unify the Chapter 4 grammar with how things actually get parsed, which I thought was one of the major goals of rewriting how we specify parsing.)

For example, Tab's spec produces incorrect results for the following:

  @media screen { @foo bar } p { color: green }

since it produces the data structure shown by the indentation:

  @media screen {
    @foo bar } p { color: green } /* an invalid @media rule */
  } /* implied by EOF */

instead of the data structure shown by the indentation:

  @media screen {
    @foo bar /* an invalid @media rule */
  }
  p { color: green }

as required by CSS 2.1 (although I admit CSS 2.1 is only clear for the case of an unknown at-keyword, but not for a known at-keyword with invalid syntax following it; e.g., s/@foo bar/@media print/), since http://dev.w3.org/csswg/css3-syntax/#at-rule-mode0 doesn't have any special handling for the } token conditional on whether the at-rule being parsed is nested inside another rule. (At the top level of a style sheet, on the other hand, @foo bar } p { color: green } would be a single at-rule.)

> > That said, I think the problems with this approach don't show up
> > much in the material currently specified; I think most of the
> > problems appear when describing how to parse the syntax of all the
> > property values, which is where the bulk of CSS parsing logic lives.
>
> Hmm... I am intrigued. Do you have specific examples?

For example, the entirety of parsing CSS media queries in @media rules happens inside the

  # anything else
  # consume a primitive and append the returned value to the
  # prelude of the current rule. Remain in this mode.

in http://dev.w3.org/csswg/css3-syntax/#at-rule-mode0 . So does the entirety of parsing media queries in @import rules. So does the entirety of parsing the argument to an @supports rule.
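The block-matching behavior that the @media example depends on can be sketched as a toy parser. This is a minimal Python sketch under loud assumptions: the tokenizer, the `Rule` type, and the function names (`parse_stylesheet`, `parse_rules`, `parse_rule`, `consume_balanced`) are hypothetical; only @media is treated as containing nested rules; and the `brace_aware=False` mode is just a rough model of the behavior described above for the draft's at-rule mode, not its actual algorithm. The key step is that a rule nested inside a block stops at an unmatched `}` *without consuming it*, so the enclosing @media rule reconsumes that token as the end of its own block.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical toy tokenizer (real CSS tokenization is far richer): brackets
# and ';' are single tokens, everything else is split on whitespace.
# Comments and strings are not handled.
_TOKEN_RE = re.compile(r"[{}()\[\];]|[^\s{}()\[\];]+")
MATCHING = {"{": "}", "(": ")", "[": "]"}


def tokenize(text: str) -> List[str]:
    return _TOKEN_RE.findall(text)


@dataclass
class Rule:
    prelude: List[str]
    # Child rules for @media, raw tokens (braces included) for other blocks,
    # None when the rule ended without a block.
    block: Optional[list] = None


def consume_balanced(tokens: List[str], i: int) -> Tuple[List[str], int]:
    """tokens[i] is an opening bracket; consume through its matching closer,
    honoring nested (), [] and {} pairs. EOF implicitly closes open pairs.
    Simplification: a closer of the wrong kind is kept as an ordinary token."""
    stack = [MATCHING[tokens[i]]]
    out = [tokens[i]]
    i += 1
    while i < len(tokens) and stack:
        tok = tokens[i]
        out.append(tok)
        if tok == stack[-1]:
            stack.pop()
        elif tok in MATCHING:
            stack.append(MATCHING[tok])
        i += 1
    return out, i


def parse_rule(tokens, i, nested, brace_aware):
    prelude = []
    while i < len(tokens):
        tok = tokens[i]
        if tok == "}" and nested and brace_aware:
            # An unmatched '}' closes the *enclosing* block: end this (invalid)
            # rule here without consuming the token, so the caller reconsumes it.
            return Rule(prelude), i
        if tok == ";" and prelude and prelude[0].startswith("@"):
            return Rule(prelude), i + 1
        if tok == "{":
            if prelude and prelude[0] == "@media":
                children, i = parse_rules(tokens, i + 1, True, brace_aware)
                if i < len(tokens):  # the '}' that parse_rules left for us
                    i += 1
                return Rule(prelude, children), i
            body, i = consume_balanced(tokens, i)
            return Rule(prelude, body), i
        if tok in MATCHING:  # '(' or '[': keep the whole group in the prelude
            group, i = consume_balanced(tokens, i)
            prelude.extend(group)
            continue
        prelude.append(tok)  # with brace_aware=False, a '}' also lands here
        i += 1
    return Rule(prelude), i  # EOF ends the rule


def parse_rules(tokens, i, nested, brace_aware):
    rules = []
    while i < len(tokens):
        if nested and tokens[i] == "}":
            return rules, i  # leave the '}' for the enclosing rule to consume
        rule, i = parse_rule(tokens, i, nested, brace_aware)
        rules.append(rule)
    return rules, i


def parse_stylesheet(text: str, brace_aware: bool = True) -> List[Rule]:
    rules, _ = parse_rules(tokenize(text), 0, False, brace_aware)
    return rules
```

For `@media screen { @foo bar } p { color: green }`, the default mode returns two top-level rules: the @media rule containing the invalid rule `@foo bar`, followed by `p { color: green }`. With `brace_aware=False`, it returns a single @media rule whose only child is `@foo bar } p { color: green }`, closed by EOF. At the top level, `@foo bar } p { color: green }` parses as a single at-rule in both modes, since there is no enclosing block for the `}` to close.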
If the parsing of all of these (media queries in @media and @import rules, and the argument to @supports) were actually defined in this specification, then it would be very easy to forget to mention that, when encountering an unexpected token (say, finding an opening '(' where there should have been a ':') in the middle of a media query, one needs to reprocess the current token and then enter an error mode that involves matching a ). Or perhaps, if it's the containing mode that needs to do the () matching rather than the current mode, then the spec needs to say to return an error, return to the containing mode, *and reconsume the current token* (which matters quite a bit if the unexpected token is a ')').

-David

-- 
𝄞 L. David Baron http://dbaron.org/ 𝄂
𝄢 Mozilla http://www.mozilla.org/ 𝄂
Received on Wednesday, 29 August 2012 01:03:51 UTC