- From: Tab Atkins Jr. <jackalmage@gmail.com>
- Date: Tue, 28 Aug 2012 23:24:36 -0700
- To: "L. David Baron" <dbaron@dbaron.org>
- Cc: "Kang-Hao (Kenny) Lu" <kanghaol@oupeng.com>, WWW Style <www-style@w3.org>
On Tue, Aug 28, 2012 at 6:03 PM, L. David Baron <dbaron@dbaron.org> wrote:
> On Wednesday 2012-08-29 07:59 +0800, Kang-Hao (Kenny) Lu wrote:
>> (12/08/29 5:28), L. David Baron wrote:
>> > As I said at the face-to-face meeting, I think the approach that the
>> > specification takes to CSS's (), [], and {} matching rules is going
>> > in the wrong direction. I think the normative specification text
>> > for these should be the general statements about how the processing
>> > works, and not the code-like form of the current specification,
>> > since I really *don't* want bugs in the specification that break the
>> > general rules to end up being codified in the specification. I
>> > worry that ending up with exceptions to these rules could prevent us
>> > from making general improvements to parsing technology that would
>> > otherwise (without exceptions) be possible. (For example, we might
>> > at some point in the future have generated parsers based on two
>> > different but parallel state machines, one describing the correct
>> > syntax and another describing the error handling behavior (for when
>> > the first state machine goes into a failure state) -- done in a way
>> > that a state in the error handling state machine can be determined
>> > at parser generation time from the state in the correct-input state
>> > machine.)
>>
>> It's a bit difficult to me to understand this concern without specific
>> examples of "bugs in the specification that break the general rules". Do
>> you mean, for example, error handling for BAD_URI should be undefined?
>
> No, I mean that that it's very easy to forget to write that it's
> time to look for a matching closing ), or closing }, when handling
> error cases. (Example below, after your next quote.)
>
> Or, in Tab's spec that accepts almost anything without going into an
> error mode, it's easy to forget to put in special handling for a
> closing } or ).
> (Note that this model of accepting almost anything is
> the opposite of the way parsing works in Gecko. It also means that
> Tab's spec so far really only describes the Chapter 4 grammar, and not
> how parsing actually works for the different types of rules and the
> different types of values. It doesn't unify the Chapter 4 grammar with
> how things actually get parsed, which I thought was one of the major
> goals of rewriting how we specify parsing.)

I haven't decided quite how to spec the handling of actual
property/rule grammars. I'm thinking it should be handled at two
places:

1. When spotting an at-rule token or the name of a property, if it's
   unrecognized, enter the appropriate error mode.
2. When completing a declaration or at-rule, check its value against
   its grammar. If it's invalid, throw it out.

Reasonable? Or should we do something else?

> For example, Tab's spec produces incorrect results for the
> following:
>   @media screen { @foo bar } p { color: green }
> since it produces the data structure shown by the indentation:
>   @media screen {
>     @foo bar } p { color: green } /* an invalid @media rule */
>   } /* implied by EOF */
> instead of the data structure shown by the indentation:
>   @media screen {
>     @foo bar /* an invalid @media rule */
>   }
>   p { color: green }
> as required by CSS 2.1 (although I admit CSS 2.1 is only clear for
> the case of an unknown at-keyword, but not for a known at-keyword
> with invalid syntax following it; e.g., s/@foo bar/@media print/),
> since http://dev.w3.org/csswg/css3-syntax/#at-rule-mode0 doesn't
> have any special handling for the } token conditional on whether the
> at-rule being parsed is nested inside another rule. (At the top
> level of a style sheet, on the other hand,
>   @foo bar } p { color: green }
> would be a single at-rule.)

Note that the specced behavior is what WebKit does. ^_^

There are a few ways I could fix it.
If I add the above error-handling rules for
encountering an unknown at-rule, I can immediately switch into an
error-handling mode that handles that properly. (I had the sketch of
this mode up before, but I ended up not needing it. I can add it
back.)

>> > That said, I think the problems with this approach don't show up
>> > much in the material currently specified; I think most of the
>> > problems appear when describing how to parse the syntax of all the
>> > property values, which is where the bulk of CSS parsing logic lives.
>>
>> Hmm... I am intrigued. Do you have specific examples?
>
> For example, the entirety of parsing CSS media queries in @media
> rules happens inside the
>   # anything else
>   #   consume a primitive and append the returned value to the
>   #   prelude of the current rule. Remain in this mode.
> in http://dev.w3.org/csswg/css3-syntax/#at-rule-mode0 . So does the
> entirety of parsing media queries in @import rules. So does the
> entirety of parsing the argument to an @supports rule.
>
> If how all of these things were parsed were actually defined in this
> specification, then it would be very easy to forget to mention that,
> when encountering an unexpected token (say, finding an opening '('
> where there should have been a ':') in the middle of a media query,
> that one needs to reprocess the current token and then enter an
> error mode that involves matching a ). Or perhaps, if it's the
> containing mode that needs to do the () matching rather than the
> current mode, then the spec needs to say to return an error, return
> to the content mode, *and reconsume the current token* (which
> matters quite a bit if the unexpected token is a ')').

I'm also wondering if it might not be better to write the parser
solely as functions, a la the way "parse a simple block" or "parse a
function" are written.
That might make some of the error-handling easier
to specify, and would almost certainly make it simpler to specify the
parsing of cssText on various OM rules.

~TJ
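
A minimal sketch of the function-style parsing floated above, run against the `@media screen { @foo bar } p { color: green }` example from this thread. Everything in it is an assumption made for illustration: the toy tokenizer, the function names (`parse_stylesheet`, `parse_rule_list`, `parse_rule`, `parse_simple_block`) and the dict shape are hypothetical and are not taken from the css3-syntax draft. The only point it tries to show is a `nested` flag that makes a nested at-rule stop at the enclosing block's `}` without consuming it, while at the top level a stray `}` is just more prelude.

```python
import re


def tokenize(css):
    # Deliberately crude: braces and semicolons are their own tokens,
    # everything else is split on whitespace. Not a real CSS tokenizer.
    return re.findall(r"[{};]|[^\s{};]+", css)


class Tokens:
    def __init__(self, css):
        self.items = tokenize(css)
        self.pos = 0

    def peek(self):
        return self.items[self.pos] if self.pos < len(self.items) else None

    def next(self):
        tok = self.peek()
        if tok is not None:
            self.pos += 1
        return tok


def parse_simple_block(ts):
    # Collect raw tokens up to the matching '}' (or EOF).
    body, depth = [], 0
    while ts.peek() is not None:
        tok = ts.next()
        if tok == "}":
            if depth == 0:
                break
            depth -= 1
        elif tok == "{":
            depth += 1
        body.append(tok)
    return body


def parse_rule(ts, nested):
    is_at_rule = ts.peek().startswith("@")
    rule = {"name": ts.next() if is_at_rule else None, "prelude": [], "block": None}
    while True:
        tok = ts.peek()
        if tok is None:
            return rule  # EOF ends the rule
        if tok == "}" and nested:
            return rule  # leave '}' for the enclosing block to consume
        ts.next()
        if tok == ";" and is_at_rule:
            return rule
        if tok == "{":
            if is_at_rule:
                rule["block"] = parse_rule_list(ts, nested=True)
            else:
                rule["block"] = parse_simple_block(ts)
            return rule
        rule["prelude"].append(tok)  # at top level this includes a stray '}'


def parse_rule_list(ts, nested):
    rules = []
    while ts.peek() is not None:
        if nested and ts.peek() == "}":
            ts.next()  # the enclosing block's own closing brace
            break
        rules.append(parse_rule(ts, nested))
    return rules


def parse_stylesheet(css):
    return parse_rule_list(Tokens(css), nested=False)


nested_case = parse_stylesheet("@media screen { @foo bar } p { color: green }")
top_level_case = parse_stylesheet("@foo bar } p { color: green }")
```

Under these assumptions, `nested_case` holds two rules (the `@media` rule containing the truncated `@foo bar`, then the `p` rule), and `top_level_case` holds a single `@foo` at-rule whose prelude swallows the `}`, matching the two behaviours described in the quoted text.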
Received on Wednesday, 29 August 2012 06:28:14 UTC