- From: L. David Baron <dbaron@dbaron.org>
- Date: Tue, 28 Aug 2012 14:28:18 -0700
- To: www-style@w3.org
Some thoughts on the css3-syntax draft at http://dev.w3.org/csswg/css3-syntax/ follow, both on the general approach and on the lists of changes. I haven't read the state machine in detail.

As I said at the face-to-face meeting, I think the approach that the specification takes to CSS's (), [], and {} matching rules is going in the wrong direction. I think the normative specification text for these should be the general statements about how the processing works, and not the code-like form of the current specification, since I really *don't* want bugs in the specification that break the general rules to end up being codified in the specification. I worry that ending up with exceptions to these rules could prevent us from making general improvements to parsing technology that would otherwise (without exceptions) be possible. (For example, we might at some point in the future have generated parsers based on two different but parallel state machines, one describing the correct syntax and another describing the error handling behavior (for when the first state machine goes into a failure state) -- done in a way that a state in the error handling state machine can be determined at parser generation time from the state in the correct-input state machine.)

That said, I think the problems with this approach don't show up much in the material currently specified; I think most of the problems appear when describing how to parse the syntax of all the property values, which is where the bulk of CSS parsing logic lives. It's not clear to me whether http://dev.w3.org/csswg/css3-syntax/#declaration-value-mode0 is the extent of your plans for specifying how to parse CSS values or whether you're planning to actually specify value parsing in a similar way to the rest of the specification.

I also think this sort of specification describing a state machine in prose is generally far less readable than a specification that describes a tokenization and grammar in a concise format.
I think the special case of HTML parsing (which has so many complex rules that it can't reasonably be written in a concise format) doesn't mean that all other languages should be described in the same prose style. Yes, CSS 2.1's description of parsing is not as precise as it should be, but I'm not at all convinced that the fix to that problem needs to be as drastic as switching to a state machine written in prose.

Some specific comments on "3.5 Changes from the CSS 2.1 Tokenizer":
==================================================================

# 1. The DASHMATCH and INCLUDES tokens have been removed. They can
# instead be handled simply by having them parse as DELIM tokens.
# It was weird to privilege just those two types of attribute
# equality operators, when Selectors 3 adds several more.

I think this is a mistake. In Gecko we treat these, and all the new selectors introduced in css3-selectors, as tokens. In particular, not treating DASHMATCH as a separate token type makes the rules for parsing namespaces in attribute selectors extremely complicated; with DASHMATCH as a separate token it's trivial to implement correctly. I'd strongly prefer to leave these as distinct tokens and make new ones for the new selectors.

# 2. The BAD-URI token (now bad-url) is "self-contained". In other
# words, once the tokenizer realizes it's in a bad-url rather than
# a url token, it just seeks forward to look for the closing ),
# ignoring everything else. This behavior is simpler than treating
# it like a FUNCTION token and paying attention to opened blocks
# and such. Only WebKit exhibits this behavior, but it doesn't
# appear that we've gotten any compat bugs from it.

So if I'm understanding this correctly, this is more than the change we already made for issue 129 that's described in https://bugzilla.mozilla.org/show_bug.cgi?id=569646 .
You're saying that not only do we ignore [] and {} that are prior to the point at which the URL is known to be invalid, but that you also ignore [] and {} that are *after* that point, until you reach a closing )?

I guess this change seems reasonable to me.

Some specific comments on "3.7. Changes from CSS 2.1 Core Grammar":
==================================================================

# 1. No whitespace or comments are allowed between the DELIM(!)
# and IDENT(important) tokens when processing an !important
# directive at the end of a style rule.

I disagree with this change; I think disallowing whitespace is a significant compatibility problem. There are a significant number of uses with whitespace that people have written in Gecko's codebase (including the only use of !important in a code example in our userContent-example.css that explains how to write user style sheets). Three of the examples in the cascading chapter of CSS 2.1 also use whitespace.

# 2. The handling of some miscellaneous “special” tokens (like an
# unmatched } token) showing up in various places in the grammar
# has been specified with some reasonable behavior shown by at
# least one browser. Previously, stylesheets with those tokens in
# those places just didn't match the stylesheet grammar at all, so
# their handling was totally undefined.

I'm hoping that you defined unmatched }, ), and ] to behave like any other incorrect token at that spot would behave. Is that the case? What other cases are there?

# 3. Quirks mode parsing differences are now officially
# recognized in the parser.

I think these quirks should be described in terms of the value grammar of the properties rather than a token postprocessing step. While the behavior isn't distinguishable in current implementations, variables make it distinguishable. I believe implementations implement it as a change to the grammar of the properties; at the very least, Gecko does.
Describing it the way you do would require implementations to completely reimplement these quirks when they implement variables (so that they reject quirky values that were inserted by variable substitution). (Or do other implementations actually implement it as a token processing step?)

-David

-- 
𝄞   L. David Baron                         http://dbaron.org/   𝄂
𝄢   Mozilla                           http://www.mozilla.org/   𝄂
Received on Tuesday, 28 August 2012 21:28:41 UTC