- From: Tab Atkins Jr. <jackalmage@gmail.com>
- Date: Wed, 23 May 2012 17:16:10 -0700
- To: "Kang-Hao (Kenny) Lu" <kennyluck@csail.mit.edu>
- Cc: Bjoern Hoehrmann <derhoermi@gmx.net>, WWW Style <www-style@w3.org>
On Mon, May 21, 2012 at 10:03 PM, Kang-Hao (Kenny) Lu
<kennyluck@csail.mit.edu> wrote:
> (12/05/22 9:14), Tab Atkins Jr. wrote:
>> [snip some theory about whether or not we should change the core grammar]
>>
>> We should reject changes that would break non-trivial amounts of
>> existing content. That's the only reasonable restriction that we can
>> operate under; anything else would mean that we're promoting
>> theoretical purity over improving the language for everyone else.
>
> While I more or less agree with the theory that changing for the better
> is a good thing, in this particular case, I disagree with the idea that
> putting $foo in the core grammar is actually "improving the language".
>
> In general, the effect of putting a prefixed identifier in the core
> grammar is that every time a character is tokenized, the tokenizer has
> to check to see if it is one of the prefixes and whether what follows is
> an identifier. This would mean that for fallback tokens like DELIM (i.e.
> ':', '{', '}', ';'), a redundant check to see if it is a '$' is needed.
> IMHO, redundant checks are bad because, well, it's the user's computer
> that runs this redundant check.

Yes, it basically means adding an additional case to the data state
<http://dev.w3.org/csswg/css3-syntax/#data-state>. If you don't handle
that case in the tokenizer, you have to handle it in the parser, so I
don't see how there's an efficiency penalty.

I have no idea what you mean about fallback tokens.

> (12/05/22 5:30), Tab Atkins Jr. wrote:
>> Some further details - to handle $foo in the syntax, we'll either need
>> to add a VAR token to the grammar (defined identically to HASH but
>> with the $ character instead of #)
>
> Why identical to HASH but not ATKEYWORD? HASH needs {nmchar}+ because
> <color> needs it.
> Otherwise, nowhere in CSS allows an identifier to
> start with a number, including the ID selector:
>
> # A CSS ID selector contains a "#" immediately followed by the ID
> # value, which must be an identifier.
>
> (though I think this prose is quite crappy again in that it sounds like
> authoring conformance not UA conformance.)

Largely because it's just plain simpler. ^_^ Dealing with the
identifier rules is a pain in the ass. If you can just immediately
switch into looking for nmchar, though, it's great!

>> or accept that variables show up in the tokenizer as a $ DELIM
>> followed by an IDENT. The latter is suboptimal, though - it allows
>> comments between the $ and the foo, which sucks,
>
> Can you elaborate on why that sucks? Would anyone ever be confused by
> this? It seems like a theoretical concern to me.

Just because there's no reason a comment should go there. We should
have been much more strict in where we allowed comments originally. If
possible, I think we should engineer future things to avoid oddly
placed comments.

>> and it means we have to deal with the "first character of
>> an IDENT" detail, despite there being no ambiguity (HASH gets to avoid
>> all that and just use "nmchar+").
>
> Can you elaborate? What is the "first character of IDENT" detail? What's
> wrong with simply saying that $foo is "DELIM followed by an IDENT" (and
> add a "without intermediate whitespace" to avoid confusion).
>
> I think HASH is a notorious example. Even if, for example, "#1st" is a
> HASH, you still can't use it as an ID selector (tested with WebKit and
> Firefox, not sure about others).

The first character of an ident can only be a dash or a
letter/non-ascii/escape/underscore. If it's a dash, the second
character can only be a letter/non-ascii/escape/underscore; otherwise
it can be an nmchar (number/letter/non-ascii/escape/dash/underscore).
The rest of the characters can be nmchars.
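
To make the difference concrete, here is a rough sketch of the two rules as regular expressions: the IDENT rule described above next to the HASH-style "nmchar+" rule. The names (is_ident, is_name and the pattern constants) are made up for illustration, non-ASCII is approximated as anything above U+007F, and the escape pattern is a simplification, so treat this as an assumption-laden sketch rather than the grammar itself.

```python
import re

# Simplified character classes (assumptions, not the normative definitions).
NONASCII = r"[^\x00-\x7f]"
# Escape: backslash + 1-6 hex digits + optional whitespace,
# or backslash + any other character that is not a newline.
ESCAPE = r"\\(?:[0-9a-fA-F]{1,6}[ \t\r\n\f]?|[^\r\n\f0-9a-fA-F])"
NMSTART = rf"(?:[_a-zA-Z]|{NONASCII}|{ESCAPE})"
NMCHAR = rf"(?:[_a-zA-Z0-9-]|{NONASCII}|{ESCAPE})"

# IDENT: optional dash, then a name-start character, then any nmchars.
IDENT_RE = re.compile(rf"-?{NMSTART}{NMCHAR}*")
# HASH-style name: one or more nmchars, no first-character rule at all.
NAME_RE = re.compile(rf"{NMCHAR}+")


def is_ident(text: str) -> bool:
    """True if text satisfies the first-character rules of an IDENT."""
    return IDENT_RE.fullmatch(text) is not None


def is_name(text: str) -> bool:
    """True if text is just nmchar+, as in the part of a HASH after '#'."""
    return NAME_RE.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["foo", "-foo", "1st", "-1st"]:
        print(sample, is_ident(sample), is_name(sample))
```

Under this sketch "1st" and "-1st" pass the nmchar+ check but fail the IDENT check, which is the "#1st" situation: it tokenizes as a HASH, yet its value is not an identifier.
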

> (So, please consider this an errata item:
>
> In Appendix G,
>
> change
>
> # simple_selector
> #   : element_name [ HASH | class | attrib | pseudo ]*
> #   | [ HASH | class | attrib | pseudo ]+
> #   ;
>
> to
>
> | /*
> |  * There is a constraint on the ID selector that the part after
> |  * "#" should match an IDENT; e.g., "#abc" is OK, but "#1st" is not.
> |  */
> | simple_selector
> |   : element_name [ HASH | class | attrib | pseudo ]*
> |   | [ HASH | class | attrib | pseudo ]+
> |   ;
>
> like the comment above hexcolor. This should go into selector3 or 4 too.)

That, or change the validity of id selectors, whichever is web-compatible.

~TJ
Received on Thursday, 24 May 2012 00:17:00 UTC