- From: Tab Atkins Jr. <jackalmage@gmail.com>
- Date: Wed, 23 May 2012 17:16:10 -0700
- To: "Kang-Hao (Kenny) Lu" <kennyluck@csail.mit.edu>
- Cc: Bjoern Hoehrmann <derhoermi@gmx.net>, WWW Style <www-style@w3.org>
On Mon, May 21, 2012 at 10:03 PM, Kang-Hao (Kenny) Lu
<kennyluck@csail.mit.edu> wrote:
> (12/05/22 9:14), Tab Atkins Jr. wrote:
>> [snip some theory about whether or not we should change the core grammar]
>>
>> We should reject changes that would break non-trivial amounts of
>> existing content. That's the only reasonable restriction that we can
>> operate under; anything else would mean that we're promoting
>> theoretical purity over improving the language for everyone else.
>
> While I more or less agree with the theory that changing for the better
> is a good thing, in this particular case, I disagree with the idea that
> putting $foo in the core grammar is actually "improving the language".
>
> In general, the effect of putting a prefixed identifier in the core
> grammar is that every time a character is tokenized, the tokenizer has
> to check to see if it is one of the prefixes and whether what follows is
> an identifier. This would mean that for fallback tokens like DELIM (i.e.
> ':', '{', '}', ';'), a redundant check to see if it is a '$' is needed.
> IMHO, redundant checks are bad because, well, it's the user's computer
> that runs this redundant check.
Yes, it basically means adding an additional case to the data state
<http://dev.w3.org/csswg/css3-syntax/#data-state>.
If you don't handle that case in the tokenizer, you have to handle it
in the parser, so I don't see how there's an efficiency penalty.
I have no idea what you mean about fallback tokens.
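To make the "one additional case" point concrete, here is a rough sketch of a character-dispatch tokenizer. It is a toy under stated assumptions, not the css3-syntax algorithm: the names `tokenize`, `is_nmchar`, `NAME`, and `VAR` are hypothetical, escapes and comments are ignored, and every run of name characters is lumped into one NAME token.

```python
# Toy tokenizer: illustrative only, not the css3-syntax data state.
def is_nmchar(ch):
    # Approximation of nmchar: letters, digits, '-', '_', or non-ASCII.
    # Escapes are deliberately left out of this sketch.
    return ch.isalnum() or ch in "-_" or ord(ch) > 0x7F


def tokenize(css):
    tokens = []
    i = 0
    while i < len(css):
        ch = css[i]
        if ch.isspace():
            i += 1
        elif ch in "#$":
            # '$' shares the existing '#' branch: consume nmchar+ after it.
            j = i + 1
            while j < len(css) and is_nmchar(css[j]):
                j += 1
            if j > i + 1:
                kind = "HASH" if ch == "#" else "VAR"
                tokens.append((kind, css[i + 1:j]))
                i = j
            else:
                # Nothing usable follows, so fall back to a DELIM.
                tokens.append(("DELIM", ch))
                i += 1
        elif is_nmchar(ch):
            j = i
            while j < len(css) and is_nmchar(css[j]):
                j += 1
            tokens.append(("NAME", css[i:j]))
            i = j
        else:
            tokens.append(("DELIM", ch))
            i += 1
    return tokens
```

In this sketch the '$' check only runs for characters that were about to be examined anyway, and the parser never has to reassemble a '$' DELIM and the name that follows it.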
> (12/05/22 5:30), Tab Atkins Jr. wrote:
>> Some further details - to handle $foo in the syntax, we'll either need
>> to add a VAR token to the grammar (defined identically to HASH but
>> with the $ character instead of #)
>
> Why identical to HASH but not ATKEYWORD? HASH needs {nmchar}+ because
> <color> needs it. Otherwise, nowhere in CSS allows an identifier to
> start with a number, including the ID selector:
>
> # A CSS ID selector contains a "#" immediately followed by the ID
> # value, which must be an identifier.
>
> (though I think this prose is quite crappy again in that it sounds like
> authoring conformance not UA conformance.)
Largely because it's just plain simpler. ^_^ Dealing with the
identifier rules is a pain in the ass. If you can just immediately
switch into looking for nmchar, though, it's great!
>> or accept that variables show up in the tokenizer as a $ DELIM
>> followed by an IDENT. The latter is suboptimal, though - it allows
>> comments between the $ and the foo, which sucks,
>
> Can you elaborate on why that sucks? Would anyone ever be confused by
> this? It seems like a theoretical concern to me.
Just because there's no reason a comment should go there. We should
have been much more strict in where we allowed comments originally.
If possible, I think we should engineer future things to avoid oddly
placed comments.
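A small sketch of the difference, assuming a tokenizer that silently drops comments between tokens. All names here (`variables_from_pairs`, `variables_from_var_token`) are hypothetical, and the patterns are ASCII-only simplifications with no escape handling.

```python
import re

# Approach 1: '$' is a DELIM and the name is a separate IDENT.
TOKEN = re.compile(r"/\*.*?\*/|\$|[A-Za-z_][A-Za-z0-9_-]*|\s+|.", re.S)


def tokens_delim_ident(css):
    out = []
    for m in TOKEN.finditer(css):
        text = m.group()
        if text.startswith("/*"):
            continue  # comments vanish between tokens
        if text.isspace():
            out.append(("S", text))
        elif re.match(r"[A-Za-z_]", text):
            out.append(("IDENT", text))
        else:
            out.append(("DELIM", text))
    return out


def variables_from_pairs(css):
    # A variable is a '$' DELIM immediately followed by an IDENT token.
    toks = tokens_delim_ident(css)
    return [b[1] for a, b in zip(toks, toks[1:])
            if a == ("DELIM", "$") and b[0] == "IDENT"]


# Approach 2: a single VAR token, '$' followed directly by nmchar+.
VAR_OR_COMMENT = re.compile(r"/\*.*?\*/|\$([A-Za-z0-9_-]+)", re.S)


def variables_from_var_token(css):
    # Comments match the first alternative and yield an empty group.
    return [name for name in VAR_OR_COMMENT.findall(css) if name]
```

With the input `$/* hi */foo`, the DELIM-plus-IDENT approach reports a variable named foo, while the single-token approach reports none.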
>> and it means we have to deal with the "first character of
>> an IDENT" detail, despite there being no ambiguity (HASH gets to avoid
>> all that and just use "nmchar+").
>
> Can you elaborate? What is the "first character of IDENT" detail? What's
> wrong with simply saying that $foo is "DELIM followed by an IDENT" (and
> adding "without intermediate whitespace" to avoid confusion)?
>
> I think HASH is a notorious example. Even if, for example, "#1st" is a
> HASH, you still can't use it as an ID selector (tested with WebKit and
> Firefox, not sure about others).
The first character of an ident can only be a dash or a
letter/non-ascii/escape/underscore. If it's a dash, the second
character can only be a letter/non-ascii/escape/underscore; otherwise
it can be any nmchar (number/letter/non-ascii/escape/dash/underscore).
The rest of the characters can be nmchars.
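Written as patterns, the two rules look roughly like this. This is a simplified sketch: escapes are omitted, and the helper names `is_ident` and `is_name` are made up for illustration.

```python
import re

# nmstart: letter, underscore, or non-ASCII (escapes omitted in this sketch).
NMSTART = r"(?:[A-Za-z_]|[^\x00-\x7F])"
# nmchar: nmstart plus digits and dash.
NMCHAR = r"(?:[A-Za-z0-9_-]|[^\x00-\x7F])"

# IDENT: optional dash, then an nmstart, then any number of nmchars.
IDENT = re.compile(rf"-?{NMSTART}{NMCHAR}*")
# What HASH needs after the '#': just one or more nmchars.
NAME = re.compile(rf"{NMCHAR}+")


def is_ident(s):
    return IDENT.fullmatch(s) is not None


def is_name(s):
    return NAME.fullmatch(s) is not None
```

Here `is_name("1st")` is true while `is_ident("1st")` is false, which is the "#1st" situation: it tokenizes as a HASH but its value is not an identifier.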
> (So, please consider this an errata item:
>
> In Appendix G,
>
> change
>
> # simple_selector
> # : element_name [ HASH | class | attrib | pseudo ]*
> # | [ HASH | class | attrib | pseudo ]+
> # ;
>
> to
>
> | /*
> | * There is a constraint on the ID selector that the part after
> | * "#" should match an IDENT; e.g., "#abc" is OK, but "#1st" is not.
> | */
> | simple_selector
> | : element_name [ HASH | class | attrib | pseudo ]*
> | | [ HASH | class | attrib | pseudo ]+
> | ;
>
> like the comment above hexcolor. This should go into selector3 or 4 too.)
That, or change the validity of id selectors, whichever is web-compatible.
~TJ
Received on Thursday, 24 May 2012 00:17:00 UTC