Re: [css21][css3-syntax] $foo in the core grammar (was: [css-variables] Using $foo as the syntax for variables) from Tab Atkins Jr. on 2012-05-24 (www-style@w3.org from May 2012)

From: Tab Atkins Jr. <jackalmage@gmail.com>
Date: Wed, 23 May 2012 17:16:10 -0700
To: "Kang-Hao (Kenny) Lu" <kennyluck@csail.mit.edu>
Cc: Bjoern Hoehrmann <derhoermi@gmx.net>, WWW Style <www-style@w3.org>
Message-ID: <CAAWBYDDb9Gb0WR43N=roefgRH7fBek4afswVsvbZNKggotFoAA@mail.gmail.com>

On Mon, May 21, 2012 at 10:03 PM, Kang-Hao (Kenny) Lu
<kennyluck@csail.mit.edu> wrote:
> (12/05/22 9:14), Tab Atkins Jr. wrote:
>> [snip some theory about whether or not we should change the core grammar]
>>
>> We should reject changes that would break non-trivial amounts of
>> existing content.  That's the only reasonable restriction that we can
>> operate under; anything else would mean that we're promoting
>> theoretical purity over improving the language for everyone else.
>
> While I more or less agree with the theory that changing for the better
> is a good thing, in this particular case, I disagree with the idea that
> putting $foo in the core grammar is actually "improving the language".
>
> In general, the effect of putting a prefixed identifier in the core
> grammar is that every time a character is tokenized, the tokenizer has
> to check to see if it is one of the prefixes and whether what follows is
> an identifier. This would mean that for fallback tokens like DELIM (i.e.
> ':', '{', '}', ';'), a redundant check to see if it is a '$' is needed.
> IMHO, redundant checks are bad because, well, it's the user's computer
> that runs this redundant check.

Yes, it basically means adding an additional case to the data state
<http://dev.w3.org/csswg/css3-syntax/#data-state>.

If you don't handle that case in the tokenizer, you have to handle it
in the parser, so I don't see how there's an efficiency penalty.

I have no idea what you mean about fallback tokens.


> (12/05/22 5:30), Tab Atkins Jr. wrote:
>> Some further details - to handle $foo in the syntax, we'll either need
>> to add a VAR token to the grammar (defined identically to HASH but
>> with the $ character instead of #)
>
> Why identical to HASH but not ATKEYWORD? HASH needs {nmchar}+ because
> <color> needs it. Otherwise, nowhere in CSS allows an identifier to
> start with a number, including the ID selector:
>
>  # A CSS ID selector contains a "#" immediately followed by the ID
>  # value, which must be an identifier.
>
> (though I think this prose is quite crappy again in that it sounds like
> authoring conformance not UA conformance.)

Largely because it's just plain simpler. ^_^  Dealing with the
identifier rules is a pain in the ass.  If you can just immediately
switch into looking for nmchar, though, it's great!


>> or accept that variables show up in the tokenizer as a $ DELIM
>> followed by an IDENT.  The latter is suboptimal, though - it allows
>> comments between the $ and the foo, which sucks,
>
> Can you elaborate on why that sucks? Would anyone ever be confused by
> this? It seems like a theoretical concern to me.

Just because there's no reason a comment should go there.  We should
have been much more strict in where we allowed comments originally.
If possible, I think we should engineer future things to avoid oddly
placed comments.
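
To make the difference between the two options concrete, here is a
minimal sketch of a toy tokenizer. It is an illustration only, not the
css3-syntax algorithm: the function name `tokenize`, the
`with_var_token` flag, and the ASCII-only character classes (no
non-ascii or escape handling) are all assumptions made for brevity.
Comments are dropped from the token stream, so a parser that accepts
"DELIM followed by IDENT" cannot tell `$foo` from `$/**/foo`.

```python
import re

# ASCII-only simplifications of the grammar macros (assumption: the real
# macros also admit non-ascii characters and escapes).
NMCHAR = r"[_a-zA-Z0-9-]"
IDENT = rf"-?[_a-zA-Z]{NMCHAR}*"


def tokenize(css, with_var_token):
    """Return (token_name, text) pairs, dropping comments from the stream."""
    rules = [("COMMENT", r"/\*.*?\*/"), ("S", r"[ \t\r\n\f]+")]
    if with_var_token:
        # Option 1: a VAR token shaped like HASH, "$" followed by nmchar+.
        rules.append(("VAR", rf"\${NMCHAR}+"))
    # Option 2 falls out of the existing rules: "$" is a DELIM, "foo" an IDENT.
    rules += [("IDENT", IDENT), ("DELIM", r".")]
    pattern = re.compile(
        "|".join(f"(?P<{name}>{regex})" for name, regex in rules), re.DOTALL
    )
    return [
        (m.lastgroup, m.group())
        for m in pattern.finditer(css)
        if m.lastgroup != "COMMENT"
    ]


if __name__ == "__main__":
    print(tokenize("$foo", with_var_token=True))
    print(tokenize("$foo", with_var_token=False))
    print(tokenize("$/**/foo", with_var_token=False))
```

With the VAR rule enabled, `$/**/foo` no longer produces the same
tokens as `$foo`, so the oddly placed comment is ruled out in the
tokenizer instead of needing a special case in the parser. Whitespace
survives as an S token in both modes, which is why a "without
intermediate whitespace" rule is expressible but a "without
intermediate comment" rule is not.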


>> and it means we have to deal with the "first character of
>> an IDENT" detail, despite there being no ambiguity (HASH gets to avoid
>> all that and just use "nmchar+").
>
> Can you elaborate? What is the "first character of IDENT" detail? What's
> wrong with simply saying that $foo is "DELIM followed by an IDENT" (and
> adding a "without intermediate whitespace" to avoid confusion)?
>
> I think HASH is a notorious example. Even if, for example, "#1st" is a
> HASH, you still can't use it as an ID selector (tested with WebKit and
> Firefox, not sure about others).

The first character of an ident can only be a dash or a
letter/non-ascii/escape/underscore.  If it's a dash, the second
character can only be a letter/non-ascii/escape/underscore; otherwise
the second character can be any nmchar
(digit/letter/non-ascii/escape/dash/underscore).  The rest of the
characters can be nmchars.
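
As a rough sketch, the two shapes can be written as regular
expressions modeled on the CSS 2.1 grammar macros (ident is
-?{nmstart}{nmchar}*, name is {nmchar}+). The helper names `is_ident`
and `is_name` are hypothetical, and the non-ascii range follows the
core grammar's [^\0-\237]; treat this as an illustration of why
nmchar+ is the simpler check, not as a conformance test.

```python
import re

# Building blocks modeled on the CSS 2.1 grammar macros.
NONASCII = r"[^\x00-\x9f]"
UNICODE = r"\\[0-9a-fA-F]{1,6}(?:\r\n|[ \t\r\n\f])?"
ESCAPE = rf"(?:{UNICODE}|\\[^\r\n\f0-9a-fA-F])"
NMSTART = rf"(?:[_a-zA-Z]|{NONASCII}|{ESCAPE})"
NMCHAR = rf"(?:[_a-zA-Z0-9-]|{NONASCII}|{ESCAPE})"

# ident: optional dash, then an nmstart, then any number of nmchars.
IDENT = re.compile(rf"-?{NMSTART}{NMCHAR}*")
# name (what follows "#" in HASH): one or more nmchars, no first-character rule.
NAME = re.compile(rf"{NMCHAR}+")


def is_ident(text):
    """True if the whole string matches the ident production."""
    return IDENT.fullmatch(text) is not None


def is_name(text):
    """True if the whole string matches the name (nmchar+) production."""
    return NAME.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["foo", "-foo", "1st", "-1", "--foo"]:
        print(sample, is_ident(sample), is_name(sample))
```

Under these definitions "1st" is a valid name but not a valid ident,
which is the gap between "#1st" being a HASH token and "#1st" being
usable as an ID selector.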



> (So, please consider this an errata item:
>
> In Appendix G,
>
> change
>
>  # simple_selector
>  #  : element_name [ HASH | class | attrib | pseudo ]*
>  #  | [ HASH | class | attrib | pseudo ]+
>  #  ;
>
> to
>
>  |  /*
>  | * There is a constraint on the ID selector that the part after
>  | * "#" should match an IDENT; e.g., "#abc" is OK, but "#1st" is not.
>  | */
>  | simple_selector
>  |  : element_name [ HASH | class | attrib | pseudo ]*
>  |  | [ HASH | class | attrib | pseudo ]+
>  |  ;
>
> like the comment above hexcolor. This should go into selector3 or 4 too.)

That, or change the validity of id selectors, whichever is web-compatible.

~TJ

Received on Thursday, 24 May 2012 00:17:00 UTC