Selector parsing: It's easy to hit unexpected unicode-range tokens

This came up in https://bugzilla.mozilla.org/show_bug.cgi?id=1032034

Consider this rule:

   #nav u+a { background: yellow; }

meant to match this DOM:

   <nav id="nav">
     <u>THE U</u>
     <a href="#">THE A</a>
   </nav>

Per the current syntax spec [1], the selector in this rule produces the 
following token stream:

   <hash-token> <whitespace-token> <unicode-range-token>

There is no real definition of selector parsing so far, but the grammars 
that do exist for CSS don't allow a unicode-range token anywhere in a 
selector, so this is treated as an invalid selector in at least Firefox 
and Chrome (but not IE).

This seems like a pretty serious author footgun to me.  In particular, 
these selectors, where the element name after the '+' starts with a hex 
digit (a-f), would fail to parse in the specs as they currently stand:

   #nav u+a
   #nav u+b
   #nav u+code
   #nav u+font
   #nav u+article

while these would work fine:

   #nav u+s
   #nav u+i
   #nav u+p

Of course, inserting whitespace before or after the '+' will also make 
the selectors parse.  This is not sensible behavior.  ;)
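
To make the split between the two lists concrete, here is a rough Python 
sketch of the tokenizer lookahead as I read it in [1]: a token starting 
with "u" or "U" becomes a unicode-range when the next two characters are 
"+" followed by a hex digit or "?".  The function name 
starts_unicode_range is made up for illustration, and this models only 
that one check, not the whole tokenizer.

```python
import string

HEX_DIGITS = set(string.hexdigits)  # 0-9, a-f, A-F


def starts_unicode_range(text, pos=0):
    """Return True if a token starting at text[pos] would be consumed
    as a unicode-range: 'u' or 'U', then '+', then a hex digit or '?'.

    Hypothetical helper; it assumes pos is already at a token boundary.
    """
    if pos + 2 >= len(text):
        return False
    return (
        text[pos] in "uU"
        and text[pos + 1] == "+"
        and (text[pos + 2] in HEX_DIGITS or text[pos + 2] == "?")
    )


# The part of each selector that follows "#nav ".
for tail in ["u+a", "u+b", "u+code", "u+font", "u+article",
             "u+s", "u+i", "u+p", "u +a", "u+ a"]:
    print(f"{tail!r}: {starts_unicode_range(tail)}")
```

The first five report True (a, b, c, f, a are all hex digits); the three 
with s, i, p and the two with whitespace around the '+' report False.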

It seems to me that either we should not have a separate unicode-range 
token, and instead handle unicode ranges at the parser level, or we 
should have some sort of special token reprocessing logic in the 
selector parser.  My preference is very much for the former.
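
As a sketch of what the first option would mean for the example above, 
here is a deliberately simplified Python tokenizer with no unicode-range 
token at all.  The regular expression and the tokenize name are my own 
assumptions for illustration; they cover only hashes, identifiers, 
whitespace and single-character delimiters, not the real css-syntax 
rules (no escapes, numbers, strings, etc.).

```python
import re

# Simplified token patterns, tried in order. There is intentionally no
# unicode-range alternative: "u+a" falls through to ident, delim, ident.
TOKEN_RE = re.compile(r"""
    (?P<hash>\#[\w-]+)
  | (?P<ident>-?[A-Za-z_][\w-]*)
  | (?P<whitespace>\s+)
  | (?P<delim>.)
""", re.VERBOSE)


def tokenize(text):
    """Return a list of (token_type, source_text) pairs."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]


for token in tokenize("#nav u+a"):
    print(token)
```

With this, "#nav u+a" yields hash, whitespace, ident "u", delim "+", 
ident "a", which a selector parser can read as a type selector, a 
sibling combinator and another type selector.  The cost is that 
whatever parses unicode-range values would have to recognize ranges from 
these smaller pieces itself.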

-Boris

[1] http://dev.w3.org/csswg/css-syntax/

Received on Monday, 30 June 2014 14:34:59 UTC