- From: Tab Atkins Jr. <jackalmage@gmail.com>
- Date: Mon, 30 Jun 2014 14:26:28 -0700
- To: Boris Zbarsky <bzbarsky@mit.edu>
- Cc: Simon Sapin <simon.sapin@exyr.org>, www-style list <www-style@w3.org>
On Mon, Jun 30, 2014 at 9:23 AM, Boris Zbarsky <bzbarsky@mit.edu> wrote:
> On 6/30/14, 11:12 AM, Simon Sapin wrote:
>>
>> On 30/06/14 15:34, Boris Zbarsky wrote:
>>>
>>> It seems to me like either we should not have a separate unicode-range
>>> token and instead handle unicode ranges on the parser level or we should
>>> have some sort of special token reprocessing logic in the selector
>>> parser. My preference is very much for the former.
>>
>>
>> I think we can do the former with a definition similar to this
>> definition of <An+B> (the argument to :nth-child())
>>
>> http://dev.w3.org/csswg/css-syntax/#the-anb-type
>>
>> It’s ugly, but it’s well-defined and it seems to be the "least worst" we
>> can do here.
>
> I guess there is a third option too: tokenizer modes, such that u+a would be
> tokenized differently in different contexts. I'm not sure how happy we are
> with that idea.

I'm not particularly happy with that idea; it requires either
intertwining the tokenizer and parser, or maintaining the original
text precisely enough during tokenization that it can be re-tokenized
with a different tokenizer during parsing.

I'm fine with dropping unicode-range as a token and just recognizing
it specially, like we do with <an+b>. It's a little complex, but no
more so than an+b is. Philosophically, it occupies a similar space to
an+b - it's a weird special-purpose token that is only used for one
specific purpose, and is used in carefully controlled contexts (that
is, it's not generally mixed in with a bunch of other tokens in the
grammars where it's used). I prefer giving these an ugly token-based
definition rather than continually running into weird special cases
that we didn't consider previously.

~TJ
Received on Monday, 30 June 2014 21:27:15 UTC