- From: Tab Atkins Jr. <jackalmage@gmail.com>
- Date: Thu, 20 Nov 2014 08:38:26 -0800
- To: John Daggett <jdaggett@mozilla.com>
- Cc: www-style list <www-style@w3.org>
On Thu, Nov 20, 2014 at 5:24 AM, John Daggett <jdaggett@mozilla.com> wrote:
> Tab Atkins wrote:
>>> I can't say that I *like* this, but that's because I am
>>> philosophically not a fan of special tokenizer productions that
>>> only apply in specific grammar contexts -- can anyone think of a
>>> *practical* problem? It's not any worse than unquoted url() in
>>> terms of code, it can't change the boundaries of a top-level
>>> construct, and the only other issue that comes to mind is that
>>> it'll make it harder to use <unicode-range-token> somewhere else
>>> in the future. But I don't know that there *are* other uses, so.
>>
>> That requires a vastly more complicated change, switching the
>> Syntax module from being separate tokenizer/parser steps to being
>> integrated, with a lot more state being thrown around. And it
>> doesn't help us if we ever want to use <urange> in another
>> property or context, which I think is plausible.
>
> Tab, the first line of your algorithm for handling <urange> sequences is [*]:
>
> 1. Skipping the first u token, concatenate the representations of
>    all the tokens in the production together (or, in the case of
>    <dimension-token>s, the representation followed by the unit).
>    Let this be text.
>
> Let's not kid ourselves here, that's basically taking the token soup
> that results from removing the UNICODE-RANGE token and says "take
> these tokens and start over from scratch". Calling these "separate
> tokenizer/parser steps" is basically bogus since your algorithm is
> effectively re-tokenizing the sequence within the parser.
>
> It would work just as well to say as part of selector parsing "if
> you see a unicode-range token, convert it to text and use this
> algorithm to come up with a selector". Both are hacks of equal standing,
> you won't be winning any design contests with either.

It's definitely arguable, but I don't think they're equal.
In Selectors, the one token turns into three tokens, comprising pieces
of two compound selectors and a combinator. That's really invasive
from a grammar POV; it means I basically have to do a preprocessing
step over the tokens before I can start actually matching a grammar
against them.

> I think if we were actually trying to create an accurate
> representation of <urange> in a grammar form, it would look
> something like:
>
>   <urange> =
>     ['u' | 'U'] '+' [ <hex-value> ['-' <hex value>]? ] |
>                     [ <hex-value>? '?'+ ]
>
> Here, <hex-value> would be a sequence of hexadecimal digits with the
> appropriate restrictions on number of digits and value range
> applied. I realize we don't have a clean way of representing
> <hex-value> as a sequence of CSS tokens currently and so the need
> for hacking.

Yes, that's what we'd do if we were defining grammars over codepoints.
But that's irrelevant, because we've lost the codepoints by the time
we apply grammars.

> The new syntax for <urange> in the Syntax spec now is an ugly change
> but, meh, we can make it work.

kk

~TJ
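
For illustration only, here is a rough Python sketch of the step 1 quoted above: skip the leading u token, glue the remaining token representations back into text, then check that text against a codepoint-level pattern. The Token class, urange_text, and URANGE_TEXT are hypothetical names, not anything from the Syntax spec, and the regex is a loose reading of the grammar John gives that leaves out the digit-count and value-range restrictions.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    """Hypothetical stand-in for a CSS token (not the spec's data model)."""
    kind: str            # e.g. "ident", "number", "dimension", "delim"
    representation: str  # source text; for a dimension, the numeric part only
    unit: str = ""       # only meaningful for dimension tokens


def urange_text(tokens):
    """Skip the first (u) token and concatenate the rest back into text."""
    parts = []
    for tok in tokens[1:]:
        if tok.kind == "dimension":
            # Dimension tokens contribute their representation plus unit.
            parts.append(tok.representation + tok.unit)
        else:
            parts.append(tok.representation)
    return "".join(parts)


# Loose transcription of the quoted grammar, minus the leading u/U that
# urange_text() already skipped. No length or range limits are enforced.
URANGE_TEXT = re.compile(
    r"\+(?:[0-9a-f]+(?:-[0-9a-f]+)?|[0-9a-f]*\?+)",
    re.IGNORECASE,
)

# "u+0-7f" as a generic tokenizer would likely split it:
# ident "u", number "+0", dimension "-7" with unit "f".
tokens = [
    Token("ident", "u"),
    Token("number", "+0"),
    Token("dimension", "-7", "f"),
]
text = urange_text(tokens)
print(text, bool(URANGE_TEXT.fullmatch(text)))  # prints: +0-7f True
```

The point of the sketch is just that the grammar is matched against the reconstructed text, not against the original codepoints, which is where the two positions in this thread differ.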
Received on Thursday, 20 November 2014 16:39:22 UTC