Re: [css-syntax] <urange> and it's problems from fantasai on 2016-04-12 (www-style@w3.org from April 2016)

From: fantasai <fantasai.lists@inkedblade.net>
Date: Tue, 12 Apr 2016 17:27:46 -0400
To: "Tab Atkins Jr." <jackalmage@gmail.com>, www-style list <www-style@w3.org>
Message-ID: <570D6852.5010100@inkedblade.net>

On 04/12/2016 04:37 PM, Tab Atkins Jr. wrote:
> History: CSS2.1 defined a special grammar token just for unicode
> ranges, which was used in exactly one place: the 'unicode-range'
> descriptor of @font-face.  This special production caused bugs in
> pages, where selectors like `u+a { ... }` were parsed as a
> UNICODE-RANGE token, rather than the expected "IDENT(u) DELIM(+)
> IDENT(a)", like every other selector of that form was parsed.  (This
> isn't theoretical - Moz had a bug reported against it for this.)
>
> When writing the Syntax spec, I tried to fix this by dropping the
> unicode-range concept from the tokenizer, and instead handling it as a
> complex construct of the existing tokens, like I did with <an+b>.
> This kinda worked initially, but was *really* nasty.  Since then, we
> added scinot to numbers (like 1e3 for 1000), and this *completely
> destroyed* my ability to define <urange> cleanly - I can no longer use
> the value of numeric tokens, and instead have to rely on the
> "representation", which no browser stores or wants to store.
>
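
The scinot problem described above can be illustrated with a rough sketch. It assumes a tokenizer that stores only the numeric value of a number token; the helper `number_value` is hypothetical and stands in for that step, not for any browser's code.

```python
# Hypothetical illustration: a numeric token that keeps only its value
# cannot tell which characters were written in the source.

def number_value(text: str) -> float:
    """Value of a CSS-like number token (float() accepts the same scinot form)."""
    return float(text)

# `U+1e3` tokenizes as the ident `U` followed by the number `+1e3`;
# `U+1000` tokenizes as the ident `U` followed by the number `+1000`.
a = number_value("+1e3")
b = number_value("+1000")

# Same value, so the value alone cannot recover the hex digits...
assert a == b == 1000.0

# ...even though the intended code points differ.
assert int("1e3", 16) == 0x1E3
assert int("1000", 16) == 0x1000
```

Only the original characters ("1e3" versus "1000") distinguish the two ranges, which is why the value-based definition no longer works.
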
> I want to go ahead and resolve this.  I can see three options:
>
> 1. Keep what I'm currently doing.  This requires browsers to hold onto
> the string representation of numeric tokens (numbers and dimensions)
> at least through initial parsing (longer if they're used in a custom
> property).
>
> 2. Abandon this effort, go back to having a special unicode-range
> token. Accept that this is weird and there are stupid side-effects,
> like some selectors not working.
>
> 3. Define a new <urange> syntax that's actually simple to obtain from
> the existing tokens¹. Deprecate the old syntax; require UAs to accept
> the old syntax in the 'unicode-range' descriptor, but don't define how
> they should do so.  (Current UAs use context-sensitive retokenizing, I
> think - once they realize they're in a unicode-range descriptor,
> they'll retokenize the original text according to a special set of
> rules.)
>
> Thoughts?

Given unicode-range is already shipping
   http://caniuse.com/#feat=font-unicode-range
I think #3 is a non-starter.

I would imagine that reparsing unicode-range tokens in order to make
the selectors work would be easier than doing #1, no? Hanging onto
unicode-range tokens would take a lot less memory than hanging onto
numbers and dimensions, given how rarely unicode-range tokens appear.
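
A minimal sketch of that reparsing idea, assuming the tokenizer keeps the source text of each UNICODE-RANGE token. The function name `reparse_for_selector` and the tuple-based token shapes are hypothetical, and only the simple identifier case is handled; anything else is left to the full tokenizer.

```python
import re

# Shape of the CSS2.1 UNICODE-RANGE token: "u+" followed by 1-6 hex digits
# or '?', optionally followed by '-' and 1-6 hex digits.
UNICODE_RANGE = re.compile(r"[uU]\+([0-9a-fA-F?]{1,6}(?:-[0-9a-fA-F]{1,6})?)")

# Within that character set, the tail reads as an identifier only if it
# starts with a letter and contains no '?'.
IDENT_TAIL = re.compile(r"[a-fA-F][0-9a-fA-F-]*")

def reparse_for_selector(token_text):
    """Split a UNICODE-RANGE token's text into selector tokens.

    Returns a list of (type, text) pairs, or None when the tail is not a
    plain identifier and the text should go back through the full tokenizer.
    """
    m = UNICODE_RANGE.fullmatch(token_text)
    if m is None:
        raise ValueError("not a UNICODE-RANGE token: %r" % token_text)
    tail = m.group(1)
    if IDENT_TAIL.fullmatch(tail) is None:
        return None
    return [("IDENT", token_text[0]), ("DELIM", "+"), ("IDENT", tail)]
```

With this sketch, `reparse_for_selector("u+a")` yields the IDENT, DELIM, IDENT sequence a selector parser expects, while `reparse_for_selector("u+1f")` returns None because `1f` is not an identifier.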

~fantasai

Received on Tuesday, 12 April 2016 21:28:19 UTC