Re: [css-syntax] Reverting <unicode-range> changes from CSS 2.1 from Simon Sapin on 2013-09-01 (www-style@w3.org from September 2013)

From: Simon Sapin <simon.sapin@exyr.org>
Date: Sun, 01 Sep 2013 22:44:44 +0100
To: "Tab Atkins Jr." <jackalmage@gmail.com>
CC: www-style <www-style@w3.org>, John Daggett <jdaggett@mozilla.com>
Message-ID: <5223B54C.9080709@exyr.org>

Le 01/09/2013 18:06, Tab Atkins Jr. a écrit :
> On Sun, Sep 1, 2013 at 8:43 AM, Simon Sapin <simon.sapin@exyr.org> wrote:
>> Hi,
>>
>> The Syntax ED had some changes to the <unicode-range> token from CSS 2.1
>> that tried to make the syntax closer to what @font-face accepts, parse a
>> numeric range as early as the tokenizer, and do some range normalization.
>>
>>
>> I just "reverted" these changes, and wrote a definition that matches CSS
>> 2.1’s idea of what <unicode-range> is:
>>
>> https://dvcs.w3.org/hg/csswg/rev/dec8752a6390#l3.1
>>
>> Instead of a numeric range, <unicode-range> tokens now have a "start" and an
>> optional "end", each made of one to six code points. Parsing this into a
>> numeric range is left to the Fonts spec (the only one where <unicode-range>
>> is ever valid), which it already defines.
>>
>>
>> Reasoning:
>>
>> * This makes Syntax agnostic to the exact treatment of <unicode-range> in
>> the Fonts spec. For example, Fonts recently changed to make ranges ending
>> outside of Unicode invalid instead of clipping them. Compare:
>>
>> http://www.w3.org/TR/2013/WD-css3-fonts-20130212/#unicode-range-desc
>> http://www.w3.org/TR/2013/WD-css-fonts-3-20130711/#unicode-range-desc
>>
>> * Yes, CSS 2.1’s definition is silly and allows invalid ranges such as
>> U+1?5-300, but changing it does not buy us anything. It’s fine to have
>> absurd input be valid at the Syntax level and rejected by a given
>> property/descriptor/selector/etc.
>
> I don't like this in general, but I'm okay with the first bullet point:
> if we, for whatever reason, use <unicode-range> again, it's vaguely
> plausible we might want different behavior for clipping vs. invalid.
>
> The second bullet point just doesn't make any sense, though, unless
> we're worried about accidental usages of the token in generic contexts
> like custom properties.

I don’t think this affects custom properties, whose value is either 
(eventually) used in a non-custom property, or serialized back to a CSS 
string.


> I doubt we'll *ever* give "U+1?5-300" a valid
> meaning, because it's a nonsensical range.  As I argued previously at
> the face-to-face, the only reason that these silly kinds of ranges
> were *ever* valid is because someone valued terseness over accuracy
> when writing the regex - it's trivial to make a slightly longer regex
> that only matches the ranges with sensical syntax.

Actually, the Fonts spec now rejects many more corner cases than it used
to (e.g. decreasing ranges). It's much easier IMO to say "drop the
declaration" than to try to encode all of these constraints in the
tokenizer.

Yes, we could make the token definition more restrictive (and less 
silly) but I think that the added complexity does not buy us anything.
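A minimal Python sketch of the split being argued for: a permissive Syntax-level token that only records a "start" and optional "end" string, and a separate Fonts-level step that turns them into a numeric range or rejects them. The token pattern is modelled on the CSS 2.1 core-grammar UNICODE-RANGE token (u+ then 1 to 6 hex digits or "?", optionally "-" and 1 to 6 hex digits), and the validation rules only approximate the Fonts behaviour mentioned in the thread (trailing-only "?" wildcards, no decreasing ranges, nothing past U+10FFFF). The function names are hypothetical and this is not the normative algorithm of either spec.

```python
import re

# Permissive, CSS 2.1-style token pattern (assumption: approximates the 2.1
# core grammar). Note it happily accepts absurd input such as "U+1?5-300".
TOKEN_RE = re.compile(r"u\+([0-9a-f?]{1,6})(?:-([0-9a-f]{1,6}))?", re.IGNORECASE)


def tokenize_unicode_range(text):
    """Syntax level (hypothetical): return (start, end) strings, or None."""
    m = TOKEN_RE.fullmatch(text)
    if m is None:
        return None
    return m.group(1), m.group(2)  # end is None when there is no "-..."


def parse_fonts_range(start, end):
    """Fonts level (hypothetical): return (lo, hi) ints, or None meaning
    "drop the declaration"."""
    if "?" in start:
        if end is not None:
            return None  # a wildcard combined with an explicit end
        if "?" in start.rstrip("?"):
            return None  # "?" is only allowed as a trailing wildcard
        lo = int(start.replace("?", "0"), 16)
        hi = int(start.replace("?", "F"), 16)
    else:
        lo = int(start, 16)
        hi = int(end, 16) if end is not None else lo
    if hi > 0x10FFFF or lo > hi:
        return None  # ends outside Unicode, or a decreasing range
    return lo, hi
```

For example, "U+1?5-300" tokenizes to ("1?5", "300") and is then rejected by parse_fonts_range, while "u+4??" becomes (0x400, 0x4FF). All the corner cases live in one function that a later feature could reuse, and the tokenizer stays a single short pattern.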


> By making Syntax "agnostic" about this, we end up requiring every
> usage of the token to repeat the exact same parsing/validation logic
> every time.  This is silly, when we can just bake that in once at the
> Syntax level,

No need to repeat. If we ever need ranges of code points again, the new 
feature can refer to the parsing defined in the Fonts spec. If 
appropriate, we could then move it to the Values & Units spec.


> unless we really do think accidental usages of that
> character pattern are something to worry about.

I’m not worried about that. All definitions of <unicode-range> we ever 
had require it to start with [uU]+[0-9a-fA-F?], which is pretty 
characteristic.
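The prefix check mentioned above can be written as a one-line pattern. This is a small illustrative sketch; the helper name is hypothetical and it only tests the leading characters, not the full token.

```python
import re

# Prefix shared by the <unicode-range> definitions discussed in the thread:
# "u" or "U", then "+", then a hex digit or "?".
PREFIX_RE = re.compile(r"[uU]\+[0-9a-fA-F?]")


def looks_like_unicode_range(text):
    """Hypothetical helper: True if text starts with the <unicode-range> prefix."""
    return PREFIX_RE.match(text) is not None
```

Inputs such as "U+0025-00FF" or "u+4??" match, while "url(x.woff)" or "U-0025" do not.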

-- 
Simon Sapin

Received on Sunday, 1 September 2013 21:45:11 UTC