Re: [css-syntax] Reverting <unicode-range> changes from CSS 2.1

On Sun, Sep 1, 2013 at 8:43 AM, Simon Sapin <simon.sapin@exyr.org> wrote:
> Hi,
>
> The Syntax ED had some changes to the <unicode-range> token from CSS 2.1:
> it tried to make the syntax closer to what @font-face accepts, parse a
> numeric range directly in the tokenizer, and do some range normalization.
>
>
> I just "reverted" these changes, and wrote a definition that matches CSS
> 2.1’s idea of what <unicode-range> is:
>
> https://dvcs.w3.org/hg/csswg/rev/dec8752a6390#l3.1
>
> Instead of a numeric range, <unicode-range> tokens now have a "start" and an
> optional "end", each made of one to six code points. Parsing this into a
> numeric range is left to the Fonts spec (the only one where <unicode-range>
> is ever valid), which already defines it.
>
>
> Reasoning:
>
> * This makes Syntax agnostic to the exact treatment of <unicode-range> in
> the Fonts spec. For example, Fonts recently changed to make ranges ending
> outside of Unicode invalid instead of clipping them. Compare:
>
> http://www.w3.org/TR/2013/WD-css3-fonts-20130212/#unicode-range-desc
> http://www.w3.org/TR/2013/WD-css-fonts-3-20130711/#unicode-range-desc
>
> * Yes, CSS 2.1’s definition is silly and allows invalid ranges such as
> U+1?5-300, but changing it does not buy us anything. It’s fine to have
> absurd input be valid at the Syntax level and rejected by a given
> property/descriptor/selector/etc.

I don't like this in general, but I'm okay with the first bullet point:
if we, for whatever reason, use <unicode-range> again, it's vaguely
plausible that one context might want out-of-range values clipped and
another might want them treated as invalid.

The second bullet point just doesn't make any sense, though, unless
we're worried about accidental usages of the token in generic contexts
like custom properties.  I doubt we'll *ever* give "U+1?5-300" a valid
meaning, because it's a nonsensical range.  As I argued previously at
the face-to-face, the only reason that these silly kinds of ranges
were *ever* valid is because someone valued terseness over accuracy
when writing the regex - it's trivial to make a slightly longer regex
that only matches the ranges with sensical syntax.
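
To make the regex point concrete, here is a rough sketch in Python rather
than spec text. LOOSE is my transcription of the CSS 2.1 token pattern;
STRICT is one hypothetical tightening (the name and exact shape are
assumptions, not wording from any draft) that accepts only a single code
point, hex digits followed by trailing "?" wildcards, or an explicit
start-end pair:

```python
import re

# CSS 2.1 token pattern as I read it: "?" may appear anywhere in the first
# part, and a "-end" part may follow even when wildcards are present.
LOOSE = re.compile(r"u\+[0-9a-f?]{1,6}(-[0-9a-f]{1,6})?", re.IGNORECASE)

# Hypothetical stricter pattern (an assumption for illustration only):
#   - explicit range: 1-6 hex digits, "-", 1-6 hex digits
#   - otherwise: at most 6 characters in total, hex digits first and
#     "?" wildcards only at the end (covers single code points as well)
STRICT = re.compile(
    r"u\+(?:"
    r"[0-9a-f]{1,6}-[0-9a-f]{1,6}"
    r"|(?=[0-9a-f?]{1,6}\Z)[0-9a-f]*\?*"
    r")",
    re.IGNORECASE,
)

for text in ["U+26", "U+4??", "U+0025-00FF", "U+1?5-300", "U+1?5"]:
    loose_ok = LOOSE.fullmatch(text) is not None
    strict_ok = STRICT.fullmatch(text) is not None
    print(f"{text:12} loose={loose_ok} strict={strict_ok}")
```

The last two inputs are the interesting ones: the loose pattern accepts
them, the stricter one does not.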

By making Syntax "agnostic" about this, we end up requiring every
usage of the token to repeat the exact same parsing/validation logic
every time.  This is silly, when we can just bake that in once at the
Syntax level, unless we really do think accidental usages of that
character pattern are something to worry about.
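
As an illustration of the logic every consumer would otherwise have to
repeat, here is a minimal Python sketch of turning the start/end token
into a numeric range. The function name, the `clip` flag, and the choice
to reject descending ranges are my assumptions for the example; the flag
is only there to show the clipping-vs-invalid difference from the first
bullet point, not to propose an API:

```python
import re

MAX_CODE_POINT = 0x10FFFF

# Token shape following the CSS 2.1 pattern: a start part that may contain
# "?" and an optional "-end" part.
_TOKEN = re.compile(r"u\+([0-9a-f?]{1,6})(?:-([0-9a-f]{1,6}))?", re.IGNORECASE)


def parse_unicode_range(text, clip=False):
    """Return (start, end) as integers, or None if the range is invalid.

    Hypothetical helper: with clip=True an end beyond U+10FFFF is clamped,
    with clip=False such a range is rejected.
    """
    m = _TOKEN.fullmatch(text)
    if m is None:
        return None
    start_text, end_text = m.group(1), m.group(2)
    if "?" in start_text:
        # Wildcards must be trailing and cannot be combined with "-end".
        if end_text is not None or "?" in start_text.rstrip("?"):
            return None
        start = int(start_text.replace("?", "0"), 16)
        end = int(start_text.replace("?", "F"), 16)
    else:
        start = int(start_text, 16)
        end = int(end_text, 16) if end_text is not None else start
    if end > MAX_CODE_POINT:
        if not clip:
            return None
        end = MAX_CODE_POINT
    if start > end:
        return None
    return (start, end)


print(parse_unicode_range("U+4??"))
print(parse_unicode_range("U+1?5-300"))
print(parse_unicode_range("U+0-FFFFFF"), parse_unicode_range("U+0-FFFFFF", clip=True))
```

Whether this lives in Syntax or in each spec that uses the token, it is
the same handful of checks either way.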

~TJ

Received on Sunday, 1 September 2013 17:07:27 UTC