Re: [css-syntax] Reverting <unicode-range> changes from CSS 2.1 from Tab Atkins Jr. on 2013-09-02 (www-style@w3.org from September 2013)

From: Tab Atkins Jr. <jackalmage@gmail.com>
Date: Mon, 2 Sep 2013 01:57:57 -0700
To: Simon Sapin <simon.sapin@exyr.org>
Cc: www-style <www-style@w3.org>, John Daggett <jdaggett@mozilla.com>
Message-ID: <CAAWBYDAM7d=g6VamkH2YVOmqAt0=2O8W-qvwhKwouNMAoi4uoQ@mail.gmail.com>

On Mon, Sep 2, 2013 at 1:52 AM, Simon Sapin <simon.sapin@exyr.org> wrote:
> On 02/09/2013 08:57, Tab Atkins Jr. wrote:
>> Yeah, as I said (though perhaps not clearly enough), I'm fine with
>> removing the additional checks that Syntax did to verify that the
>> token "made sense".  I'm okay with pushing at least that much to the
>> individual specs that use the token.  (Not happy, but okay with it.)
>>
>> What I'm against is forcing every use of the token to define how to
>> *parse* it, and reject nonsensical tokens like "U+1?5-300".  That
>> particular sequence of characters will *never* be a valid
>> unicode-range, no matter what we do, or what type of error-recovery a
>> particular property ends up wanting to define.
>>
>> In other words:
>>
>> * "U+9-1" is okay - let's keep that valid at the Syntax level, and let
>> Fonts deal with it as it wishes.
>> * "U+1?5" is not okay - let's reject that early, because we know for
>> certain that it's wrong.
>> * "U+???" should be transformed into "U+000-FFF" at the Syntax level,
>> because that's the way it'll *always* be interpreted, and we shouldn't
>> force every usage of the token to re-define how to parse a token.  We
>> should just ensure that every unicode-range is turned into a start
>> value and an optional end value, with both values being positive
>> integers.
>
>
> So, trying to interpret this, you’re proposing to keep "Consume a
> unicode-range token" as it was, but skip the "Set the unicode-range’s range"
> step. The token would have a start and an optional end that are both
> integers. (Or the end could be non-optional, and set to the start if not
> provided in the source.)
>
> Is this correct?

Yes.
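
To make that token model concrete, here is a rough Python sketch, not spec
text. It assumes the candidate token arrives as one complete string, and the
names (UnicodeRange, parse_unicode_range) are made up for illustration. It
takes the second option above (end set to start when absent) and expands
trailing "?" wildcards with 0 for the start and F for the end, following the
Fonts convention that U+4?? covers U+400 through U+4FF. Whether start <= end
is deliberately not checked; that is left to whoever consumes the token.

```python
import re
from typing import NamedTuple, Optional


class UnicodeRange(NamedTuple):
    """Hypothetical token model: two integers, nothing textual."""
    start: int
    end: int


# "U+" then 1-6 hex digits, optionally "-" and 1-6 more hex digits.
_EXPLICIT = re.compile(r"^[uU]\+([0-9a-fA-F]{1,6})(?:-([0-9a-fA-F]{1,6}))?$")
# "U+" then hex digits followed only by trailing "?" wildcards.
_WILDCARD = re.compile(r"^[uU]\+([0-9a-fA-F]*)(\?*)$")


def parse_unicode_range(text: str) -> Optional[UnicodeRange]:
    """Return (start, end) as integers, or None if the text can never be a range."""
    m = _EXPLICIT.match(text)
    if m:
        start = int(m.group(1), 16)
        # Assumption: a missing end is set to the start value.
        end = int(m.group(2), 16) if m.group(2) else start
        return UnicodeRange(start, end)

    m = _WILDCARD.match(text)
    if m:
        digits, marks = m.group(1), m.group(2)
        # Need at least one "?" and at most six characters in total.
        if marks and len(digits) + len(marks) <= 6:
            low = int(digits + "0" * len(marks), 16)
            high = int(digits + "F" * len(marks), 16)
            return UnicodeRange(low, high)

    # e.g. "U+1?5" or "U+1?5-300": a wildcard in the middle is never valid.
    return None
```

With this sketch, "U+9-1" yields start 9 and end 1 (kept, for Fonts to deal
with), while "U+1?5" and "U+1?5-300" yield None.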

> In the ED just before my edits:
>
> https://dvcs.w3.org/hg/csswg/raw-file/aa1b58939f73/css-syntax/Overview.html#consume-a-unicode-range-token
> https://dvcs.w3.org/hg/csswg/raw-file/aa1b58939f73/css-syntax/Overview.html#set-the-unicode-ranges-range
>
>
> If the token’s model is two integers, I think the Fonts spec should be
> changed to define its <urange> in terms of these integers. The current
> definition is based on text, so it’s more consistent with a token containing
> code points.

The Fonts spec was written against 2.1, which didn't throw out
obviously nonsensical unicode-range tokens, and didn't interpret the
tokens further, so the Fonts spec's current treatment makes sense.
That doesn't mean this is necessarily the best separation of work
between the specs.

>> Unless we think there's the faintest possibility of "U+1?5" ever being
>> considered valid, we should go ahead and do the parsing in the
>> *parsing* spec.  ^_^
>
> I don’t think everything parsing-related *has* to be in the Syntax spec. We
> already have lots of parsing definitions in other specs for individual
> properties, Selectors, etc.

Right, but we do very little, if any, actual parsing *of the text of a
token* at the property level.  (Actually, I'm not sure we do any at
all.)

>> Still, though, that character pattern could show up in a base64 value
>> put directly in a custom property - if it was preceded by a delim
>> character, it'll parse correctly.  ^_^
>
> That’s a separate but interesting question. What can go wrong if authors
> expect random text to round-trip through Custom Properties parsing and
> serialization? (Not just with <unicode-range>.)

Quite a bit - for arbitrary text that doesn't necessarily conform to
CSS's idea of what constitutes tokens, it should really be stored as a
CSS string.

~TJ

Received on Monday, 2 September 2013 08:58:44 UTC