Re: [css-syntax] Removed <unicode-range-token>, please review from Tab Atkins Jr. on 2014-11-14 (www-style@w3.org from November 2014)

From: Tab Atkins Jr. <jackalmage@gmail.com>
Date: Thu, 13 Nov 2014 17:39:21 -0800
To: "L. David Baron" <dbaron@dbaron.org>
Cc: www-style list <www-style@w3.org>
Message-ID: <CAAWBYDCmg4X_uw7zEvKGUwqc4+ZLe3fxiasTzyXOOJbwP6DSpg@mail.gmail.com>

On Thu, Nov 13, 2014 at 5:27 PM, L. David Baron <dbaron@dbaron.org> wrote:
> On Thursday 2014-11-13 17:13 -0800, Tab Atkins Jr. wrote:
>> Per the resolution from the 2014-07-02 telcon, I've removed the
>> <unicode-range-token> from Syntax entirely, and replaced it with a
>> <urange> microsyntax: <http://dev.w3.org/csswg/css-syntax/#urange>
>>
>> If you're interested in this kind of thing, please give it a look-over
>> and verify that I haven't missed any cases or made any mistakes.  It
>> was simpler than I thought it would be to spec out.
>>
>> One significant change is that the <urange> production is much looser
>> than the <unicode-range-token> parsing previously defined.  <urange>
>> does not attempt to ensure that the refs have at most 6 digits (or 6
>> total digits + question marks), as that would have made the speccing
>> and implementation much more difficult.  While I was against the
>> looser definition when it was a token, as a microsyntax (which is only
>> recognized when it's specifically called for) I'm fine with it being a
>> little loose.  This has no effect on its use in practice; it just
>> means that you can write things like U+0000000 (7 digits) that weren't
>> previously allowed.
>
> I'd prefer to leave which values are valid syntax and which aren't
> the way they are; I don't see the point in introducing compatibility
> risk without a good reason.  Unless, that is, implementation
> behavior doesn't actually match the current spec.

I can tighten it; it just makes things *significantly* more complex,
both in the spec and for implementations.  For example, you'd have to
count the length of the <<urange-codepoint>> before you knew how many
? characters were allowed to follow it.  It's just a lot of counting,
basically, for very little benefit.
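To make the counting concrete, here is a minimal Python sketch of both approaches. It is not the algorithm in the Syntax spec. It assumes the leading "U+" has already been stripped, and the helper names `strict_urange` and `loose_urange` are hypothetical. The strict check has to measure the hex-digit prefix before it can decide how many "?" may follow, because the total is capped at 6. The loose check accepts any run of hex digits followed by any run of "?". Both check syntax only.

```python
import re

HEX_DIGITS = "0123456789abcdefABCDEF"

def strict_urange(text: str) -> bool:
    """Token-style check (hypothetical helper): digits + '?' capped at 6 total.

    `text` is the part after "U+", e.g. "4??" or "0-7F".
    """
    start, dash, end = text.partition("-")
    # First measure the leading run of hex digits...
    n_digits = len(start) - len(start.lstrip(HEX_DIGITS))
    tail = start[n_digits:]
    # ...then everything after it must be '?' characters.
    if tail.strip("?"):
        return False
    n_qmarks = len(tail)
    # Only now do we know how many '?' were allowed: 6 minus the digit count.
    if n_digits + n_qmarks == 0 or n_qmarks > 6 - n_digits:
        return False
    if not dash:
        return True
    # A range end may only follow a plain codepoint (no wildcards).
    return (n_qmarks == 0 and 1 <= len(end) <= 6
            and all(c in HEX_DIGITS for c in end))

# Loose check (hypothetical): same shapes, but no length limits at all.
LOOSE = re.compile(r"[0-9a-fA-F]+-[0-9a-fA-F]+|[0-9a-fA-F]+\?*|\?+")

def loose_urange(text: str) -> bool:
    return LOOSE.fullmatch(text) is not None

for sample in ["4??", "0-7F", "0000000", "1?2"]:
    print(sample, strict_urange(sample), loose_urange(sample))
# 4?? True True
# 0-7F True True
# 0000000 False True
# 1?2 False False
```

The loose version needs no per-value bookkeeping, which is the simplification being argued for. The only observable difference is that over-long values such as 0000000 become acceptable.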

I think implementations generally follow the old 2.1 tokenization
rules, which allowed things like U+1?2.  However, we're right now
switching our parser over to being based on Syntax, and we'll be
matching the Syntax spec there; we don't anticipate having compat
issues with the change.
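For comparison, here is a minimal sketch of a CSS 2.1-style token match. It assumes the 2.1 pattern was `u\+[0-9a-f?]{1,6}(-[0-9a-f]{1,6})?`, matched case-insensitively, which is consistent with U+1?2 being allowed. The function name `css21_tokenizes` is hypothetical. Because hex digits and "?" share one character class, a wildcard can sit in the middle of the value, which is how U+1?2 gets through. The 6-character cap still applies, so U+0000000 does not.

```python
import re

# Assumed CSS 2.1 UNICODE-RANGE token pattern; digits and '?' may interleave.
CSS21_UNICODE_RANGE = re.compile(
    r"u\+[0-9a-f?]{1,6}(?:-[0-9a-f]{1,6})?", re.IGNORECASE
)

def css21_tokenizes(text: str) -> bool:
    """True if the whole string would lex as one 2.1-style unicode-range token."""
    return CSS21_UNICODE_RANGE.fullmatch(text) is not None

for sample in ["U+1?2", "U+4??", "U+0-7F", "U+0000000"]:
    print(sample, css21_tokenizes(sample))
# U+1?2 True
# U+4?? True
# U+0-7F True
# U+0000000 False
```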

~TJ

Received on Friday, 14 November 2014 01:40:08 UTC