Re: [css-syntax] Ready for wide review, FPWD request coming soon from Tab Atkins Jr. on 2013-05-19 (www-style@w3.org from May 2013)

From: Tab Atkins Jr. <jackalmage@gmail.com>
Date: Sun, 19 May 2013 01:46:02 -0700
To: Zack Weinberg <zackw@panix.com>
Cc: www-style list <www-style@w3.org>
Message-ID: <CAAWBYDCmw8+KVs2Mhfw=VLKcaPPMFREWN-RkkF9qWTLYaDNKCQ@mail.gmail.com>

On Sat, May 18, 2013 at 6:27 PM, Zack Weinberg <zackw@panix.com> wrote:
> On 2013-05-17 4:16 PM, Tab Atkins Jr. wrote:
>>
>> On Fri, May 17, 2013 at 11:12 AM, Zack Weinberg <zackw@panix.com> wrote:
>>> * 4. Unicode-range tokens may need a "valid" flag.  I need to
>>>    cross-check the code in Gecko against the algorithm in this spec
>>>    carefully, but the definition of UNICODE-RANGE in CSS2.1 included
>>>    several forms that were semantically invalid.
>>
>> The parser in Syntax ended up only accepting valid unicode ranges
>> (except that it does, technically, allow for ranges where the min is
>> higher than the max).  This is more restrictive than CSS 2.1, but it
>> only fails to cover things that were invalid in the first place.
>
> I will pay careful attention to this section when I go back through.

Just to be totally clear, I think it's obvious that the definition of
UNICODE-RANGE in 2.1 was simply due to someone being lazy with the
regex:

  u\+[0-9a-f?]{1,6}(-[0-9a-f]{1,6})?

This allows nonsensical things like "u+??a0" or "u+00?-500".  It could
easily have been written exhaustively to be correct, if a bit long:

  u\+((\?{1,6})|([0-9a-f]\?{0,5})|([0-9a-f]{2}\?{0,4})|([0-9a-f]{3}\?{0,3})|([0-9a-f]{4}\?{0,2})|([0-9a-f]{5}\?{0,1})|([0-9a-f]{6})|([0-9a-f]{1,6}-[0-9a-f]{1,6}))

Like I said, verbose but easy.  (I can't read that text right now to
figure out if I got my parens right.  You get my meaning, though.)
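
To make the difference concrete, here is a minimal sketch in Python that
runs a few inputs through both patterns.  It is only an illustration: the
names CSS21, STRICT and classify are made up for this example, the strict
pattern is my reading of the exhaustive form above (whole alternation kept
under the u\+ prefix, five hex digits allowed with or without a trailing
"?"), and whole-string matching is an assumption that a real tokenizer
would not make.

```python
import re

# Loose CSS 2.1 pattern, as quoted above.
CSS21 = re.compile(r"u\+[0-9a-f?]{1,6}(-[0-9a-f]{1,6})?", re.IGNORECASE)

# Stricter pattern (assumed reading of the exhaustive form): hex digits
# followed only by trailing '?' wildcards, six characters at most, or a
# plain hex-hex range.  Every alternative sits under the u\+ prefix.
STRICT = re.compile(
    r"u\+(?:"
    r"\?{1,6}"
    r"|[0-9a-f]\?{0,5}"
    r"|[0-9a-f]{2}\?{0,4}"
    r"|[0-9a-f]{3}\?{0,3}"
    r"|[0-9a-f]{4}\?{0,2}"
    r"|[0-9a-f]{5}\??"
    r"|[0-9a-f]{6}"
    r"|[0-9a-f]{1,6}-[0-9a-f]{1,6}"
    r")",
    re.IGNORECASE,
)


def classify(text):
    """Return (matches_css21, matches_strict) for the whole string."""
    return (
        CSS21.fullmatch(text) is not None,
        STRICT.fullmatch(text) is not None,
    )


if __name__ == "__main__":
    for sample in ["u+??a0", "u+00?-500", "u+4??", "U+0025-00FF", "u+500-100"]:
        print(sample, classify(sample))
```

Note that "u+500-100" passes both patterns: neither regex checks that the
start of a range is no higher than its end, so that check has to happen
somewhere other than the token pattern.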

Further, CSS2 didn't even define what the invalid syntaxes meant, or
how to treat them.  Fonts 3 does, but it merely considers them
invalid, which would still be the case in the current Syntax handling,
since they'd end up as a combination of idents and numbers and delims.
Unicode-range tokens are already invalid everywhere outside of the
'unicode-range' descriptor, so there's also no risk of behavior change there.

So, Syntax's behavior is just being less lazy with the recognition of
valid tokens, but will, if I'm reasoning correctly, not even have a
theoretical effect on the behavior of current pages.

~TJ

Received on Sunday, 19 May 2013 08:46:52 UTC