Re: [css-syntax] Dropping <number-token> representation, and its effects on <urange> from John Daggett on 2014-11-20 (www-style@w3.org from November 2014)

From: John Daggett <jdaggett@mozilla.com>
Date: Thu, 20 Nov 2014 05:24:21 -0800 (PST)
To: www-style list <www-style@w3.org>
Message-ID: <197152367.13484496.1416489861339.JavaMail.zimbra@mozilla.com>

Tab Atkins wrote:

>> I can't say that I *like* this, but that's because I am
>> philosophically not a fan of special tokenizer productions that
>> only apply in specific grammar contexts -- can anyone think of a
>> *practical* problem?  It's not any worse than unquoted url() in
>> terms of code, it can't change the boundaries of a top-level
>> construct, and the only other issue that comes to mind is that
>> it'll make it harder to use <unicode-range-token> somewhere else
>> in the future.  But I don't know that there *are* other uses, so.
> 
> That requires a vastly more complicated change, switching the
> Syntax module from being separate tokenizer/parser steps to being
> integrated, with a lot more state being thrown around.  And it
> doesn't help us if we ever want to use <urange> in another
> property or context, which I think is plausible.

Tab, the first line of your algorithm for handling <urange> sequences is [*]:

  1. Skipping the first u token, concatenate the representations of
     all the tokens in the production together (or, in the case of
     <dimension-token>s, the representation followed by the unit).
     Let this be text.

Let's not kid ourselves here: that's basically taking the token soup
that results from removing the UNICODE-RANGE token and saying "take
these tokens and start over from scratch". Calling these "separate
tokenizer/parser steps" is bogus, since your algorithm is
effectively re-tokenizing the sequence within the parser.
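
To make the point concrete, here is a rough sketch of what step 1
amounts to. The Token class and the urange_text name are hypothetical,
made up for illustration, and the token list is written out by hand
(my reading of how "U+0-7F" comes out once there is no UNICODE-RANGE
token), not produced by a real tokenizer:

```python
from dataclasses import dataclass


@dataclass
class Token:
    # Hypothetical token record, not taken from any real implementation.
    kind: str            # "ident", "number", "dimension" or "delim"
    representation: str  # source text (for a dimension: the numeric part only)
    unit: str = ""       # only meaningful for dimension tokens


def urange_text(tokens):
    """Step 1: skip the leading u token and glue the rest back into text."""
    first, rest = tokens[0], tokens[1:]
    if first.kind != "ident" or first.representation.lower() != "u":
        raise ValueError("expected a leading 'u' ident token")
    parts = []
    for tok in rest:
        parts.append(tok.representation)
        if tok.kind == "dimension":
            # Dimensions contribute their representation followed by the unit.
            parts.append(tok.unit)
    return "".join(parts)


# Assumed tokenization of "U+0-7F": ident "U", number "+0",
# dimension "-7" with unit "F".
tokens = [
    Token("ident", "U"),
    Token("number", "+0"),
    Token("dimension", "-7", "F"),
]
text = urange_text(tokens)
```

The result is simply the original source text minus the "U", which
then has to be scanned again character by character.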

It would work just as well to say, as part of selector parsing, "if
you see a unicode-range token, convert it to text and use this
algorithm to come up with a selector". Both are hacks of equal
standing; you won't be winning any design contests with either.

I think if we were actually trying to create an accurate
representation of <urange> in a grammar form, it would look
something like:

  <urange> =
    ['u' | 'U'] '+' [ <hex-value> ['-' <hex-value>]? ] |
                    [ <hex-value>? '?'+ ]

Here, <hex-value> would be a sequence of hexadecimal digits with the
appropriate restrictions on number of digits and value range
applied. I realize we don't currently have a clean way of representing
<hex-value> as a sequence of CSS tokens, hence the need for hacking.
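
As a sanity check on the grammar above, here is a small sketch that
matches it directly against text instead of against CSS tokens. The
names (parse_urange, MAX_CODE_POINT) are hypothetical, and the specific
limits, at most six hex digits or wildcards and a maximum value of
10FFFF, are my assumption about what "appropriate restrictions" means:

```python
import re

# Assumed limits: up to 6 digits/wildcards, values no higher than U+10FFFF.
MAX_CODE_POINT = 0x10FFFF

_URANGE = re.compile(
    r"[uU]\+(?:"
    r"(?P<start>[0-9a-fA-F]{1,6})(?:-(?P<end>[0-9a-fA-F]{1,6}))?"
    r"|(?P<wild>[0-9a-fA-F]{0,5}\?{1,6})"
    r")"
)


def parse_urange(text):
    """Return (start, end) for a valid <urange>, or None if it is invalid."""
    m = _URANGE.fullmatch(text)
    if m is None:
        return None
    wild = m.group("wild")
    if wild is not None:
        if len(wild) > 6:
            return None
        # Each '?' stands for any hex digit: lowest is 0, highest is F.
        start = int(wild.replace("?", "0"), 16)
        end = int(wild.replace("?", "F"), 16)
    else:
        start = int(m.group("start"), 16)
        end = int(m.group("end"), 16) if m.group("end") else start
    if start > end or end > MAX_CODE_POINT:
        return None
    return (start, end)
```

For example, parse_urange("U+0-7F") gives (0, 127) and
parse_urange("u+4??") gives (1024, 1279), while "U+110000" is rejected
as out of range.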

The new syntax for <urange> now in the Syntax spec is an ugly change
but, meh, we can make it work.

John Daggett

[*] http://dev.w3.org/csswg/css-syntax/#urange-syntax

Received on Thursday, 20 November 2014 13:24:49 UTC