Re: [css2.1] [css3-fonts] Ambiguities relating to UNICODE-RANGE tokens from Bert Bos on 2008-11-06 (www-style@w3.org from November 2008)

From: Bert Bos <bert@w3.org>
Date: Thu, 6 Nov 2008 11:12:14 +0100
To: www-style@w3.org
Message-Id: <200811061112.14904.bert@w3.org>
On Thursday 25 September 2008 01:24, Zack Weinberg wrote:
> There are a number of ambiguities in the specification of
> unicode-range: descriptors and UNICODE-RANGE tokens.  Most are
> relevant only to css3-fonts, but two relate to the core syntax and
> are therefore relevant to css2.1 as well.
>
> The regular expression defining UNICODE-RANGE in CSS2.1 is
>
>   U\+[0-9a-f?]{1,6}(-[0-9a-f]{1,6})?
>
> Core syntax issue 1 (editorial, one hopes): The initial U is in upper
> case. All other core lexical productions are written entirely in
> lower case. 4.1.3 bullet point 1 assures us that CSS is entirely
> case-insensitive; I am assuming this is not a (unique) exception to
> that rule.  For consistency, the U should be changed to lower case. 
> If it *is* meant to be an exception, there should be explicit wording
> in both css-2.1 and css3-fonts that says so.

The U is uppercase only because that is how it is usually written, e.g., 
U+0048 instead of u+0048; not because the lowercase is invalid. If that 
causes confusion, I'm happy to change the "U" to a "u" in the grammar. 
It is indeed purely editorial.
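
To illustrate the point, here is a minimal sketch in Python (not text
from either spec) that compiles the quoted pattern case-insensitively.
The helper name is_unicode_range_token is made up for this example, and
a real tokenizer would apply longest-match rules rather than test a
whole string at once.

```python
import re

# The CSS 2.1 pattern quoted above, written with a lowercase "u" and
# compiled case-insensitively, so "U+0048" and "u+0048" are equivalent.
UNICODE_RANGE = re.compile(r"u\+[0-9a-f?]{1,6}(-[0-9a-f]{1,6})?", re.IGNORECASE)


def is_unicode_range_token(text):
    """True if the whole string matches the UNICODE-RANGE pattern."""
    return UNICODE_RANGE.fullmatch(text) is not None
```

With this, both "U+0048" and "u+0048" are accepted, and so are the
questionable tokens "U+1?10" and "U+A?-BF" discussed below, since the
pattern alone does not exclude them.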

>
> Possible core syntax issue 2: This regular expression will match
> two classes of token which do not conform to any of the three
> basic forms called out in the current ED of css3-fonts:
>
>   U+1?10      question marks are not (all) trailing
>   U+A?-BF     both trailing question marks and a second endpoint
>
> I believe it is not possible to exclude all tokens in these classes,
> and still express all the existing constraints on UNICODE-RANGE
> tokens, using only Lex-style regular expression productions; in
> particular, it is not simultaneously possible to limit the first
> number to no more than 6 characters and specify that all question
> marks must trail.
>
> So I recommend that the core syntax be left alone here.  Instead,
> css3-fonts should say that any UNICODE-RANGE token that does not fit
> one of the three basic forms triggers a parse error (thus, the entire
> descriptor is discarded).
>
> [Aside: css3-fonts is almost entirely lacking in formal grammar
> rules. It would be nice if they got added.]

It didn't seem worth it to try and write a pattern that matches only 
those UNICODE_RANGE tokens that make sense. It may be possible, but the 
pattern would certainly be quite unreadable. So it was left to the text 
to explain that certain UNICODE_RANGE tokens are meaningless. That text 
was then left out of CSS 2.1, because UNICODE_RANGE is not used there.

How to handle those well-formed but meaningless tokens will indeed have 
to be explained in css3-fonts.

So I agree: there is something to do for css3-fonts[1], but nothing for 
CSS 2.1.

[1] http://dev.w3.org/csswg/css3-fonts/
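
As a rough sketch of how the extra check could sit on top of the token
pattern (Python, for illustration only): the function below rejects the
two classes Zack lists, i.e. question marks that are not all trailing,
and question marks combined with a second endpoint. The labels
"single", "interval" and "wildcard" are my own names for the three
basic forms and are assumptions, as is the function name classify;
limits such as a maximum code point are deliberately left out.

```python
import re

# Same token pattern as in CSS 2.1, with groups for the two halves.
TOKEN = re.compile(r"u\+([0-9a-f?]{1,6})(?:-([0-9a-f]{1,6}))?", re.IGNORECASE)


def classify(token):
    """Return 'single', 'interval', 'wildcard', or None if meaningless."""
    m = TOKEN.fullmatch(token)
    if m is None:
        return None  # not a UNICODE-RANGE token at all
    first, second = m.group(1), m.group(2)
    if "?" in first:
        if second is not None:
            return None  # e.g. U+A?-BF: wildcard plus a second endpoint
        if "?" in first.rstrip("?"):
            return None  # e.g. U+1?10: question marks not all trailing
        return "wildcard"
    return "interval" if second is not None else "single"
```

A descriptor parser could then treat a None result as the parse error
Zack proposes and discard the descriptor.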


> ----
[description of different cases omitted]

Makes sense. I'll leave it to the editors of the fonts module to 
suggest some text.

> ----
>
> There is also a question of what text is produced by a CSSOM query
> for the value of an arbitrary unicode-range: descriptor.  I recommend
> that implementations be allowed, but not required, to produce a
> simplified representation of the range instead of the original text. 
> Continuing with the example of
>
>    unicode-range: U+00??, U+0080-01FF;
>
> an implementation should be allowed to produce (at least) any of
> these:
>
>    U+00??, U+0080-01FF;      // exactly the original text
>    U+0000-00FF, U+0080-01FF; // question marks expanded to pairs
>    U+00??, U+01??;           // normalized to question mark form
>    U+0000-00FF, U+0100-01FF; // normalized to pair form
>    U+0000-01FF;              // optimized
>
> I don't think the spec needs to enumerate possibilities; just mention
> that implementations have license in this area.
>
> I would be happy to come up with wording for any or all of the above
> changes.

I have no preference. There is a section on normalization in the CSSOM 
and such a text could probably be added there. See
http://dev.w3.org/csswg/cssom/#parsing
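
For what it is worth, the normalizations Zack lists are easy to sketch.
The following Python is only an illustration under my own assumptions:
the function names are invented, tokens are assumed to be already
validated, and nothing here is proposed as the CSSOM serialization.

```python
def parse_range(token):
    """Turn a validated token such as 'U+00??' into (start, end)."""
    body = token[2:]  # drop the leading "U+"
    if "-" in body:
        low, high = body.split("-")
        return int(low, 16), int(high, 16)
    # Trailing question marks stand for 0 at the low end, F at the high end.
    return int(body.replace("?", "0"), 16), int(body.replace("?", "F"), 16)


def merge(ranges):
    """Combine overlapping or adjacent (start, end) pairs."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def serialize(ranges):
    """Write ranges in pair form, using a single code point where possible."""
    return ", ".join(
        f"U+{s:04X}" if s == e else f"U+{s:04X}-{e:04X}" for s, e in ranges
    )
```

Applied to the example, parsing "U+00??" and "U+0080-01FF" and
serializing directly gives the "question marks expanded to pairs" form,
while passing the ranges through merge() first gives the "optimized"
form.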



Bert
-- 
  Bert Bos                                ( W 3 C ) http://www.w3.org/
  http://www.w3.org/people/bos                               W3C/ERCIM
  bert@w3.org                             2004 Rt des Lucioles / BP 93
  +33 (0)4 92 38 76 92            06902 Sophia Antipolis Cedex, France
Received on Thursday, 6 November 2008 10:12:55 UTC