- From: Bert Bos <bert@w3.org>
- Date: Thu, 6 Nov 2008 11:12:14 +0100
- To: www-style@w3.org
On Thursday 25 September 2008 01:24, Zack Weinberg wrote:
> There are a number of ambiguities in the specification of
> unicode-range: descriptors and UNICODE-RANGE tokens. Most are
> relevant only to css3-fonts, but two relate to the core syntax and
> are therefore relevant to css2.1 as well.
>
> The regular expression defining UNICODE-RANGE in CSS2.1 is
>
>   U\+[0-9a-f?]{1,6}(-[0-9a-f]{1,6})?
>
> Core syntax issue 1 (editorial, one hopes): The initial U is in upper
> case. All other core lexical productions are written entirely in
> lower case. 4.1.3 bullet point 1 assures us that CSS is entirely
> case-insensitive; I am assuming this is not a (unique) exception to
> that rule. For consistency, the U should be changed to lower case.
> If it *is* meant to be an exception, there should be explicit wording
> in both css-2.1 and css3-fonts that says so.

The U is uppercase only because that is how it is usually written, e.g.,
U+0048 instead of u+0048; not because the lowercase is invalid. If that
causes confusion, I'm happy to change the "U" to a "u" in the grammar.
It is indeed purely editorial.

> Possible core syntax issue 2: This regular expression will match
> two classes of token which do not conform to any of the three
> basic forms called out in the current ED of css3-fonts:
>
>   U+1?10     question marks are not (all) trailing
>   U+A?-BF    both trailing question marks and a second endpoint
>
> I believe it is not possible to exclude all tokens in these classes,
> and still express all the existing constraints on UNICODE-RANGE
> tokens, using only Lex-style regular expression productions; in
> particular, it is not simultaneously possible to limit the first
> number to no more than 6 characters and specify that all question
> marks must trail.
>
> So I recommend that the core syntax be left alone here.
> Instead, css3-fonts should say that any UNICODE-RANGE token that
> does not fit one of the three basic forms triggers a parse error
> (thus, the entire descriptor is discarded).
>
> [Aside: css3-fonts is almost entirely lacking in formal grammar
> rules. It would be nice if they got added.]

It didn't seem worth it to try and write a pattern that matches only
those UNICODE-RANGE tokens that make sense. It may be possible, but the
pattern would certainly be quite unreadable. So it was left to the text
to explain that certain UNICODE-RANGE tokens are meaningless. That text
was then left out of CSS 2.1, because UNICODE-RANGE is not used there.

How to handle those well-formed but meaningless tokens will indeed have
to be explained in css3-fonts.

So I agree: there is something to do for css3-fonts[1], but nothing for
CSS 2.1.

[1] http://dev.w3.org/csswg/css3-fonts/

> ----
[description of different cases omitted]

Makes sense. I'll leave it to the editors of the fonts module to suggest
some text.

> ----
>
> There is also a question of what text is produced by a CSSOM query
> for the value of an arbitrary unicode-range: descriptor. I recommend
> that implementations be allowed, but not required, to produce a
> simplified representation of the range instead of the original text.
> Continuing with the example of
>
>   unicode-range: U+00??, U+0080-01FF;
>
> an implementation should be allowed to produce (at least) any of
> these:
>
>   U+00??, U+0080-01FF;        // exactly the original text
>   U+0000-00FF, U+0080-01FF;   // question marks expanded to pairs
>   U+00??, U+01??;             // normalized to question mark form
>   U+0000-00FF, U+0100-01FF;   // normalized to pair form
>   U+0000-01FF;                // optimized
>
> I don't think the spec needs to enumerate possibilities; just mention
> that implementations have license in this area.
>
> I would be happy to come up with wording for any or all of the above
> changes.

I have no preference.
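To make the cases above concrete, here is a rough Python sketch, not proposed spec text. It matches tokens with the CSS 2.1 pattern, rejects the two classes that fit none of the three basic forms, and merges the rest into the "optimized" pair form. The names parse_range and normalize are made up for illustration. Rejecting a descending pair is my own assumption; discarding the whole descriptor on one bad token follows the proposal quoted above.

```python
import re

# The CSS 2.1 token pattern quoted in the message, matched case-insensitively.
TOKEN = re.compile(r"u\+([0-9a-f?]{1,6})(?:-([0-9a-f]{1,6}))?", re.IGNORECASE)


def parse_range(token):
    """Return (low, high) for one UNICODE-RANGE token, or None if malformed."""
    m = TOKEN.fullmatch(token.strip())
    if m is None:
        return None
    first, second = m.groups()
    digits = first.rstrip("?")
    if "?" in digits:
        return None  # e.g. U+1?10: question marks are not all trailing
    if len(digits) < len(first):
        if second is not None:
            return None  # e.g. U+A?-BF: wildcards plus a second endpoint
        # Each trailing "?" stands for any hex digit, 0 through F.
        return int(first.replace("?", "0"), 16), int(first.replace("?", "F"), 16)
    low = int(first, 16)
    high = low if second is None else int(second, 16)
    # Assumption: a descending pair is rejected; the message does not say.
    return (low, high) if low <= high else None


def normalize(value):
    """Merge a comma-separated descriptor value into sorted, disjoint pairs."""
    ranges = [parse_range(t) for t in value.split(",")]
    if None in ranges:
        return None  # one bad token discards the whole descriptor
    merged = []
    for low, high in sorted(ranges):
        if merged and low <= merged[-1][1] + 1:
            # Overlapping or adjacent: extend the previous range.
            merged[-1] = (merged[-1][0], max(merged[-1][1], high))
        else:
            merged.append((low, high))
    return ", ".join(f"U+{lo:04X}-{hi:04X}" for lo, hi in merged)


print(normalize("U+00??, U+0080-01FF"))  # U+0000-01FF
```

An implementation that prefers to return the original text, or the question mark form, would simply skip the merge step; the sketch only shows that the forms listed above describe the same set of code points.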
There is a section on normalization in the CSSOM and such a text could
probably be added there. See http://dev.w3.org/csswg/cssom/#parsing



Bert
-- 
  Bert Bos                                ( W 3 C ) http://www.w3.org/
  http://www.w3.org/people/bos                               W3C/ERCIM
  bert@w3.org                             2004 Rt des Lucioles / BP 93
  +33 (0)4 92 38 76 92            06902 Sophia Antipolis Cedex, France
Received on Thursday, 6 November 2008 10:12:55 UTC