Re: A few questions (possible typos?) re CSS2.1 4.1.1 - Tokenization and Appendix G from Bert Bos on 2006-09-27 (www-style@w3.org from September 2006)

From: Bert Bos <bert@w3.org>
Date: Wed, 27 Sep 2006 12:08:17 +0200
To: www-style@w3.org
Cc: Sergey Ignatchenko <sergey@ignatchenko.com>
Message-Id: <200609271208.18035.bert@w3.org>

On Friday 22 September 2006 01:33, Sergey Ignatchenko wrote:
> Sorry if asking about obvious things, but I have certain problems
> understanding the following aspects of CSS2.1 4.1.1 and Appendix G.
> Any advice will be appreciated.

The working group is reluctant to make changes at this late stage, even 
if they are just to clarify the text. The existing text, though 
sometimes difficult to read, is believed to be correct, while any 
"clarified" text might actually introduce mistakes.

That said, here are the responses in detail:

>
> 1. 4.1.1 says:
> "UNICODE-RANGE     U\+[0-9A-F?]{1,6}(-[0-9A-F]{1,6})?"
> should the first question mark really be here? what would be the
> meaning of construct "U+A?6??C"? Also the fate of UNICODE-RANGE is
> not clear at all; it doesn't seem to be mentioned anywhere else in
> the document (if it is a legacy from CSS1, which seems the most
> likely guess, it would be a good thing to clarify its potential use
> or uselessness).

The UNICODE-RANGE token is used in the WebFonts module. WebFonts are no 
longer part of level 2, but they are still part of CSS. Chapter 4 is 
meant to define the syntax of CSS for all time: different levels and 
different profiles may use different features of CSS, but the parsing 
rules are always the same.

So you are right that UNICODE-RANGE is not used in CSS 2.1, but we 
cannot remove the token.

You are also right that the UNICODE-RANGE token makes it possible to 
express Unicode ranges that make no sense. E.g., "U+?0" is every 16th 
Latin-1 character, i.e., [ 0@P`p°ÀÐàð]. But such expressions do no harm, 
and allowing them makes the token easier to parse.
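
As a rough illustration (not part of the specification), the token 
pattern quoted above can be tried out in Python. The helper 
expand_wildcards is hypothetical, and it assumes that each "?" stands 
for any single hexadecimal digit, which is the reading behind the 
"every 16th character" remark:

```python
import re
from itertools import product

# Token pattern as quoted from CSS 2.1 section 4.1.1.
UNICODE_RANGE = re.compile(r"U\+[0-9A-F?]{1,6}(-[0-9A-F]{1,6})?")

HEX_DIGITS = "0123456789ABCDEF"


def expand_wildcards(token):
    """Hypothetical helper: list the code points a wildcard token denotes,
    assuming each '?' stands for any single hexadecimal digit."""
    if not UNICODE_RANGE.fullmatch(token) or "-" in token:
        raise ValueError("expected a wildcard token without an explicit range")
    digits = token[2:]  # drop the leading "U+"
    choices = [HEX_DIGITS if d == "?" else d for d in digits]
    return [int("".join(combo), 16) for combo in product(*choices)]


# The odd-looking token from the question is accepted by the tokenizer pattern.
odd_token_matches = UNICODE_RANGE.fullmatch("U+A?6??C") is not None

# "U+?0" expands to U+0000, U+0010, ..., U+00F0.
points = expand_wildcards("U+?0")
```

Here points holds 16 code points, one per value of the wildcard 
digit, and odd_token_matches is True: the pattern accepts the token 
without asking whether the range is meaningful.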

>
> 2. Both 4.1.1 and Appendix G.2 say that 2nd (supposedly unquoted?)
> form of URI is described as follows:
> "url\({w}([!#$%&*-~]|{nonascii}|{escape})*{w}\)"
> some set of characters seems to be missing here; for example, I don't
> see how example from 4.3.4
> ("url(http://www.example.com/redball.png)") can fit into this 
> description (as well as another description "url\({w}{string}{w}\)").

I admit that you have to read the definition very carefully, but it is 
actually correct. There is a "-" between the "*" and the "~" in the set 
of characters, which means that all characters with code points from 
"*" (U+002A) through "~" (U+007E) are included in the set. That covers 
nearly all printable ASCII characters, including every character in the 
example URL.
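
A quick way to check this reading is to feed the character set to a 
regular expression engine. The sketch below uses Python's re module. 
SIMPLE_URI is a simplified stand-in for the real token: it writes {w} 
and {nonascii} out inline and leaves out the {escape} alternative, 
which is an assumption made for brevity:

```python
import re

# The character set exactly as it appears in the quoted URI pattern.
URL_CHAR = re.compile(r"[!#$%&*-~]")

# Printable ASCII characters (U+0020 to U+007E) that the set leaves out.
excluded = [chr(c) for c in range(0x20, 0x7F) if not URL_CHAR.fullmatch(chr(c))]

# Simplified unquoted-URI pattern: {w} and {nonascii} are written out inline,
# and the {escape} alternative is omitted (an assumption made for brevity).
SIMPLE_URI = re.compile(
    r"url\([ \t\r\n\f]*(?:[!#$%&*-~]|[^\x00-\x7f])*[ \t\r\n\f]*\)"
)

matches_example = (
    SIMPLE_URI.fullmatch("url(http://www.example.com/redball.png)") is not None
)
```

With this set, excluded contains only the space, the two quote 
characters and the two parentheses, and matches_example is True.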

>
> 3. In G.2 YACC/FLEX grammar 'S' is defined as a single space ("[
> \t\r\n\f]"), unlike 'S' in 4.1.1 ("[ \t\r\n\f]+"). While not a big
> deal, it looks a bit confusing.

As you said, neither of them is wrong. But we looked carefully at the 
grammars and decided to take the risk and change the token S in the 
appendix. In the next draft it will be "[ \t\r\n\f]+", just like in 
chapter 4.
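
To make the difference between the two definitions concrete, here is 
a small sketch in Python. The names S_SINGLE and S_RUN are made up for 
this illustration, and findall only approximates what a real 
tokenizer does:

```python
import re

S_SINGLE = re.compile(r"[ \t\r\n\f]")  # S as in the Appendix G.2 grammar
S_RUN = re.compile(r"[ \t\r\n\f]+")    # S as in section 4.1.1

whitespace = " \t\r\n"

# One S token per whitespace character.
single_tokens = S_SINGLE.findall(whitespace)

# A single S token covering the whole run.
run_tokens = S_RUN.findall(whitespace)
```

The same input yields four S tokens under the first definition and 
one under the second. A grammar can be written to work with either, 
but it has to expect the right one.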

For the CSS WG,
Bert
-- 
  Bert Bos                                ( W 3 C ) http://www.w3.org/
  http://www.w3.org/people/bos                               W3C/ERCIM
  bert@w3.org                             2004 Rt des Lucioles / BP 93
  +33 (0)4 92 38 76 92            06902 Sophia Antipolis Cedex, France

Received on Wednesday, 27 September 2006 10:08:33 UTC