- From: Bert Bos <bert@w3.org>
- Date: Thu, 3 Jul 2008 20:13:25 +0200
- To: Benjamin Hawkes-Lewis <bhawkeslewis@googlemail.com>
- Cc: W3C Emailing list for WWW Style <www-style@w3.org>
- Message-Id: <200807032013.25427.bert@w3.org>
On Wednesday 12 March 2008 23:03, Benjamin Hawkes-Lewis wrote:
> The prose description of identifiers in the CSS 2.1 specification says:
>
> > In CSS, identifiers (including element names, classes, and IDs in
> > selectors) can contain only the characters [a-z0-9] and ISO 10646
> > characters U+00A1 and higher, plus the hyphen (-) and the
> > underscore (_); they cannot start with a digit, or a hyphen
> > followed by a digit. Identifiers can also contain escaped
> > characters and any ISO 10646 character as a numeric code (see next
> > item). For instance, the identifier "B&W?" may be written as
> > "B\&W\?" or "B\26 W\3F".
>
> http://www.w3.org/TR/CSS21/syndata.html#value-def-identifier
>
> The next definition begins:
>
> > In CSS 2.1, a backslash (\) character indicates three types of
> > character escapes.
>
> It would have been helpful to this reader, at least, if it were
> equally clear that the prose was talking only about identifiers in
> CSS 2.1 not "CSS" generally, where according to the tokenization
> rules identifiers may contain characters of octal 200 (U+0080) and
> higher (i.e. a substantially wider set):
>
> http://www.w3.org/TR/CSS21/syndata.html#tokenization

It seems you actually found an error in the text that nobody saw before, though the reason for the error is different from what you assumed. It's not a difference between generic CSS and CSS 2.1. The syntax of identifiers is meant to be the same in all levels.

The generic syntax for CSS is trying to say that nothing outside the ASCII range is ever going to have a special function in CSS. All punctuation (such as curly braces and semicolons) is taken from the ASCII range.

Section 4.1.1 says it in octal: nonascii is everything above 0177 (127 in decimal). Section 4.1.3 says it in hexadecimal: A1 and higher (161 in decimal). So are Unicode characters between 127 and 161 allowed or not?
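To make the gap concrete, here is a minimal Python sketch that lists the code points on which the two sections disagree. The function names are hypothetical, and the thresholds are simply the two bounds quoted above; the sketch models only the non-ASCII part of the identifier rule, not the full grammar.

```python
from typing import List

# Lower bounds for "non-ASCII characters allowed in identifiers",
# as stated in the two sections of CSS 2.1 discussed above.
NONASCII_MIN_4_1_1 = 0o200  # tokenizer: everything above octal 177
NONASCII_MIN_4_1_3 = 0xA1   # prose: "U+00A1 and higher"


def is_nonascii_per_4_1_1(code_point: int) -> bool:
    """True if the tokenization rules treat the code point as nonascii."""
    return code_point >= NONASCII_MIN_4_1_1


def is_nonascii_per_4_1_3(code_point: int) -> bool:
    """True if the prose description allows the code point in an identifier."""
    return code_point >= NONASCII_MIN_4_1_3


def disputed_code_points() -> List[int]:
    """Code points below U+0100 on which the two sections disagree."""
    return [
        cp
        for cp in range(0x00, 0x100)
        if is_nonascii_per_4_1_1(cp) != is_nonascii_per_4_1_3(cp)
    ]


if __name__ == "__main__":
    disputed = disputed_code_points()
    print(f"{len(disputed)} disputed code points: "
          f"U+{disputed[0]:04X} through U+{disputed[-1]:04X}")
```

Run as a script, this reports the 33 code points U+0080 through U+00A0, which is exactly the range where section 4.1.1 says yes and section 4.1.3 says no.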
Well, when this text was first written, Unicode was still at version 1 and there *were* no characters between 127 and 160. The first actual non-ASCII character was at 160 (the non-breakable space). I think that's why section 4.1.3 says U+00A1 (161). It just tried to be helpful, although I don't understand why it says A1 and not A0. The non-breakable space has no special function in CSS, so why exclude it?

Unicode is now at version 5 and has filled the gap between 127 and 160 with actual characters. They are "control characters" like "cancel character" and "reverse line feed," i.e., not things that you can see or type in a typical editor, but a creative user could probably find a way to put them in a CSS file anyway. And thus section 4.1.3 needs to include them.

So I think we need this fix:

    In section 4.1.3, second bullet, replace "U+00A1" by "U+0080".

There is another point to your e-mail. You believed that there was a difference between generic CSS and CSS 2.1, because the third bullet point says "CSS 2.1" while the others say just "CSS." I can see that that is confusing. I think we, the editors, too often read these texts one line at a time, to see if that line on its own is correct. But if you read the lines in sequence, they indeed *suggest* that there is one rule for CSS 2.1 and another for CSS in general.

That third bullet point is, strictly speaking, correct. The backslash works as described in CSS 2.1 and that's all that this spec needs to define. But it actually also works like that in other levels of CSS and it is less confusing if we say so.

So I think we also should change:

    In section 4.1.3, third bullet, replace the first "CSS 2.1" by "CSS".

For reference: this issue will be tracked as issue 57 at

    http://csswg.inkedblade.net/spec/css2.1#issue-57

Bert

PS. I tested what browsers do, and, unfortunately, it seems that Firefox (version 2.0.0.14) does what 4.1.3 says: U+0080 through U+00A0 cannot occur in identifiers.
Opera and Konqueror do what 4.1.1 says: anything above U+7F can be in an identifier.

Attached is a test case with a non-breakable space (U+A0) and another with a reverse line feed (U+8D). Let's hope they survive e-mail encoding and decoding...

-- 
  Bert Bos                                ( W 3 C ) http://www.w3.org/
  http://www.w3.org/people/bos                               W3C/ERCIM
  bert@w3.org                             2004 Rt des Lucioles / BP 93
  +33 (0)4 92 38 76 92            06902 Sophia Antipolis Cedex, France
Received on Thursday, 3 July 2008 18:14:39 UTC