- From: Etan Wexler <ewexler@stickdog.com>
- Date: Fri, 28 Nov 2003 15:32:31 -0800
- To: www-style@w3.org, Bert Bos <bert@w3.org>, Tantek Çelik <tantek@cs.stanford.edu>, Ian Hickson <ian@hixie.ch>, Håkon Wium Lie <howcome@opera.com>

Section 4.1.3 of the CSS 2.1 specification [D]:

"In CSS 2.1, identifiers (including element names, classes, and IDs in selectors) can contain only the characters [A-Za-z0-9] and ISO 10646 characters 161 and higher, plus the hyphen (-) and the underscore (_); they cannot start with a hyphen or a digit. They can also contain escaped characters and any ISO 10646 character as a numeric code (see next item). For instance, the identifier "B&W?" may be written as "B\&W\?" or "B\26 W\3F".

"Note that Unicode is code-by-code equivalent to ISO 10646 (see [UNICODE] and [ISO10646])."

I assume, because a comprehensive formal grammar is available nearby, that this passage is primarily for non-technical readers. If so, there are a couple of problems. The construct "[A-Za-z0-9]" will look odd even if understood. The mention of "ISO 10646" is likely to mean nothing to many non-technical readers.

ID selectors are HASH tokens. HASH tokens may have an unescaped digit immediately following the number sign, as in {#1name}. If the prose is not meant to constrain the syntax, the prose requires a change. If the prose is meant to constrain the syntax beyond what the grammar allows, I have to ask: what special meaning would the Working Group (now or in the future) like to assign to HASH tokens that start with digits and that are in selectors? If no distinct meaning is intended, there is no point in constraining the syntax.

Referring to characters by decimal numbers is odd, given the prevalence and official use of hexadecimal notation. If intended as a sop to non-technical readers, I think that the effort is mostly wasted. When trying to discover what "characters 161 and higher" means in practical terms, non-technical readers will probably have to translate to hexadecimal numbers as they sift through, for example, the Unicode Consortium's documents.

The exclusion of the C1 control characters is not reflected in the grammar. Which is authoritative: prose or grammar?
How do existing implementations handle the presence of C1 control characters?

The character called a hyphen in the passage is actually a hyphen-minus, U+002D. The true and unambiguous hyphen, U+2010, is distinct in the Universal Character Set. I mean, the One True Hyphen, praise be unto it.

There are many digits which may start identifiers. Examples include Thai, Arabic, Devanagari... The only digits excluded are the ones present in ASCII, 0–9 (U+0030–U+0039).

That identifiers can contain "any ISO 10646 character as a numeric code" is true only because ISO 10646 has not yet assigned characters beyond U+FFFFFF. From its inception, ISO 10646 has had a range of codepoints up to U+7FFFFFFF. Current CSS, with a limit of six hexadecimal digits in numeric escapes, can represent only codepoints U+0000–U+FFFFFF. It would be best to specify the range as such or to say that CSS can represent any Unicode character (which would be U+0000–U+10FFFF).

"B&W?" is not an identifier. It is a sequence of characters that an identifier using proper mechanisms can represent.

On the whole, the utility of the passage in its present state and position is dubious. If the Working Group feels a need to publish prose explaining the syntax, I would suggest that such prose reside in a non-normative appendix or even a separate Technical Report (such as a Note). Links from section 4 would be appropriate, of course.

That said, the following revision attempts to make the best in-place change to the passage:

"In CSS 2.1, identifiers (including element names and class names) can contain only A–Z, a–z, 0–9, the hyphen-minus (U+002D), the underscore ("_", U+005F) and characters U+00A1–U+FFFFFD in the ISO 10646 standard [ISO10646]; they cannot start with a hyphen-minus or a digit 0–9. Escapes, whether in numeric or direct form, may appear anywhere in an identifier, including at the start. So the identifier corresponding to "B&W?" may be written as "B\&W\?" or "B\26 W\3F".

"Note: The hyphen-minus (U+002D) is sometimes called a hyphen, a minus sign, or a dash. In the limited character set of ASCII, it had to fill all these roles. With the creation of the Unicode Standard [UNICODE] and ISO 10646 [ISO10646], there are now unambiguous coded characters for hyphen (U+2010), minus sign (U+2212) and various dashes including en dash (U+2013) and em dash (U+2014). The hyphen-minus cannot start an identifier, but the unambiguous characters can."

[D] Bert Bos; and others. "Characters and case", section 4.1.3 of the CSS 2.1 specification. 15 September 2003. W3C Last Call Working Draft (work in progress). <http://www.w3.org/TR/2003/WD-CSS21-20030915/syndata.html#q6>.
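
For illustration only, the character rules in the proposed revision above can be sketched in Python. This is a minimal sketch, not a CSS tokenizer: the function names `is_plain_ident_char` and `escape_ident` are hypothetical, the rules are taken from the prose quoted in this message rather than from the formal grammar, and every disallowed character is written as a numeric escape (the direct form such as "\&" is not produced).

```python
def is_plain_ident_char(ch: str) -> bool:
    """True if ch may appear unescaped, per the prose rules quoted above.

    Assumption: A-Z, a-z, 0-9, hyphen-minus, underscore, and anything
    at or above U+00A1 are allowed; everything else needs an escape.
    """
    return (
        ch in "-_"
        or "A" <= ch <= "Z"
        or "a" <= ch <= "z"
        or "0" <= ch <= "9"
        or ord(ch) >= 0xA1
    )


def escape_ident(text: str) -> str:
    """Write text as an identifier, using numeric escapes where needed."""
    out = []
    last = len(text) - 1
    for i, ch in enumerate(text):
        # Only the hyphen-minus and the ASCII digits are barred from the start.
        barred_at_start = i == 0 and (ch == "-" or "0" <= ch <= "9")
        if is_plain_ident_char(ch) and not barred_at_start:
            out.append(ch)
            continue
        escape = "\\{:X}".format(ord(ch))
        # A single space ends the escape so that a following character
        # cannot be read as another hexadecimal digit.
        if i < last:
            escape += " "
        out.append(escape)
    return "".join(out)


print(escape_ident("B&W?"))   # B\26 W\3F
print(escape_ident("1name"))  # \31 name
```

Under these assumptions a Thai digit (for example U+0E51) or the hyphen U+2010 passes through unescaped at the start of an identifier, while the ASCII digit and the hyphen-minus do not. The sketch omits the terminating space after a final escape, so text appended directly after the result could be misread as part of that escape.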
Received on Friday, 28 November 2003 18:40:44 UTC