- From: Tex Texin <tex@i18nguy.com>
- Date: Wed, 15 Oct 2003 17:39:29 -0400
- To: www-style@w3.org
- Cc: W3c I18n Group <w3c-i18n-ig@w3.org>
Sorry, this is after the last-call deadline, but I have a couple of comments on section 4, http://www.w3.org/TR/CSS21/syndata.html :

1) In section 4.1.3, in the third type of escape, it says that if a character in the range [0-9a-zA-Z] follows a hexadecimal number, the end of the number needs to be made clear. The range should be [0-9a-fA-F], since the characters g-z and G-Z are clearly not hexadecimal digits.

2) Section 4.4 on CSS document representation: Mention should be made of the Unicode BOM and its relationship to the encoding of the file. Is a BOM allowed? The @charset rule is required to begin at the first character of the file. For some people, it is confusing whether the BOM is considered a character. (To me it is clear that it is not.) Where does the BOM fit in the precedence hierarchy? What if the UTF-8 BOM exists, but the encoding is declared by the @charset rule as something else? Is this an error, or is it considered that the encoding simply changes after the @charset rule?

3) More generally, I find the character numbering a little confusing. It is in fact self-consistent, but that is not obvious to the reader without some checking, especially if you don't recognize the values. Take into account:

- In 4.1.1 Tokenization it is mentioned that octal codes refer to ISO 10646. (This is specific to Lex expressions, but that is not immediately obvious. I realize that saying octal codes refer to 10646 doesn't mean the reverse is true, i.e. that 10646 codes will be in octal. But it doesn't mean it isn't true either.)
- The character codes at the end of 4.1.1 are decimal, although one code (space) is identified as a Unicode value.
- All of the escapes are of course hex.

Then, when you read in section 4.1.3 that ISO 10646 characters 161 and higher are allowed in identifiers, it is not immediately obvious whether 161 is octal, decimal or hex.
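To make the ambiguity concrete, here is a minimal Python sketch that reads the digits "161" in each of the three bases and prints the code point in U+hhhh form. The helper name `interpretations` is made up for this note and is not taken from the spec.

```python
# Hypothetical helper: read the same digit string as octal, decimal and hex,
# and report the resulting code point in U+hhhh notation.
def interpretations(digits="161"):
    result = {}
    for name, base in (("octal", 8), ("decimal", 10), ("hex", 16)):
        code_point = int(digits, base)      # parse the digits in the given base
        result[name] = f"U+{code_point:04X}"
    return result


if __name__ == "__main__":
    for name, notation in interpretations().items():
        # Convert "U+hhhh" back to the character for display.
        char = chr(int(notation[2:], 16))
        print(f"{name:8} {notation} {char}")
    # octal    U+0071 q
    # decimal  U+00A1 ¡
    # hex      U+0161 š
```

The three readings land on three unrelated characters, so a reader who guesses the wrong base gets a very different idea of where the allowed range begins.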
It would perhaps be clearer to use the Unicode notation for character codes, U+hhhh, as hex notation is more common now and the U+ prefix is distinctive and indicative of the notation. The hex values are more recognizable as well, for characters like ideographic space, etc., and fit in better with the escape notation.

4) I am surprised that the section on URIs doesn't mention IRIs.

5) Section 4.3.7 on strings introduces \A for newline and points to an example, so I assume there isn't a section describing other backslash codes (e.g. \t etc.). However, the section doesn't define what the user should do if they actually want a linefeed. Is \0A (not \A) supposed to generate a linefeed or a newline? In other words, is the string "\A" a special string, or is the character code U+000A mapped to linefeed in CSS? The parenthetical remark seems to indicate that CSS redefines the Unicode character. (Which seems like a very odd and dangerous thing to do.) Also, I assume that \a and \A are equivalent. True?

tex

--
-------------------------------------------------------------
Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
Xen Master                          http://www.i18nGuy.com

XenCraft                            http://www.XenCraft.com
Making e-Business Work Around the World
-------------------------------------------------------------
Received on Wednesday, 15 October 2003 17:40:32 UTC