- From: Martin Duerst <duerst@w3.org>
- Date: Thu, 16 Oct 2003 21:18:05 -0400
- To: Tex Texin <tex@i18nguy.com>, www-style@w3.org
- Cc: W3c I18n Group <w3c-i18n-ig@w3.org>
At 17:39 03/10/15 -0400, Tex Texin wrote:
>Sorry, this is after last call's deadline, but I have a couple comments on
>section 4, http://www.w3.org/TR/CSS21/syndata.html :
>
>3) More generally, I find the character numbering a little confusing. It is in
>fact self-consistent, but it is not obvious to the reader without some checking
>and if you don't recognize the values.

This seems to be not only a little confusing, but baroque and out of date.
I'm sure there are versions or equivalents of lex these days that can use
something other than octal. And decimal is also way out of fashion these
days in anything connected with character encoding, and Unicode in particular.

I sincerely hope that this can be fixed. The readers will be grateful
for a long time to come.

Regards, Martin.

>If you take into account:
>
>-In 4.1.1 Tokenization it is mentioned that Octal codes refer to ISO 10646.
>(This is specific to Lex expressions but that is not immediately obvious. I
>realize saying octal refers to 10646 doesn't mean that the reverse is true,
>10646 codes will be in octal. But it doesn't mean it isn't true either.)
>
>-The character codes at the end of 4.1.1 are decimal. Although one code
>(space) is identified as a Unicode value.
>
>-All of the escapes are of course hex.
>
>Then, when you read in section 4.1.3 that ISO 10646 characters 161 and higher
>are allowed in identifiers, it is not immediately obvious if 161 is octal,
>decimal or hex.
>
>It would perhaps be more clear to use the Unicode notation for character
>codes: U+hhhh, as the hex notation is more common now and the U+ is
>distinctive and indicative of the notation. The hex values are more
>recognizable as well, for characters like ideographic space, etc.
>and fit in better with the escape notation.
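
To make the ambiguity around "161" concrete, here is a minimal Python sketch. It is not part of the original message, and the helper name `interpretations` is made up for illustration. It reads the same digit string as octal, decimal, and hex, and prints each result in the U+hhhh notation suggested above.

```python
def interpretations(digits: str) -> dict:
    """Read one digit string in three bases and format each as U+hhhh.

    Hypothetical helper for illustration only; it is not taken from
    the CSS 2.1 draft or from any library.
    """
    result = {}
    for name, base in (("octal", 8), ("decimal", 10), ("hex", 16)):
        try:
            code_point = int(digits, base)
        except ValueError:
            # The digits are not valid in this base (e.g. "9" in octal).
            continue
        # At least four uppercase hex digits, as in the U+hhhh convention.
        result[name] = f"U+{code_point:04X}"
    return result


readings = interpretations("161")
for name, notation in readings.items():
    print(f"161 read as {name}: {notation}")
# 161 read as octal: U+0071
# 161 read as decimal: U+00A1
# 161 read as hex: U+0161
```

The three readings name three different characters, so a bare "161" in the text gives the reader no way to tell which one is meant, whereas the U+ form carries its base with it.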
Received on Thursday, 16 October 2003 21:25:13 UTC