CSS21 Syndata.html Section 4 from Tex Texin on 2003-10-15 (www-style@w3.org from October 2003)

From: Tex Texin <tex@i18nguy.com>
Date: Wed, 15 Oct 2003 17:39:29 -0400
To: www-style@w3.org
Cc: W3c I18n Group <w3c-i18n-ig@w3.org>
Message-ID: <3F8DBE91.7A3671E@i18nguy.com>
Sorry, this is after last call's deadline, but I have a couple of comments on
section 4,
http://www.w3.org/TR/CSS21/syndata.html :

1) In section 4.1.3, in the third type of escape it says that if a character in
the range [0-9a-zA-Z] follows a hexadecimal number, the end of the number needs
to be made clear.

The range should be [0-9a-fA-F], since the characters g-z and G-Z are not
hexadecimal digits and so cannot be mistaken for part of the number.
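
A minimal sketch of the point, assuming a simplified reading of the third
escape form (a backslash, one to six hex digits, and one optional whitespace
terminator). The names HEX_ESCAPE and decode_hex_escapes are made up for
illustration and this is not the specification's tokenizer:

```python
import re

# Simplified model of a CSS hex escape: backslash, 1-6 hex digits, then at
# most one whitespace terminator (CRLF counted as one). Illustrative only.
HEX_ESCAPE = re.compile(r"\\([0-9a-fA-F]{1,6})(?:\r\n|[ \t\r\n\f])?")


def decode_hex_escapes(text):
    """Replace each hex escape in text with the character it names."""
    def replace(match):
        code_point = int(match.group(1), 16)
        # Values beyond the Unicode range cannot be turned into a character;
        # substituting U+FFFD here is an assumption of this sketch.
        if code_point > 0x10FFFF:
            return "\ufffd"
        return chr(code_point)
    return HEX_ESCAPE.sub(replace, text)


# "\26B" is read as the single code point U+026B, so a terminator is needed
# before a following "B". A following "G" is not a hex digit and ends the
# escape on its own.
examples = [r"\26 B", r"\26B", r"\26G"]
decoded = [decode_hex_escapes(s) for s in examples]
```

In this model only a following character in [0-9a-fA-F] can be absorbed into
the number, which is why the narrower range seems sufficient.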

2) Section 4.4 on CSS document representation
Mention should be made of the Unicode BOM and its relationship to the encoding
of the file.

Is BOM allowed?

The @charset rule is required to begin at the first character of the file. For
some people, it is unclear whether the BOM counts as a character. (To me it is
clear that it does not.)

Where does BOM fit in the precedence hierarchy?

What if the UTF-8 BOM exists, but the encoding is declared by @charset rule as
something else?
Is this an error, or is it considered that the encoding simply changes after
the @charset rule?
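
To make the conflict case concrete, here is a rough sketch that reports a
leading BOM and a leading @charset declaration side by side. The function name
sniff and the BOMS table are hypothetical, the @charset match only works for
ASCII-compatible encodings, and nothing here says which of the two should win:

```python
import codecs
import re

# UTF-32 LE must be tested before UTF-16 LE because their BOMs share a prefix.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Matches an @charset rule written in ASCII-compatible bytes at the start.
CHARSET_RULE = re.compile(rb'@charset "([^"]*)";')


def sniff(data):
    """Return (encoding implied by BOM, encoding named by @charset)."""
    bom_encoding = None
    body = data
    for bom, name in BOMS:
        if data.startswith(bom):
            bom_encoding = name
            body = data[len(bom):]  # treat the BOM as not being a character
            break
    match = CHARSET_RULE.match(body)
    declared = None
    if match:
        declared = match.group(1).decode("ascii", "replace").lower()
    return bom_encoding, declared


# The case asked about above: a UTF-8 BOM followed by a different declaration.
conflict = sniff(b'\xef\xbb\xbf@charset "ISO-8859-1";\nbody {}')
```

Here conflict holds both answers, which is exactly the situation the
specification would need to rule on.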

3) More generally, I find the character numbering a little confusing. It is in
fact self-consistent, but that is not obvious to a reader who does not
recognize the values and has not done some checking.

If you take into account:

-In 4.1.1 Tokenization it is mentioned that octal codes refer to ISO 10646.
(This is specific to Lex expressions but that is not immediately obvious. I
realize that saying octal codes refer to 10646 doesn't mean the reverse is
true, that 10646 codes will be in octal. But it doesn't mean it isn't true
either.)

-The character codes at the end of 4.1.1 are decimal, although one code (space)
is identified as a Unicode value.

-All of the escapes are of course hex.

Then, when you read in section 4.1.3 that ISO 10646 characters 161 and higher
are allowed in identifiers, it is not immediately obvious if 161 is octal,
decimal or hex.

It would perhaps be clearer to use the Unicode notation for character codes,
U+hhhh, as hex notation is more common now and the U+ prefix is distinctive and
indicative of the notation. The hex values are also more recognizable for
characters like ideographic space, and they fit in better with the escape
notation.
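
As an illustration of the ambiguity, the short sketch below reads the digits
"161" in each of the three bases in play and prints the result in U+hhhh form.
The helper names u_plus and readings are invented for this example:

```python
def u_plus(code_point):
    """Format an integer code point in Unicode U+hhhh notation."""
    return f"U+{code_point:04X}"


def readings(digits):
    """Interpret the same digit string as octal, decimal, and hex."""
    return {
        label: u_plus(int(digits, base))
        for label, base in (("octal", 8), ("decimal", 10), ("hex", 16))
    }


# Three different characters, depending on how "161" is read.
ambiguous = readings("161")
# {'octal': 'U+0071', 'decimal': 'U+00A1', 'hex': 'U+0161'}
```

Writing the value as U+00A1 in the text would remove the need for the reader
to work out which of the three is meant.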

4) I am surprised that the section on URIs doesn't mention IRIs.

5) The section 4.3.7 on strings introduces \A for newline and points to an
example, so I assume there isn't a section describing other backslash codes
(e.g. \t etc.). However, the section doesn't define what the user should do if
they actually want a linefeed.

Is \0A (not \A) supposed to generate a linefeed or a newline? In other words,
is the string "\A" a special string, or is the character code U+000A mapped to
linefeed in CSS? The parenthetical remark seems to indicate that CSS redefines
the Unicode character.

(Which seems like a very odd and dangerous thing to do.)

Also I assume that \a and \A are equivalent. True?

tex



-- 
-------------------------------------------------------------
Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
Xen Master                          http://www.i18nGuy.com
                         
XenCraft		            http://www.XenCraft.com
Making e-Business Work Around the World
-------------------------------------------------------------
Received on Wednesday, 15 October 2003 17:40:32 UTC