[css3-syntax][css21] More problems with determining the character encoding

I started looking at changing Gecko to give precedence to the BOM for
text/css. I noticed further problems.

First of all, it appears that Gecko supports reading @charset that is
encoded as BOMless UTF-16. In that case, it makes no sense for the
stylesheet to declare an encoding other than the UTF-16 variant that
matches the endianness of the 0x00 bytes intertwined in the @charset
rule. However, Gecko seems to obey the declared encoding regardless of
what is declared. Shockingly, this behavior seems to be what CSS 2.1
calls for, even though the behavior doesn't really make sense.
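
To illustrate the mismatch, here is a minimal Python sketch. The
sample style sheet and the variable names are made up for this
message, and this is not Gecko's code. It encodes an @charset rule as
BOMless little-endian UTF-16 and decodes the bytes both ways.

```python
# Hypothetical sample: a style sheet stored as BOMless UTF-16LE whose
# @charset rule names a different, ASCII-compatible encoding.
sheet = '@charset "windows-1252"; p { color: red }'
raw = sheet.encode("utf-16-le")

# In little-endian UTF-16 every ASCII character is followed by a 0x00
# byte, so the byte pattern of "@charset" already reveals the encoding.
print(raw[:16].hex(" "))

# Obeying the declared label leaves the 0x00 bytes in place as U+0000.
as_declared = raw.decode("windows-1252")

# Decoding with the encoding implied by the byte pattern recovers the text.
as_sniffed = raw.decode("utf-16-le")

print(as_declared == sheet)  # False: the text is interleaved with U+0000
print(as_sniffed == sheet)   # True
```

The declared label can only be read at all by treating the bytes as
UTF-16, so a label naming anything else contradicts the bytes it is
written in.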

Looking at http://www.w3.org/TR/CSS21/syndata.html#charset , the
section lists byte patterns for UTF-32 (even with unusual endianness
permutations), EBCDIC and GSM 03.38. (Have all those *really* been
tested to have two interoperable implementations for CSS 2.1?)

Additionally, CSS3 Syntax doesn't appear to mention the inheritance of
the encoding from the referring document in the absence of other
encoding information.

Please make the following changes to text/css (in addition to making
the BOM take the highest precedence):

 * Please prohibit authors from using and implementations from
supporting encodings that are not in the Encoding Standard
(http://encoding.spec.whatwg.org/). If normatively referencing the
Encoding Standard is politically or procedurally infeasible, please at
least prohibit implementations from supporting non-ASCII-compatible
encodings other than variants of UTF-16. (See
http://www.w3.org/TR/html5/infrastructure.html#ascii-compatible-character-encoding
for a definition in the W3C space.) UTF-32, UTF-7, BOCU-1, SCSU,
variants of EBCDIC and GSM 03.38 should all be banned from being
supported by CSS implementations and from being used by CSS authors.

 * If there is no BOM, no @charset, no HTTP-level charset and no
charset attribute on the linking element, and the encoding of the
referring document or style sheet is ASCII-compatible, please define
that the encoding is inherited from the referrer. If the encoding of
the referrer is UTF-16, please define that the inherited encoding is
UTF-8.

 * Please make the encoding declared using @charset have no effect
unless the string "@charset" is represented as its ASCII bytes.

 * If it is determined that supporting BOMless UTF-16 that has
@charset is needed for Web compatibility, please base the sniffing on
the 0x00 bytes intertwined in "@charset" and not on whatever follows
"@charset". (Even better if support for BOMless UTF-16 can be
dropped.)
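
The requested behavior could be sketched roughly as follows in
Python. All function and parameter names are hypothetical. The
relative order of the HTTP charset, @charset and the linking
element's charset attribute is assumed to follow CSS 2.1 once the BOM
is moved to the top, and the final UTF-8 fallback is also an
assumption. The placement of the optional BOMless UTF-16 sniffing
step is likewise a guess, since the list above does not pin it down.
Labels are returned as given, and the caller is assumed to pass only
ASCII-compatible encodings or UTF-16 variants as the referrer
encoding.

```python
import codecs

# BOMs recognized in this sketch; UTF-32 is deliberately left out.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

ASCII_PREFIX = b'@charset "'
UTF16_BE_PREFIX = "@charset".encode("utf-16-be")
UTF16_LE_PREFIX = "@charset".encode("utf-16-le")


def charset_rule_label(data):
    """Return the @charset label only if the rule is spelled in ASCII bytes."""
    if not data.startswith(ASCII_PREFIX):
        return None
    end = data.find(b'";', len(ASCII_PREFIX))
    if end == -1:
        return None
    try:
        return data[len(ASCII_PREFIX):end].decode("ascii")
    except UnicodeDecodeError:
        return None


def sniff_bomless_utf16(data):
    """Infer endianness from the 0x00 bytes inside "@charset" itself."""
    if data.startswith(UTF16_BE_PREFIX):
        return "utf-16-be"
    if data.startswith(UTF16_LE_PREFIX):
        return "utf-16-le"
    return None


def determine_encoding(data, http_charset=None, link_charset=None,
                       referrer_encoding=None, allow_bomless_utf16=False):
    # 1. The BOM takes the highest precedence.
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    # 2. HTTP-level charset (position assumed from CSS 2.1).
    if http_charset:
        return http_charset
    # 3. @charset, honored only when written as ASCII bytes.
    label = charset_rule_label(data)
    if label:
        return label
    # 4. Optional: BOMless UTF-16, decided by the bytes of "@charset"
    #    and not by whatever label follows it.
    if allow_bomless_utf16:
        sniffed = sniff_bomless_utf16(data)
        if sniffed:
            return sniffed
    # 5. charset attribute on the linking element.
    if link_charset:
        return link_charset
    # 6. Inherit from the referrer; a UTF-16 referrer maps to UTF-8.
    if referrer_encoding:
        if referrer_encoding.lower().startswith("utf-16"):
            return "utf-8"
        return referrer_encoding
    # 7. Fallback (an assumption of this sketch).
    return "utf-8"
```

For example, determine_encoding(b"p{}", referrer_encoding="UTF-16LE")
returns "utf-8", and a sheet that starts with a UTF-8 BOM is treated
as UTF-8 even if an HTTP charset or an @charset rule says otherwise.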

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Monday, 22 October 2012 11:43:23 UTC