Re: [css3-syntax][css21] More problems with determining the character encoding

On Mon, Oct 22, 2012 at 7:42 PM, Henri Sivonen <hsivonen@iki.fi> wrote:

> I started looking at changing Gecko to give precedence to the BOM for
> text/css. I noticed further problems.
>
> First of all, it appears that Gecko supports reading @charset that is
> encoded as BOMless UTF-16. In that case, it makes no sense for the
> stylesheet to declare an encoding other than the UTF-16 variant that
> matches the endianness of the 0x00 bytes intertwined in the @charset
> rule. However, Gecko seems to obey the declared encoding regardless of
> what is declared. Shockingly, this behavior seems to be what CSS 2.1
> calls for, even though the behavior doesn't really make sense.
>
> Looking at http://www.w3.org/TR/CSS21/syndata.html#charset , it
> supports UTF-32 (weird endianness permutations even), EBCDIC and GSM
> 03.38 byte patterns. (Have all those *really* been tested to have two
> interoperable implementations for CSS 2.1?)
>

I'm sure the answer is *no*.


>
> Additionally, CSS3 Syntax doesn't appear to mention the inheritance of
> the encoding from the referring document in the absence of other
> encoding information.
>

Do you mean, e.g., using the charset parameter of the @type attribute on a
<style/> element?


>
> Please make the following changes to text/css (in addition to making
> the BOM take the highest precedence):
>
>  * Please prohibit authors from using and implementations from
> supporting encodings that are not in the Encoding Standard.
>

Can't prohibit author behavior. Can only say what to do if an author does
something you don't like.


> (http://encoding.spec.whatwg.org/) If normatively referencing the
> Encoding Standard is politically or procedurally infeasible, please at
> least prohibit implementations from supporting non-ASCII-compatible
> encodings other than variants of UTF-16.


Can't prohibit implementations from supporting whatever they like.


> (See
>
> http://www.w3.org/TR/html5/infrastructure.html#ascii-compatible-character-encoding
> for a definition in the W3C space.) UTF-32, UTF-7, BOCU-1, SCSU,
> variants of EBCDIC and GSM 03.38 should all be banned from being
> supported by CSS implementations and from being used by CSS authors.
>

Can't ban author or implementation behavior. Can only define what to do
when behavior is conformant or not.


>
>  * If there is no BOM, no @charset, no HTTP-level charset and no
> charset attribute on the linking element, and the encoding of the
> referring document or style sheet is ASCII-compatible, please define
> that the encoding is inherited from the referrer. If the encoding of
> the referrer is UTF-16, please define that the inherited encoding is
> UTF-8.
>

That makes no sense. There is no relationship between the encoding of a
referring document and a referenced document.
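
For reference, the fallback chain in the quoted proposal could be sketched
roughly as below. This is only an illustration, not text from any spec. The
function and parameter names are hypothetical, and several points are
assumptions: the relative order of the HTTP charset, @charset and
linking-element charset (the quoted text only puts the BOM first), the final
UTF-8 default, and that the caller passes a referrer encoding only when it is
ASCII-compatible or a UTF-16 variant.

```python
import codecs

# Hypothetical helper: map a leading byte order mark to an encoding label.
def sniff_bom(data: bytes):
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16be"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16le"
    return None


def is_utf16(label: str) -> bool:
    return label.lower() in {"utf-16", "utf-16le", "utf-16be"}


def choose_encoding(data: bytes, http_charset=None, at_charset=None,
                    link_charset=None, referrer_encoding=None) -> str:
    # 1. The BOM takes the highest precedence (as requested in the quote).
    bom = sniff_bom(data)
    if bom:
        return bom
    # 2-4. Explicit declarations. ASSUMPTION: this relative order is not
    # given in the quoted text.
    for declared in (http_charset, at_charset, link_charset):
        if declared:
            return declared.lower()
    # 5. Inherit from the referrer; a UTF-16 referrer maps to UTF-8.
    # ASSUMPTION: the caller only passes ASCII-compatible or UTF-16 labels.
    if referrer_encoding:
        if is_utf16(referrer_encoding):
            return "utf-8"
        return referrer_encoding.lower()
    # 6. ASSUMPTION: default to UTF-8 when nothing else is known.
    return "utf-8"
```

For example, choose_encoding(b"body{}", referrer_encoding="UTF-16LE") falls
through to step 5 and yields "utf-8", while the same call with
referrer_encoding="windows-1252" yields "windows-1252".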


>
>  * Please make the encoding declared using @charset have no effect
> unless the string "@charset" is represented as its ASCII bytes.
>

If CSS2.1 already defines behavior for a BOMless interpretation of the
encoding of @charset that allows inferring encoding, then that definition
should be maintained, not removed.


>
>  * If it is determined that supporting BOMless UTF-16 that has
> @charset is needed for Web compatibility, please base the sniffing on
> the 0x00 bytes intertwined in "@charset" and not on whatever follows
> "@charset".


What is your rationale for this constraint?
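
To make the quoted proposal concrete, here is a minimal sketch of sniffing
that looks only at how the string "@charset" itself is encoded and ignores
whatever label follows it. The function name and return labels are
hypothetical, and this is not how any particular implementation is known to
do it.

```python
AT_CHARSET = "@charset"

def sniff_at_charset(data: bytes):
    """Classify a style sheet by the byte pattern of a leading '@charset'.

    Returns 'ascii', 'utf-16be', 'utf-16le', or None if the sheet does not
    start with '@charset' in any of those forms. The declared label after
    '@charset' is deliberately not consulted.
    """
    # '@charset' as plain ASCII bytes.
    if data.startswith(AT_CHARSET.encode("ascii")):
        return "ascii"
    # 0x00 precedes each character: big-endian UTF-16.
    if data.startswith(AT_CHARSET.encode("utf-16-be")):
        return "utf-16be"
    # 0x00 follows each character: little-endian UTF-16.
    if data.startswith(AT_CHARSET.encode("utf-16-le")):
        return "utf-16le"
    return None
```

Under this sketch, a BOMless big-endian UTF-16 sheet that declares
"iso-8859-1" would still be classified as "utf-16be", because the 0x00 bytes
inside "@charset" decide the result rather than the declared label.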


> (Even better if support for BOMless UTF-16 can be
> dropped.)
>

Can't (shouldn't) do that if it is defined behavior.

Received on Monday, 22 October 2012 12:39:52 UTC