Re: [CSS21] BOM & @charset (issues 44 & 115) from Boris Zbarsky on 2004-02-18 (www-style@w3.org from February 2004)

From: Boris Zbarsky <bzbarsky@MIT.EDU>
Date: Tue, 17 Feb 2004 18:40:24 -0600
To: Ian Hickson <ian@hixie.ch>
Cc: Bert Bos <bert@w3.org>, www-style@w3.org
Message-ID: <4032B478.10805@mit.edu>

Ian Hickson wrote:
> |     * If the style sheet is encoded as "UTF-16" [RFC2781] or "UTF-32"
> This doesn't conflict with the steps Bert mentioned, but it does clarify
> that @charset is still relevant even if there is a BOM.

But does that mean a two-byte LE BOM followed by @charset "UTF-16BE";
would be treated as UTF-16LE?  Or what?  Perhaps I misunderstood Bert's
proposed change?  Can you clearly point to a situation where the two
orders would give different results?

Answers below apply to Mozilla's current implementation, not to any 
proposed or actual spec.

> A. What happens if you have a UTF-16 BOM, and an @charset encoded as
> UTF-16 which claims it is ISO-8859-1?

We will end up treating the sheet as ISO-8859-1 at the moment.

> B. Or no BOM, US-ASCII encoded @charset which claims to be UCS-4?

Treat sheet as UCS-4.

> C. Or UTF-8 BOM followed by US-ASCII @charset claiming ISO-8859-1?

Treat sheet as ISO-8859-1.

> D. A UTF-16 BOM with no @charset being linked from a stylesheet or
> document that is known to be in UCS-2?

Treat sheet as UTF-16.

> E. a document whose odd bytes spell a US-ASCII @charset claiming UTF-16BE
> and whose even bytes spell a US-ASCII @charset claiming UTF-16LE, linked
> from a document or stylesheet claiming UTF-8?

This would fail to parse as an @charset rule at all (it can't be any
flavor of UTF-16, since it contains no null bytes), and the sheet would
be treated as UTF-8.
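
Taken together, answers A through E are consistent with a simple
precedence: a parseable @charset rule wins, then a BOM, then the charset
of the linking document. The Python below is a minimal sketch of that
precedence only. It is not Mozilla's code, and the function name
sniff_sheet_encoding, its parameters, the 256-byte window, and the
regular expression are all assumptions made for illustration. It also
assumes that a BOM, when present, tells the parser how to read the bytes
of the @charset rule itself, and that a sheet without a BOM is read as
ASCII-compatible bytes.

```python
import codecs
import re

# BOM table; the UTF-32 entries come first so that a UTF-32LE BOM
# (FF FE 00 00) is not mistaken for a UTF-16LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Hypothetical, deliberately strict pattern: the rule must be the very
# first thing in the sheet (after any BOM).
_CHARSET_RE = re.compile(r'@charset "([^"]*)";')


def sniff_sheet_encoding(data, linking_charset=None):
    """Return the encoding label this sketch would pick for a sheet."""
    bom_encoding = None
    body = data
    for bom, name in _BOMS:
        if data.startswith(bom):
            bom_encoding = name
            body = data[len(bom):]
            break

    # Read the start of the sheet in the BOM's encoding; with no BOM,
    # assume an ASCII-compatible sheet (latin-1 never fails to decode).
    head = body[:256].decode(bom_encoding or "latin-1", errors="replace")

    match = _CHARSET_RE.match(head)
    if match:
        return match.group(1)      # answers A, B, C: @charset wins
    if bom_encoding:
        return bom_encoding        # answer D: BOM beats the linking document
    return linking_charset         # answer E: fall back to the linker's charset
```

For case E, the interleaved bytes begin "@@cchhaarrsseett", which does not
match the @charset pattern, so the sketch falls through to the linking
document's charset. Unlike the behavior described above, it does not look
for null bytes to detect UTF-16 without a BOM.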

-Boris

Received on Tuesday, 17 February 2004 19:40:30 UTC