Re: [CSS21] BOM & @charset (issues 44 & 115) from Boris Zbarsky on 2004-02-18 (www-style@w3.org from February 2004)

From: Boris Zbarsky <bzbarsky@MIT.EDU>
Date: Tue, 17 Feb 2004 18:57:16 -0600
To: Ian Hickson <ian@hixie.ch>
Cc: Bert Bos <bert@w3.org>, www-style@w3.org
Message-ID: <4032B86C.1080202@mit.edu>

Ian Hickson wrote:
>>But that a two-byte LE BOM followed by @charset "UTF-16BE"; would be
>>treated as UTF-16LE?
> 
> Yes.

But a two-byte LE BOM followed by @charset "Mysomething"; would be 
treated as "Mysomething"?  Or what?  I'm not sure what the impact of 
that "ignored for charsets other than UTF-16" language is...  especially 
in light of your examples.

> As far as I can determine, there was no such situation.

If there are no cases where two substantially different charsets can have 
the same BOM, there are indeed no such situations.  Are there indeed no 
such cases?  If there are not, then I agree that the order really 
doesn't matter here (and will happily go and remove some code, probably).

>>>A. What happens if you have a UTF-16 BOM, and an @charset encoded as
>>>UTF-16 which claims it is ISO-8859-1?
>>
>>We will end up treating the sheet as ISO-8859-1 at the moment.
> 
> I presume you agree that is suboptimal?

Frankly, if a sheet has an @charset rule that does not match the actual 
sheet data (or does it?), whatever you do is suboptimal -- you have 
a good chance of getting it "wrong" either way.
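
To make the two orderings in this thread concrete, here is a minimal sketch of a detector that can give either the BOM or the @charset rule the last word. It is only an illustration under stated assumptions, not the behaviour of any actual implementation or the wording of the spec: the names sniff_bom, declared_charset and pick_encoding, the 200-byte lookahead, and the bom_wins switch are all hypothetical.

```python
import codecs
import re

# BOMs considered by this sketch (assumption: only UTF-8 and UTF-16).
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# The rule must be the very first thing in the sheet.
CHARSET_RE = re.compile(r'@charset "([^"]*)";')


def sniff_bom(data):
    """Return (encoding, bom_length), or (None, 0) if no known BOM."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def declared_charset(data):
    """Return the name in a leading @charset rule, or None.

    The head of the sheet is decoded with the BOM's encoding when there
    is one, otherwise byte-for-byte as Latin-1 (an assumption that only
    works for ASCII-compatible encodings).
    """
    bom_enc, skip = sniff_bom(data)
    head = data[skip:skip + 200].decode(bom_enc or "latin-1", errors="replace")
    match = CHARSET_RE.match(head)
    return match.group(1) if match else None


def pick_encoding(data, bom_wins=True):
    """Choose an encoding under one of the two orderings."""
    bom_enc, _ = sniff_bom(data)
    declared = declared_charset(data)
    if bom_enc and (bom_wins or declared is None):
        return bom_enc
    return declared  # may be None: no BOM and no @charset rule


# Little-endian BOM followed by a rule claiming UTF-16BE.
mismatch = codecs.BOM_UTF16_LE + '@charset "UTF-16BE";'.encode("utf-16-le")
# Case A: UTF-16 BOM, rule encoded as UTF-16 but claiming ISO-8859-1.
case_a = codecs.BOM_UTF16_LE + '@charset "ISO-8859-1";'.encode("utf-16-le")

print(pick_encoding(mismatch, bom_wins=True))
print(pick_encoding(mismatch, bom_wins=False))
print(pick_encoding(case_a, bom_wins=True))
print(pick_encoding(case_a, bom_wins=False))
```

With bom_wins=True the sketch keeps the BOM's answer in both cases; with bom_wins=False it follows the rule, which for case A yields ISO-8859-1 even though the bytes are plainly UTF-16. The two orderings only differ when the BOM and the rule disagree.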

Part of the issue here is that an @charset rule seems to do two things: 
1) hint at the charset, and 2) hint at how the sheet is to be 
serialized.  And it sounds like we want to handle cases where the first 
type of hint is wrong?

-Boris

Received on Tuesday, 17 February 2004 20:11:00 UTC