Re: [CSS21] BOM & @charset (issues 44 & 115)

On Tue, 17 Feb 2004, Boris Zbarsky wrote:
>
> But that a two-byte LE BOM followed by @charset "UTF-16BE"; would be
> treated as UTF-16LE?

Yes.


> Can you clearly point to a situation where the two orders would give
> different results?

As far as I can determine, there was no such situation. That is why the
change was made (the old order was semantically the same but didn't make
sense since logically you'd have to read the BOM first).


>> A. What happens if you have a UTF-16 BOM, and an @charset encoded as
>> UTF-16 which claims it is ISO-8859-1?
>
> We will end up treating the sheet as ISO-8859-1 at the moment.

I presume you agree that is suboptimal?


>> B. Or no BOM, US-ASCII encoded @charset which claims to be UCS-4?
>
> Treat sheet as UCS-4.

Ditto, although exactly what should happen here is hard to define without
the spec getting bogged down in detail (which I fear CSS3 will have to do).


>> C. Or UTF-8 BOM followed by US-ASCII @charset claiming ISO-8859-1?
>
> Treat sheet as ISO-8859-1

The odds are probably about even on what the real encoding is, so that's
not a big deal.


>> D. A UTF-16 BOM with no @charset being linked from a stylesheet or
>> document that is known to be in UCS-2?
>
> Treat sheet as UTF-16.

Makes sense.


>> E. a document whose odd bytes spell a US-ASCII @charset claiming UTF-16BE
>> and whose even bytes spell a US-ASCII @charset claiming UTF-16LE, linked
>> from a document or stylesheet claiming UTF-8?
>
> This would fail to parse as an @charset rule at all (it can't be
> UTF-16anything, since it has no null bytes there), and the sheet would
> be treated as UTF-8.

Good good.
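
For what it's worth, here is a rough Python sketch of the BOM-first order
discussed above. It is only an illustration, not spec text: the names
sniff_encoding, BOMS and CHARSET_RE are made up for this example, UCS-4 BOMs
are ignored, and letting the BOM win in case C is an assumption of the
sketch rather than something settled in this thread.

```python
import re

# BOM signatures checked in this sketch (UCS-4/UTF-32 deliberately omitted).
BOMS = (
    (b"\xef\xbb\xbf", "UTF-8"),
    (b"\xff\xfe", "UTF-16LE"),
    (b"\xfe\xff", "UTF-16BE"),
)

# An @charset rule spelled in US-ASCII at the very start of the sheet.
# The label may contain any printable ASCII character except the quote.
CHARSET_RE = re.compile(rb'@charset "([\x20\x21\x23-\x7e]+)";')


def sniff_encoding(data: bytes, fallback: str) -> str:
    """Guess a style sheet's encoding: BOM first, then @charset, then fallback.

    `fallback` stands for the encoding of the linking document or sheet.
    """
    # 1. A BOM is read first, and in this sketch it always wins.
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    # 2. No BOM: honour a US-ASCII @charset at the very start, if present.
    match = CHARSET_RE.match(data)
    if match:
        return match.group(1).decode("ascii")
    # 3. Otherwise use the encoding of whatever linked to the sheet.
    return fallback
```

With this, an LE BOM followed by @charset "UTF-16BE"; comes out as UTF-16LE,
and case E falls through to the linking document's UTF-8 because the
interleaved bytes never spell a valid @charset rule.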

-- 
Ian Hickson                                      )\._.,--....,'``.    fL
U+1047E                                         /,   _.. \   _\  ;`._ ,.
http://index.hixie.ch/                         `._.-(,_..'--(,_..'`-.;.'

Received on Tuesday, 17 February 2004 19:48:53 UTC