- From: Boris Zbarsky <bzbarsky@MIT.EDU>
- Date: Tue, 17 Feb 2004 18:40:24 -0600
- To: Ian Hickson <ian@hixie.ch>
- Cc: Bert Bos <bert@w3.org>, www-style@w3.org
Ian Hickson wrote:

> | * If the style sheet is encoded as "UTF-16" [RFC2781] or "UTF-32"

> This doesn't conflict with the steps Bert mentioned, but it does clarify
> that @charset is still relevant even if there is a BOM.

But does that mean that a two-byte LE BOM followed by @charset "UTF-16BE";
would be treated as UTF-16LE? Or what? Perhaps I misunderstood Bert's
proposed change?

Can you clearly point to a situation where the two orders would give
different results?

Answers below apply to Mozilla's current implementation, not to any
proposed or actual spec.

> A. What happens if you have a UTF-16 BOM, and an @charset encoded as
> UTF-16 which claims it is ISO-8859-1?

We will end up treating the sheet as ISO-8859-1 at the moment.

> B. Or no BOM, US-ASCII encoded @charset which claims to be UCS-4?

Treat sheet as UCS-4.

> C. Or UTF-8 BOM followed by US-ASCII @charset claiming ISO-8859-1?

Treat sheet as ISO-8859-1.

> D. A UTF-16 BOM with no @charset being linked from a stylesheet or
> document that is known to be in UCS-2?

Treat sheet as UTF-16.

> E. a document whose odd bytes spell a US-ASCII @charset claiming UTF-16BE
> and whose even bytes spell a US-ASCII @charset claiming UTF-16LE, linked
> from a document or stylesheet claiming UTF-8?

This would fail to parse as an @charset rule at all (it can't be
UTF-16anything, since it has no null bytes there), and the sheet would be
treated as UTF-8.

-Boris
Received on Tuesday, 17 February 2004 19:40:30 UTC
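The precedence implied by the answers to cases A through E can be condensed into a short sketch: a parsable @charset rule wins over a BOM, a BOM wins over the referring document's encoding, and the referrer's encoding is the fallback when neither is usable. This is an illustrative reconstruction that reproduces those five answers, not Mozilla's actual code. The function `choose_encoding`, its `referrer_encoding` parameter, the exact @charset pattern, and the list of byte layouts tried when there is no BOM are all assumptions made for the example.

```python
import codecs
import re
from typing import Optional

# Assumption: the sheet must begin with exactly '@charset "label";'
# (one space, double quotes), as CSS 2.1 requires.
CHARSET_RULE = re.compile(r'@charset "([^"]*)";')

# (BOM bytes, label reported, Python codec used to read an @charset rule).
# UTF-32LE is tested before UTF-16LE because its BOM also starts with FF FE.
BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32LE", "utf-32-le"),
    (codecs.BOM_UTF32_BE, "UTF-32BE", "utf-32-be"),
    (codecs.BOM_UTF8, "UTF-8", "utf-8"),
    (codecs.BOM_UTF16_LE, "UTF-16LE", "utf-16-le"),
    (codecs.BOM_UTF16_BE, "UTF-16BE", "utf-16-be"),
]

# Assumed byte layouts tried for an @charset rule when there is no BOM.
NO_BOM_LAYOUTS = ["ascii", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le"]


def _charset_rule(body: bytes, codec: str) -> Optional[str]:
    """Return the label of a leading @charset rule read as `codec`, if any."""
    # Undecodable bytes become U+FFFD, so they simply fail to match.
    text = body.decode(codec, errors="replace")
    m = CHARSET_RULE.match(text)
    return m.group(1) if m else None


def choose_encoding(data: bytes, referrer_encoding: str) -> str:
    """Pick a label: @charset first, then the BOM, then the referrer."""
    for bom, label, codec in BOMS:
        if data.startswith(bom):
            # A parsable @charset overrides the BOM (cases A and C).
            declared = _charset_rule(data[len(bom):], codec)
            # With no @charset, the BOM beats the referrer (case D).
            return declared if declared is not None else label
    for codec in NO_BOM_LAYOUTS:
        declared = _charset_rule(data, codec)
        if declared is not None:
            return declared  # case B
    # No BOM and no parsable @charset rule: fall back (case E).
    return referrer_encoding
```

For example, the case C bytes, `codecs.BOM_UTF8 + b'@charset "ISO-8859-1";'`, yield `"ISO-8859-1"`, while the interleaved case E bytes match no layout and fall through to the referrer's `"UTF-8"`.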