- From: Ian Hickson <ian@hixie.ch>
- Date: Wed, 18 Feb 2004 00:24:50 +0000 (UTC)
- To: Boris Zbarsky <bzbarsky@MIT.EDU>
- Cc: Bert Bos <bert@w3.org>, www-style@w3.org
On Tue, 17 Feb 2004, Boris Zbarsky wrote:
>
> Sure. Different encodings can have the same BOM (eg UTF-16 and UCS-2,
> but there may also be other cases that are not quite so trivial).

This case is not a problem -- since UTF-16 is a superset of UCS-2, simply
treat it as UTF-16.

But in any case, the change Bert mentioned doesn't remove the previous
text, which says:

| If an external style sheet has U+FEFF ("zero width non-breaking space")
| as the first character (i.e., even before any @charset rule), this
| character is interpreted as a so-called "Byte Order Mark" (BOM), as
| follows:
|
|   * If the style sheet is encoded as "UTF-16" [RFC2781] or "UTF-32"
|     [UNICODE], the BOM determines the byte order (e.g. "big-endian" or
|     "little-endian") as explained in the cited RFC.
|   * If the style sheet is encoded as anything else, the U+FEFF
|     character is ignored.

This doesn't conflict with the steps Bert mentioned, but it does clarify
that @charset is still relevant even if there is a BOM. (The text "BOM"
in the steps links straight to this text.)

The text before the steps says that "user agents must observe the
following priorities when determining a style sheet's character encoding
(from highest priority to lowest)". So as long as a later step doesn't
contradict an earlier one, it is still applicable.

> In case anyone is interested in what Mozilla does right now, the basic
> algorithm for this step is:
[...]

A. What happens if you have a UTF-16 BOM, and an @charset encoded as
   UTF-16 which claims it is ISO-8859-1?

B. Or no BOM, and a US-ASCII encoded @charset which claims to be UCS-4?

C. Or a UTF-8 BOM followed by a US-ASCII @charset claiming ISO-8859-1?

D. A UTF-16 BOM with no @charset, linked from a style sheet or document
   that is known to be in UCS-2?

E. A document whose odd bytes spell a US-ASCII @charset claiming
   UTF-16BE and whose even bytes spell a US-ASCII @charset claiming
   UTF-16LE, linked from a document or style sheet claiming UTF-8?

etc.
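To make the cases concrete, here is a minimal Python sketch of reading the two in-band signals from the first bytes of a style sheet: a leading U+FEFF, and an @charset rule that may follow it. It is not Mozilla's algorithm (elided above) and not spec text; the helper names `sniff_bom` and `sniff_charset_rule`, the BOM table, and the list of byte encodings tried for the rule are assumptions for illustration. The sketch only reports which signals are present; it does not decide which one wins.

```python
from typing import Optional, Tuple

# Byte patterns of U+FEFF at the start of a file. UTF-32LE is tested
# before UTF-16LE because FF FE 00 00 also begins with FF FE (so a
# UTF-16LE file whose first real character is U+0000 would be misread).
BOMS = (
    (b"\x00\x00\xfe\xff", "UTF-32BE"),
    (b"\xff\xfe\x00\x00", "UTF-32LE"),
    (b"\xef\xbb\xbf", "UTF-8"),
    (b"\xfe\xff", "UTF-16BE"),
    (b"\xff\xfe", "UTF-16LE"),
)

# Hypothetical list of byte encodings in which a leading @charset rule
# is looked for, each paired with its code-unit size in bytes.
RULE_ENCODINGS = (
    ("ascii", 1),
    ("utf-16-be", 2),
    ("utf-16-le", 2),
    ("utf-32-be", 4),
    ("utf-32-le", 4),
)

PREFIX = '@charset "'


def sniff_bom(data: bytes) -> Optional[Tuple[str, int]]:
    """Return (encoding, BOM length in bytes) if data starts with U+FEFF."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None


def sniff_charset_rule(data: bytes) -> Optional[str]:
    """Return the name in a leading @charset "..."; rule, if there is one."""
    bom = sniff_bom(data)
    if bom is not None:
        data = data[bom[1]:]  # the rule, if any, follows the BOM
    for encoding, unit in RULE_ENCODINGS:
        if not data.startswith(PREFIX.encode(encoding)):
            continue
        usable = data[: len(data) - len(data) % unit]  # whole code units only
        text = usable.decode(encoding, errors="replace")
        end = text.find('";', len(PREFIX))
        if end != -1:
            return text[len(PREFIX):end]
    return None


# Case C: UTF-8 BOM, then a US-ASCII @charset claiming ISO-8859-1.
case_c = b"\xef\xbb\xbf" + b'@charset "ISO-8859-1";'
# Case E: odd bytes and even bytes each spell a different ASCII rule.
odd, even = b'@charset "UTF-16BE";', b'@charset "UTF-16LE";'
case_e = bytes(byte for pair in zip(odd, even) for byte in pair)

print(sniff_bom(case_c), sniff_charset_rule(case_c))  # ('UTF-8', 3) ISO-8859-1
print(sniff_bom(case_e), sniff_charset_rule(case_e))  # None None
```

With case E's interleaved bytes, the @charset prefix matches in none of the encodings tried, so no rule is found at the start of the file.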
These are the cases that these steps partially clarify. Per the text
currently in the spec, which we hope to take to Candidate Recommendation
(CR): in case A the UA would use UTF-16 (the BOM trumps the @charset);
case B is undefined; case C would use UTF-8; case D would use UTF-16 (it
can't be UCS-2, since if it were, the BOM would have to be ignored per
the text quoted above, and the BOM comes before linking metadata in the
list); and case E would use UTF-8 (since the start doesn't contain an
@charset rule in any character encoding).

Case B will be covered by CSS3 Syntax.

--
Ian Hickson                                      )\._.,--....,'``.    fL
U+1047E                                         /,   _.. \   _\  ;`._ ,.
http://index.hixie.ch/                         `._.-(,_..'--(,_..'`-.;.'
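The answers for cases A, C, D and E amount to taking the first step in the priority list that yields an encoding. A minimal Python sketch of that reading, restricted to the three signals the cases involve and in the order stated in this thread (BOM, then @charset, then linking metadata); any other steps in the spec's list are omitted, and `resolve_encoding` is a hypothetical name. Case B is left out because the text leaves it undefined.

```python
from typing import Optional, Tuple


def resolve_encoding(
    bom: Optional[str],
    charset_rule: Optional[str],
    link_metadata: Optional[str],
) -> Optional[Tuple[str, str]]:
    """Return (deciding step, encoding) for the first step that yields one.

    Hypothetical helper: steps are listed from highest to lowest priority,
    following the ordering described in this thread.
    """
    steps = (
        ("BOM", bom),                          # encoding implied by a leading U+FEFF
        ("@charset", charset_rule),            # name given by a leading @charset rule
        ("linking metadata", link_metadata),   # encoding of the linking document or style sheet
    )
    for step, encoding in steps:
        if encoding is not None:
            return step, encoding
    return None  # none of the modelled steps applies


# (BOM, @charset, linking metadata) for the cases with a defined answer.
cases = {
    "A": ("UTF-16", "ISO-8859-1", None),  # -> ('BOM', 'UTF-16')
    "C": ("UTF-8", "ISO-8859-1", None),   # -> ('BOM', 'UTF-8')
    "D": ("UTF-16", None, "UCS-2"),       # -> ('BOM', 'UTF-16')
    "E": (None, None, "UTF-8"),           # -> ('linking metadata', 'UTF-8')
}
for name, signals in cases.items():
    print(name, resolve_encoding(*signals))
```

A strict first-match loop is the simplest model of "from highest priority to lowest"; it does not capture the finer point that a later step still applies where it doesn't contradict an earlier one, such as an @charset refining a BOM's byte order.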
Received on Tuesday, 17 February 2004 19:24:52 UTC