[CSS21] 4.4. "BOM and/or @charset"

Section 4.4 defines character encoding priorities for external style sheets. The second item is "BOM and/or @charset". A table lists the possible initial byte sequences of a style sheet stream, followed by this clause:

"If the encoding is detected based on one of the entries in the table above marked "as specified", the user agent ignores the style sheet if it does not parse an appropriate @charset rule at the beginning of the stream of characters resulting from decoding in the chosen @charset. This ensures that:

        @charset rules should only function if they are in the encoding of the style sheet,
        byte order marks are ignored only in encodings that support a byte order mark, and
        encoding names cannot contain newlines."

What would these rules mean in the following case: a style sheet starts with a UTF-16 BOM, followed by a UTF-16 encoded @charset "windows-1252"; rule. The user agent will detect windows-1252 using the rules and table described in 4.4. As it then decodes the style sheet using windows-1252, which does not support a byte order mark, the BOM should not be ignored per the clause above. This implies the UA would not be able to "parse an appropriate @charset rule at the beginning of the stream of characters resulting from decoding in the chosen @charset".
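
To make the case concrete, here is a minimal Python sketch of what I think happens. It assumes a little-endian UTF-16 BOM and uses Python's "windows-1252" codec as a stand-in for the UA's decoder; the variable names are only for illustration and nothing here comes from the spec itself.

```python
import codecs

# The rule as the author of the style sheet intended it.
rule = '@charset "windows-1252";'

# Style sheet bytes: a UTF-16 (little-endian) BOM followed by the rule
# encoded in UTF-16. The choice of little-endian is an assumption.
sheet = codecs.BOM_UTF16_LE + rule.encode("utf-16-le")

# Decode the whole stream in the encoding named by @charset.
# windows-1252 has no BOM, so the two BOM bytes become ordinary characters,
# and every second byte of the UTF-16 text becomes a NUL character.
decoded = sheet.decode("windows-1252")

print(repr(decoded[:6]))
print(decoded.startswith("@charset"))
```

Under these assumptions the decoded stream does not begin with an @charset rule, so the style sheet would be ignored.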

Overall, the table and subsequent clauses seem to imply that @charset takes precedence over the BOM. Is this correct? If so, what is the purpose of giving @charset this precedence? If the @charset conflicts with the BOM, then there will be garbage bytes at the start of the document.

Why not give the BOM precedence and ignore the @charset if it conflicts?
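
A rough sketch of the alternative I have in mind, in Python. This is not what 4.4 specifies; the function name decode_bom_first, the BOMS table, and the UTF-8 fallback are all hypothetical, and UTF-32 is left out for brevity.

```python
import codecs

# Hypothetical BOM table for this sketch (UTF-32 omitted).
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def decode_bom_first(data: bytes, fallback: str = "utf-8") -> str:
    """Decode with the BOM's encoding when a BOM is present, dropping the BOM.

    Any conflicting @charset rule is left in the text and simply not used
    to pick the encoding. The fallback encoding is an assumption.
    """
    for bom, name in BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(name)
    return data.decode(fallback)

# The conflicting case: UTF-16 BOM plus a windows-1252 @charset rule.
text = '@charset "windows-1252"; a { color: red }'
sheet = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
print(decode_bom_first(sheet))
```

With the BOM taking precedence, the conflicting rule is readable text that the parser can skip, and no garbage appears at the start of the stream.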

As a corollary, if the HTTP charset overrides the BOM, are we not supposed to remove the BOM?

Received on Thursday, 9 October 2008 01:56:59 UTC