Re: [CSS21] response to issue 115 (and 44) from Ernest Cline on 2004-02-21 (www-style@w3.org from February 2004)

From: Ernest Cline <ernestcline@mindspring.com>
Date: Fri, 20 Feb 2004 21:59:13 -0500
To: "Bert Bos" <bert@w3.org>, "WWW Style" <www-style@w3.org>
Message-ID: <410-22004262125913796@mindspring.com>

> [Original Message]
> From: Bert Bos <bert@w3.org>

> I also omitted the CHARSET parameter of the LINK element in HTML.
> Is that a problem?

No.  Based on section 5.2.2 of the HTML 4.01 standard, it is fairly clear
that the charset attribute should be considered a source of out-of-band
information as mentioned in step (1) of your algorithm, and as such,
should be handled in accordance with how the standard says to
choose between the multiple possible sources of out-of-band info.

> The algorithm for (2) would be as follows:

2a-2d) Detect one of the UTF-32 BOM variants.

>   2e) If the first bytes are FE FF xx, where xx is not 00, use UTF-16-BE.
>       Remove the first two bytes. If they are followed by "@charset
>       <anything>;", remove that as well.
>
>   2f) If the first bytes are FF FE xx, where xx is not 00, use UTF-16-LE.
>       Remove the first two bytes. If they are followed by "@charset
>       <anything>;", remove that as well.

And what if the third byte is 00, as in FE FF 00 40?  You've already
eliminated the possibility of UTF-32 by the first four steps.

Taken literally, your algorithm could cause a UTF-16 stylesheet
to be taken as a different encoding because of step (3), although I
doubt that was your intention.
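
To make the gap concrete, here is a rough Python sketch of steps 2a-2g
read literally, next to a variant that drops the "xx is not 00" test.
The function names are hypothetical. The four UTF-32 byte patterns are
an assumption, since the steps are only summarized above. The relaxed
variant is just one possible fix, not something the proposal states.

```python
# Assumed byte patterns for steps 2a-2d (only summarized in this message).
UTF32_BOMS = {
    b"\x00\x00\xfe\xff": "UTF-32-BE",
    b"\xff\xfe\x00\x00": "UTF-32-LE",
    b"\x00\x00\xff\xfe": "UTF-32-2143",
    b"\xfe\xff\x00\x00": "UTF-32-3412",
}


def sniff_literal(data: bytes):
    """Steps 2a-2g read literally, including the 'xx is not 00' test."""
    if data[:4] in UTF32_BOMS:
        return UTF32_BOMS[data[:4]]
    if len(data) >= 3 and data[2] != 0:
        if data[:2] == b"\xfe\xff":
            return "UTF-16-BE"
        if data[:2] == b"\xff\xfe":
            return "UTF-16-LE"
    if data[:3] == b"\xef\xbb\xbf":
        return "UTF-8"
    return None  # nothing detected; the caller falls through to step (3)


def sniff_relaxed(data: bytes):
    """Same order of checks, without the third-byte test on the UTF-16 BOMs."""
    if data[:4] in UTF32_BOMS:
        return UTF32_BOMS[data[:4]]
    if data[:2] == b"\xfe\xff":
        return "UTF-16-BE"
    if data[:2] == b"\xff\xfe":
        return "UTF-16-LE"
    if data[:3] == b"\xef\xbb\xbf":
        return "UTF-8"
    return None


# A UTF-16-BE sheet that starts with a BOM and "@" begins FE FF 00 40.
sample = "\ufeff@charset".encode("utf-16-be")
```

With these definitions, sniff_literal(sample) detects nothing, while
sniff_relaxed(sample) reports UTF-16-BE.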

>   2g) If the first bytes are EF BB BF, use UTF-8.
>       Remove those bytes. If they are followed by "@charset
>       <anything>;" remove that as well.

So is CESU-8 to be implicitly prohibited from using a BOM,
unless identified as such by out-of-band info, since that would
cause it to be treated as UTF-8?  (I could live with that as
CESU-8 isn't really intended for transmission of data.)
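
The reason the two cannot be told apart by the BOM alone can be shown
with a small sketch. Python has no built-in CESU-8 codec, so the encoder
below is a hypothetical helper written for illustration: it encodes BMP
characters as UTF-8 does and encodes each supplementary character as a
surrogate pair of two 3-byte sequences.

```python
def cesu8_encode(text: str) -> bytes:
    """Illustrative CESU-8 encoder (hypothetical helper, not a stdlib codec)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x10000:
            # BMP code points are encoded exactly as in UTF-8.
            out += ch.encode("utf-8", "surrogatepass")
        else:
            # Supplementary code points become a UTF-16 surrogate pair,
            # and each surrogate is encoded as its own 3-byte sequence.
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            out += chr(high).encode("utf-8", "surrogatepass")
            out += chr(low).encode("utf-8", "surrogatepass")
    return bytes(out)


bom_cesu8 = cesu8_encode("\ufeff")
bom_utf8 = "\ufeff".encode("utf-8")
```

Both bom_cesu8 and bom_utf8 are the bytes EF BB BF. The two encodings
only diverge for characters above U+FFFF.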

> 3) If neither the header nor looking for U+FEFF or @charset
> yield an encoding, but this style sheet was loaded because
> a document linked to it (or linked to a style sheet that in turn
> linked to it, recursively), then use the encoding of the
> document (or style sheet) that linked to this one.
> 
> 4) If all else fails, assume UTF-8.

How could step (3) fail to determine a character encoding?
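
For reference, the overall fallback order in the quoted steps can be
sketched as below. The function and parameter names are hypothetical,
and passing None for "no referring document" is an assumption of this
sketch. The quoted text does not say whether that case can arise.

```python
from typing import Optional


def choose_encoding(
    out_of_band: Optional[str],
    sniffed: Optional[str],
    referrer_encoding: Optional[str],
) -> str:
    """Return the first encoding found, in the order of steps (1) to (4)."""
    # Step (1): out-of-band information, step (2): BOM or @charset,
    # step (3): the encoding of the referring document or style sheet.
    for candidate in (out_of_band, sniffed, referrer_encoding):
        if candidate:
            return candidate
    # Step (4): reached only if every earlier step produced nothing.
    return "UTF-8"
```

In this sketch, step (4) is reached only when referrer_encoding is None.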

Received on Friday, 20 February 2004 21:59:16 UTC