RE: [CSS21] BOM & @charset (issues 44 & 115) from Ian Hickson on 2004-02-18 (www-style@w3.org from February 2004)

From: Ian Hickson <ian@hixie.ch>
Date: Wed, 18 Feb 2004 02:51:20 +0000 (UTC)
To: Ernest Cline <ernestcline@mindspring.com>
Cc: Bert Bos <bert@w3.org>, www-style@w3.org
Message-ID: <Pine.LNX.4.58.0402180247020.2286@dhalsim.dreamhost.com>

On Tue, 17 Feb 2004, Ernest Cline wrote:
>
> What about CESU-8 (from UTR#26)?  It shares the same BOM as UTF-8,
> so only the HTTP header or the @charset rule can distinguish them.
> (UTR#26 explicitly bars attempting to determine that the encoding is
> CESU-8 by auto detection.)

Oops, forgot about CESU-8. However, given CESU-8's extremely low status,
and strong wording in its specification against it being used for
information exchange, I feel it is of little more than academic concern.

If the stylesheet said:

   [UTF-8 BOM]@charset "CESU-8";

...then the case is unambiguous (it's CESU-8). If there is no way to
distinguish CESU-8 from UTF-8 in a particular document (likely for many
UTF-8 cases, I guess), then the algorithm falls through to step 6,
"UA-dependent mechanisms", and compliant UAs would then default to UTF-8
(since they aren't allowed to auto-detect CESU-8).
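
To make the cases above concrete, here is a rough Python sketch of the
decision, not the draft's actual step list. The function name
resolve_encoding, the http_charset parameter, and the simplified @charset
pattern are all made up for illustration, and it assumes the HTTP header
takes precedence over anything in the file.

```python
import re
from typing import Optional

UTF8_BOM = b"\xef\xbb\xbf"

# Simplified, hypothetical pattern: an @charset rule at the very start
# of the sheet (after any BOM), written exactly as in the example above.
_CHARSET_RULE = re.compile(rb'@charset "([^"]*)";')


def resolve_encoding(sheet: bytes, http_charset: Optional[str] = None) -> str:
    """Return an encoding label for a style sheet (illustrative only)."""
    # Assumption: an explicit HTTP charset wins over anything in the file.
    if http_charset:
        return http_charset.upper()

    # The UTF-8 BOM is shared by UTF-8 and CESU-8, so on its own it
    # cannot tell them apart; skip it and look for an @charset rule.
    body = sheet[len(UTF8_BOM):] if sheet.startswith(UTF8_BOM) else sheet
    match = _CHARSET_RULE.match(body)
    if match:
        return match.group(1).decode("ascii", "replace").upper()

    # Nothing distinguishes the two: stand-in for the "UA-dependent
    # mechanisms" step. CESU-8 may not be auto-detected, so use UTF-8.
    return "UTF-8"


print(resolve_encoding(UTF8_BOM + b'@charset "CESU-8";\nbody {}'))
print(resolve_encoding(UTF8_BOM + b"body {}"))
```

The first call covers the unambiguous case (BOM plus an explicit
@charset "CESU-8"), the second the fallback where only the BOM is present.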

-- 
Ian Hickson                                      )\._.,--....,'``.    fL
U+1047E                                         /,   _.. \   _\  ;`._ ,.
http://index.hixie.ch/                         `._.-(,_..'--(,_..'`-.;.'

Received on Tuesday, 17 February 2004 21:51:26 UTC