
BOCU-1, SCSU, etc.

From: Frank Ellermann <hmdmhdfmhdjmzdtjmzdtzktdkztdjz@gmail.com>
Date: Fri, 25 Jan 2008 15:35:23 +0100
Message-ID: <012601c85f5f$86e97dc0$cf58863e@xyzzy>
To: <public-html-comments@w3.org>

Hi, the chapter about "acceptable" charsets is messy.
Clearly UTF-8 and windows-1252 are popular, and you have that.

What you need as a "minimum" for new browsers is UTF-8, US-ASCII
(as a popular proper subset of UTF-8), ISO-8859-1 (as HTML legacy),
and windows-1252 for the reasons stated in the draft; supporting
Latin-1 but not windows-1252 would be stupid.
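
To make the subset claims concrete, here is a minimal Python sketch. It
is only an illustration, not part of the draft: the helper name
differing_bytes is hypothetical, and Python's "iso-8859-1" and "cp1252"
codecs are assumed to be reasonable stand-ins for the ISO-8859-1 and
windows-1252 charset labels.

```python
# All 128 US-ASCII byte values decode to the same characters under
# UTF-8, ISO-8859-1 and windows-1252 (Python codec "cp1252").
ascii_bytes = bytes(range(0x80))
for codec in ("utf-8", "iso-8859-1", "cp1252"):
    assert ascii_bytes.decode("ascii") == ascii_bytes.decode(codec)


def differing_bytes():
    """Return the byte values that the two 8-bit codecs decode differently."""
    diffs = []
    for value in range(0x100):
        raw = bytes([value])
        latin1 = raw.decode("iso-8859-1")
        # Python's cp1252 codec leaves a few bytes undefined;
        # errors="replace" turns those into U+FFFD instead of raising.
        win1252 = raw.decode("cp1252", errors="replace")
        if latin1 != win1252:
            diffs.append(value)
    return diffs


# The two charsets disagree only in the range 0x80..0x9F, where
# ISO-8859-1 has C1 controls and windows-1252 has printable characters.
print([hex(b) for b in differing_bytes()])
```

With these codecs the two charsets agree on every byte outside
0x80..0x9F, which is why a browser that handles Latin-1 gets
windows-1252 almost for free.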

BTW, I'm not aware that windows-1252 is a violation of CHARMOD,
I asked a question about it and C049 in a Last Call of CHARMOD.

Please s/but may support more/but should support more/ - the
minimum is only that, the minimum.

| User agents must not support the CESU-8, UTF-7, BOCU-1 and SCSU
| encodings

I can see a MUST NOT for UTF-7 and CESU-8.  And IMO the only good
excuse for legacy charsets is backwards compatibility.  But that
is at worst a "SHOULD NOT" for BOCU-1, as you have it for UTF-32.

I refuse to discuss SCSU, but MUST NOT is rather harsh, isn't it?
You also say:

| Authors should not use JIS_X0212-1990, x-JIS0208, and encodings
| based on EBCDIC.  Authors should not use UTF-32.

What's the logic behind these recommendations?  Of course EBCDIC
is rare (as far as HTML is concerned I've never seen it), but it's
AFAIK not worse than codepage 437, 850, 858, or similar charsets.

And UTF-32 is relatively harmless, not much worse than UTF-16; it
belongs to the charsets recommended in CHARMOD.  Depending on what
happens in future Unicode versions, banning UTF-32 could backfire.
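
As a rough illustration of "not much worse than UTF-16", the following
Python sketch compares encoded sizes. The helper name encoded_sizes is
hypothetical, and the little-endian codec variants are used only so
that no byte order mark is counted; this says nothing about what the
draft should require.

```python
def encoded_sizes(text):
    """Return the encoded length in bytes of text under three Unicode encodings."""
    # The "-le" variants are chosen so that Python does not prepend a BOM.
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


# ASCII-only text: UTF-32 needs twice the space of UTF-16.
print(encoded_sizes("Hello"))

# A character outside the BMP needs a surrogate pair in UTF-16,
# so UTF-16 and UTF-32 take the same space for it.
print(encoded_sizes("\U0001F600"))
```

For typical markup, UTF-32 costs about twice as many bytes as UTF-16
and is otherwise a fixed-width encoding of the same code points.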

There are lots of other charsets starting with UTF-1 that could be
listed as SHOULD NOT or even MUST NOT.  Whatever you pick, state
what your reasons are, not only the (apparently) arbitrary result.

Please make sure that all *unregistered* charsets are SHOULD NOT. 
Yes, I know the consequences for some proprietary charsets; their
owners are free to register them or to be ignored (CHARMOD C022).
Received on Friday, 25 January 2008 14:35:20 UTC
