BOCU-1, SCSU, etc.

Hi, the chapter about "acceptable" charsets (8.2.2.2) is messy.
Clearly UTF-8 and windows-1252 are popular, and you have that.

What you need as a "minimum" for new browsers is UTF-8, US-ASCII
(as a popular proper subset of UTF-8), ISO-8859-1 (as HTML legacy),
and windows-1252 for the reasons stated in the draft; supporting
Latin-1 but not windows-1252 would be stupid.
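
To illustrate how closely these four charsets are related, here is a
rough sketch using Python's built-in codecs.  The function name is
made up for this example, and Python's codec tables are only assumed
to match what browsers implement; treat it as a sanity check, not as
a normative statement.

```python
def differing_bytes():
    """Return the byte values that decode differently in
    ISO-8859-1 and windows-1252 (hypothetical helper)."""
    diffs = []
    for b in range(256):
        raw = bytes([b])
        latin1 = raw.decode("iso-8859-1")
        try:
            cp1252 = raw.decode("windows-1252")
        except UnicodeDecodeError:
            # Byte is unassigned in Python's windows-1252 table.
            cp1252 = None
        if cp1252 != latin1:
            diffs.append(b)
    return diffs


# US-ASCII is a proper subset of UTF-8: the 128 ASCII bytes
# decode to the same text under both codecs.
ascii_bytes = bytes(range(0x80))
assert ascii_bytes.decode("us-ascii") == ascii_bytes.decode("utf-8")

# The two 8-bit charsets disagree only in the 0x80..0x9F range.
print([hex(b) for b in differing_bytes()])
```

Under these assumptions the two 8-bit charsets differ only for bytes
0x80 to 0x9F, where ISO-8859-1 has C1 control characters and
windows-1252 has printable characters such as the euro sign.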

BTW, I'm not aware that windows-1252 is a violation of CHARMOD;
I asked a question about it and C049 in a Last Call of CHARMOD.

Please s/but may support more/but should support more/ - the
minimum is only that, the minimum.

| User agents must not support the CESU-8, UTF-7, BOCU-1 and SCSU
| encodings

I can see a MUST NOT for UTF-7 and CESU-8.  And IMO the only good
excuse for legacy charsets is backwards compatibility.  But that
is at worst a "SHOULD NOT" for BOCU-1, as you have it for UTF-32.

I refuse to discuss SCSU, but MUST NOT is rather harsh, isn't it?
In 3.7.5.4 you say:

| Authors should not use JIS_X0212-1990, x-JIS0208, and encodings
| based on EBCDIC.  Authors should not use UTF-32.

What's the logic behind these recommendations?  Of course EBCDIC
is rare (as far as HTML is concerned I've never seen it), but it's
AFAIK not worse than codepage 437, 850, 858, or similar charsets.

And UTF-32 is relatively harmless, not much worse than UTF-16, and
it belongs to the charsets recommended in CHARMOD.  Depending on
what happens in future Unicode versions, banning UTF-32 could
backfire.

There are lots of other charsets starting with UTF-1 that could be
listed as SHOULD NOT or even MUST NOT.  Whatever you pick, state
what your reasons are, not only the (apparently) arbitrary result.

Please make sure that all *unregistered* charsets are SHOULD NOT.
Yes, I know the consequences for some proprietary charsets; their
owners are free to register them or to be ignored (CHARMOD C022).
 
 Frank

Received on Friday, 25 January 2008 14:35:20 UTC