Re: Reviewed charmod fundamentals from Rick Jelliffe on 2004-03-23 (www-tag@w3.org from March 2004)

From: Rick Jelliffe <ricko@topologi.com>
Date: Wed, 24 Mar 2004 06:04:28 +1100
To: www-tag@w3.org
Message-ID: <40608A3C.4000903@topologi.com>

Tim Bray raised:

 > C016   [S]   When designing a new protocol, format or API,
 > specifications SHOULD mandate a unique character encoding.

I think this SHOULD is partly bogus: the people who deploy systems
are smart enough, and in the right position, to decide which
encoding to use.

The only times you strictly need to mandate a unique character
encoding are when you require/aspire-to/pretend-at guaranteed
global interoperability for your protocol/format/API, or when
there is a metadata disconnect that prevents labelling
information from being transmitted.

Furthermore, in the particular case of the large CJK character
sets, the tables needed to map, say, from a DBMS serializer with
text in GB 18030 to Unicode are not small.  It is also a
requirement from the PRC govt that "any software application that
is released for the Chinese market after (2001) must support GB
18030" [1], so we should not expect China to move to UTF-* in a
hurry. (GB 18030 is an alternative encoding for Unicode, with byte
sequences that differ from the UTF-* encodings outside ASCII, so
as to be compatible with their existing 1/2 byte standard GB 2312.
GB 18030 is 1/2/4 bytes.)
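
As a rough illustration of the 1/2/4 byte point, here is a minimal
Python sketch. It relies only on Python's standard "gb18030",
"gb2312" and "utf-8" codecs; the sample characters and the helper
name encoded_lengths are chosen for illustration only.

```python
# Illustrative sketch: compare how many bytes one character takes
# in GB 18030 and in UTF-8, using Python's built-in codecs.

SAMPLES = {
    "ASCII letter": "A",
    "common hanzi": "\u4e2d",
    "supplementary-plane hanzi": "\U00020000",
}


def encoded_lengths(ch):
    """Return (GB 18030 byte count, UTF-8 byte count) for one character."""
    return len(ch.encode("gb18030")), len(ch.encode("utf-8"))


for name, ch in SAMPLES.items():
    gb_len, utf8_len = encoded_lengths(ch)
    print(f"{name}: GB 18030 = {gb_len} bytes, UTF-8 = {utf8_len} bytes")

# A character that exists in GB 2312 keeps the same bytes in GB 18030,
# which is the backward compatibility mentioned above.
assert "\u4e2d".encode("gb2312") == "\u4e2d".encode("gb18030")
```

Both encodings cover the same repertoire, so text round-trips
between them; what differs is the byte representation and the
tables needed to convert.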

I suggest something like:

 > C016   [S]   When designing a new protocol, format or API,
 > specifications SHOULD mandate a unique character encoding
 > where guaranteed global interoperability is required, and
 > SHOULD specify a reliable mechanism for labelling regional
 > encodings where it is not.
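
What a "reliable mechanism" could look like on the receiving side
might be sketched as below. This is only a sketch under
assumptions: the function name decode_labelled, the optional label
argument and the UTF-8 default are hypothetical, not anything taken
from charmod.

```python
# Hypothetical sketch: decode a payload using its declared encoding
# label, falling back to a single mandated default when the label
# could not be transmitted.
import codecs
from typing import Optional


def decode_labelled(payload: bytes,
                    label: Optional[str],
                    default: str = "utf-8") -> str:
    """Decode payload strictly with the declared label, else the default."""
    encoding = default if label is None else label
    try:
        codecs.lookup(encoding)  # reject labels the platform does not know
    except LookupError:
        raise ValueError(f"unknown encoding label: {encoding!r}")
    # Strict decoding: a wrong or missing label fails loudly
    # instead of silently producing garbage.
    return payload.decode(encoding, errors="strict")
```

With a label, GB 18030 text decodes correctly; without one, the
same bytes are rejected under the default rather than misread,
which is the metadata-disconnect case where a mandated encoding
earns its keep.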

(I have been typesetting some PRC laws in the last few months,
and I can confirm that GB 18030 is in use.)

Cheers
Rick Jelliffe


[1] http://www-106.ibm.com/developerworks/library/u-china.html

Received on Tuesday, 23 March 2004 14:05:22 UTC