- From: Rick Jelliffe <ricko@topologi.com>
- Date: Wed, 24 Mar 2004 06:04:28 +1100
- To: www-tag@w3.org
Tim Bray raised:

> C016 [S] When designing a new protocol, format or API,
> specifications SHOULD mandate a unique character encoding.

I think this SHOULD is partly bogus: people who deploy systems are smart enough and at the right place to decide which encoding to use.

The only times you strictly need to mandate a unique character encoding are when you require/aspire-to/pretend-at guaranteed global interoperability for your protocol/format/API, or when there is a metadata disconnect that prevents labelling information from being transmitted.

Furthermore, in the particular case of the CJK large character sets, the size of the tables needed to, say, map from a DBMS serializer with text in GB 18030 to Unicode is not small.

Furthermore, it is a requirement from the PRC govt that "any software application that is released for the Chinese market after (2001) must support GB 18030" [1], so we should not expect China to move to UTF-* in a hurry. (GB 18030 is an alternative encoding for Unicode, with completely different code points and so on, to be compatible with their existing 1/2 byte standard GB 2312. GB 18030 is 1/2/4 bytes.)

I suggest something like:

> C016 [S] When designing a new protocol, format or API,
> specifications SHOULD mandate a unique character encoding
> for global guaranteed interoperability, and SHOULD specify
> a reliable mechanism for specifying regional encodings
> if guaranteed interoperability is not a criterion.

(I have been typesetting some PRC laws in the last few months, and I can confirm that GB 18030 is in use.)

Cheers
Rick Jelliffe

[1] http://www-106.ibm.com/developerworks/library/u-china.html
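
To make the "1/2/4 bytes" remark about GB 18030 concrete, here is a minimal Python sketch that compares encoded widths in GB 18030 and UTF-8 using the standard library codecs. The sample characters and the helper names `encoded_width` and `round_trips` are illustrative choices, not anything from the message above.

```python
# Sample characters picked for illustration only.
SAMPLES = {
    "ASCII letter": "A",
    "common hanzi": "\u4e2d",             # U+4E2D, present in GB 2312
    "supplementary hanzi": "\U00020000",  # U+20000, outside the BMP
}


def encoded_width(ch: str, encoding: str) -> int:
    """Return how many bytes one character occupies in the given encoding."""
    return len(ch.encode(encoding))


def round_trips(text: str, encoding: str) -> bool:
    """Check that encoding and then decoding gives back the same text."""
    return text.encode(encoding).decode(encoding) == text


if __name__ == "__main__":
    for label, ch in SAMPLES.items():
        gb = encoded_width(ch, "gb18030")
        utf8 = encoded_width(ch, "utf-8")
        print(f"{label}: GB 18030 = {gb} byte(s), UTF-8 = {utf8} byte(s)")
```

Both encodings cover the same repertoire, so text survives a round trip through either one; what differs is the byte sequences, which is why a receiver needs either a mandated encoding or a reliable label.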
Received on Tuesday, 23 March 2004 14:05:22 UTC