- From: Rick Jelliffe <ricko@gate.sinica.edu.tw>
- Date: Fri, 14 Apr 2000 02:53:24 +0800 (CST)
- To: w3c-i18n-ig@w3.org
- cc: xml-editor@w3.org, w3c-xml-core-wg@w3.org
On Wed, 12 Apr 2000, Paul Hoffman / IMC wrote:

> They are similar to the UTF-16 charset, but they have different rules.
> UTF-16 is an encoding, not a charset. All three charsets start with the
> UTF-16 transformation format, then add rules to make them charsets.

Now this is a really interesting comment. It manages to have:

- UTF-16 the encoding, not a charset
- UTF-16 the transformation format, not a charset
- UTF-16 the charset (transformation format + added rules)
- UTF-16LE the charset (transformation format + added rules)
- UTF-16BE the charset (transformation format + added rules)

and, of course, none of them is the same as Unicode, the character set.

The W3C I18n WG recently said "we should use 'charset' instead of 'encoding' since it will confuse less people, for the XML infoset". At the least, Paul's usage ("encoding is not charset") is the opposite of W3C's usage ("encoding is charset").

(We also have "UTF-16, the thing that the WG meant when it put XML together" and "UTF-16, the thing that IANA meant at the time XML was put together". IMHO, when constructing an erratum, it is the former that is key to figuring out what to do.)

My guess is that Paul means, in order:

- UTF-16 the generic encoding as used in wide characters, i.e., inside a program
- UTF-16 the generic name for saving Unicode in 16tets
- UTF-16 the charset, which may have a BOM
- UTF-16LE the charset, which must have no BOM
- UTF-16BE the charset, which must have no BOM

I disagree that the XML encoding declaration corresponds exactly to any of those:

- XML encoding is implementation-neutral about use inside a program.
- The XML encoding parameter is not generic but completely specific.
- XML encoding has no requirement to fit in with any RFC that intrudes into the area of what we are allowed to put inside data (as opposed to transmission control characters).

Also, it is an assertion about the character encoding actually used: whether or not that is good form according to any RFC or ISO standard must be the writer's choice.

Rick Jelliffe
Academia Sinica
Received on Thursday, 13 April 2000 14:53:48 UTC