
Human language neutral property

From: Elliotte Harold <elharo@metalab.unc.edu>
Date: Fri, 25 Feb 2005 08:24:03 -0500
Message-ID: <421F26F3.7050501@metalab.unc.edu>
To: public-xml-binary-comments@w3.org

The statement that "it is impossible for a format to perform identically 
in terms for instance of compactness or processing efficiency for a 
language that can be entirely captured using a single byte per character 
and for one that requires a multi-byte encoding" is untrue. It is 
certainly possible to provide equally compact and efficient data for 
languages like English and languages like Chinese. To do so, simply 
choose an encoding form such as UTF-32, which uses four bytes for every 
character and so does not preference one over the other.

Such an encoding is suboptimal for English, but it would absolutely 
have the characteristic that English and Chinese would be treated 
equally efficiently.

The point of human language neutrality is precisely to avoid 
preferencing one language or script over another. This would make UTF-8 
an inappropriate choice here. UTF-32 is the most neutral, but as a 
practical matter, I suspect no one would be too peeved by UTF-16, and 
that's probably the most reasonable compromise for textual data.
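
To make the comparison concrete, here is a minimal sketch that measures 
bytes per character for one English and one Chinese sample under each 
encoding form. The sample strings and the helper name bytes_per_char are 
illustrative assumptions, not part of any proposal under discussion.

```python
# Bytes per character for sample text under three Unicode encoding forms.
# The sample strings are illustrative assumptions chosen for this sketch.
SAMPLES = {
    "English": "language",  # 8 characters, all ASCII
    "Chinese": "语言中立",   # 4 characters, all in the Basic Multilingual Plane
}

# The "-le" variants are used so that Python does not prepend a byte order mark.
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")


def bytes_per_char(text: str, encoding: str) -> float:
    """Return the average number of encoded bytes per code point."""
    return len(text.encode(encoding)) / len(text)


for name, text in SAMPLES.items():
    for encoding in ENCODINGS:
        print(f"{name:8} {encoding:10} {bytes_per_char(text, encoding):.1f}")
```

For these samples UTF-8 gives 1.0 bytes per character for the English 
text and 3.0 for the Chinese text, while UTF-16 gives 2.0 for both and 
UTF-32 gives 4.0 for both. The UTF-16 figure holds only for characters 
in the Basic Multilingual Plane; characters outside it take four bytes.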

Elliotte Rusty Harold  elharo@metalab.unc.edu
XML in a Nutshell 3rd Edition Just Published!
Received on Friday, 25 February 2005 13:24:06 UTC
