re i18n

From: Terry Allen <tallen@sonic.net>
Date: Wed, 11 Jun 1997 17:31:45 -0700
Message-Id: <199706120031.RAA05695@bolt.sonic.net>
To: w3c-sgml-wg@w3.org

Tim gave a pretty good precis of the situation.  What I can add is
that the Internet Architecture Board's charset report recommends
(quite briefly) ISO 10646 rather than Unicode.  As I learn by asking
around, the reason is that 10646 will be easier to extend when the
60,000 new characters are added to the second plane.  At that time,
Unicode will have to employ the switching mode Tim described (which
is ISO 2022 without the problems of extensibility to arbitrary
numbers of charsets), and (I think but am not sure) the bits sent
down the wire for 10646 and Unicode will differ.
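
To make the "bits down the wire" point concrete, here is a rough
sketch in Python.  It rests on two assumptions of mine: that the
switching mode in question is the surrogate-pair mechanism of UTF-16,
and that "10646" means its four-byte UCS-4 form (which Python exposes
as the "utf-32-be" codec).  The sample character U+20000 is only an
illustrative choice of a code point in the second plane.

```python
def utf16_surrogates(code_point):
    """Split a code point above U+FFFF into a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits select the high surrogate
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits select the low surrogate
    return high, low


# Illustrative character in the second plane (an assumption, not from the report).
sample = chr(0x20000)

wire_utf16 = sample.encode("utf-16-be")  # two 16-bit units (a surrogate pair)
wire_ucs4 = sample.encode("utf-32-be")   # one 32-bit unit

print(wire_utf16.hex(), wire_ucs4.hex())
```

Both forms take four bytes for this character, but the byte values
are different, which is the kind of divergence I had in mind.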

As XML is targeted to the Internet, I think that abiding by the
IAB's (by no means definitive) policy is preferable.  As for the
size of the resulting files, someone more familiar with the
various UTF encodings may be able to comment better than I can.  My
vote, if I had one, would be for 10646, with a note that the
Unicode 2.0 book is a good buy, a helpful guide, a fun read,
and a barrel of laughs, and may serve as a practical guide to
10646.  What this means for the somewhat odd &#n0000; syntax
in XML-lang (why not &#U0000;?  this has never been justified
publicly) might need to be worked out by specialists.
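
On the file-size question, a minimal way to compare for oneself is
sketched below in Python.  The function name encoded_sizes and the
choice of codecs are mine, and "utf-32-be" is used here as a stand-in
for the four-byte UCS-4 form of 10646; treat the whole thing as an
illustration rather than a statement about what XML should require.

```python
def encoded_sizes(text, codecs=("utf-8", "utf-16-be", "utf-32-be")):
    """Return the number of bytes the text occupies in each encoding."""
    return {name: len(text.encode(name)) for name in codecs}


# Plain ASCII markup versus a single accented letter (hypothetical samples).
for sample in ("<para>re i18n</para>", "\u00e9"):
    print(repr(sample), encoded_sizes(sample))
```

For ASCII-only text UTF-8 uses one byte per character, the 16-bit
form two, and the 32-bit form four, so the answer depends a good deal
on what the documents actually contain.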

Regards,
  Terry Allen    Electronic Publishing Consultant    tallen[at]sonic.net
                   http://www.sonic.net/~tallen/
    Davenport and DocBook:  http://www.ora.com/davenport/index.html
          T.A. at Passage Systems:  terry.allen[at]passage.com 
Received on Wednesday, 11 June 1997 20:31:07 EDT