- From: Terry Allen <tallen@sonic.net>
- Date: Wed, 11 Jun 1997 17:31:45 -0700
- To: w3c-sgml-wg@w3.org
Tim gave a pretty good precis of the situation. What I can say is that the Internet Architecture Board's charset report recommends ISO 10646 instead of Unicode (quite briefly), based on (as I learn by asking around) the easier extensibility of 10646 when the 60,000 new characters are added to the second plane. At that time, Unicode will have to employ the switching mode Tim described (which is ISO 2022 without the problems of extensibility to arbitrary numbers of charsets), and (I think but am not sure) the bits sent down the wire for 10646 and Unicode will differ. As XML is targeted to the Internet, I think that abiding by the IAB's (by no means definitive) policy is preferable. As for the size of the resulting files, someone more familiar with the various UTF encoding may be able to comment better than I. My vote, if I had one, would be for 10646, with a note that the Unicode 2.0 book is a good buy, a helpful guide, a fun read, and a barrel of laughs, and may serve as a practical guide to 10646. What this means for the somewhat odd &#n0000; syntax in XML-lang (why not &#U0000;? this has never been justified publicly), might need to be worked out by specialists. Regards, Terry Allen Electronic Publishing Consultant tallen[at]sonic.net http://www.sonic.net/~tallen/ Davenport and DocBook: http://www.ora.com/davenport/index.html T.A. at Passage Systems: terry.allen[at]passage.com
Received on Wednesday, 11 June 1997 20:31:07 UTC