Re: XML character sets: a proposal
> From: Gavin Nicol <gtn@ebt.com>
> Date: Thu, 12 Sep 1996 14:24:09 GMT
>
> >I agree that fixing the character repertoire on ISO 10646 is the right
> >way to go. (There appears to be consensus on this.)
>
> Good. One down.
>
> >However, although many platforms do not currently have support for
> >either UTF-8 or UCS-2/UTF-16, my impression is that most of them are
> >planning support for one or the other. This suggests the possibility
> >of allowing just these two encodings for XML.
>
> I will tell you exactly what will happen if we do this:
> 1) People who use ASCII will pretend they are using UTF-8
I don't see the problem. A 7-bit ASCII file is also a UTF-8 file.
Isn't that the whole point of UTF-8?
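
A minimal sketch of that claim, assuming Python 3 and its built-in codecs (the function name is_also_valid_utf8 is hypothetical, chosen only for illustration): any byte string made up solely of 7-bit values decodes to the same text whether it is read as ASCII or as UTF-8.

```python
def is_also_valid_utf8(data: bytes) -> bool:
    """Return True if data is pure 7-bit ASCII and reads identically as UTF-8.

    Hypothetical helper for illustration; not part of any XML tooling.
    """
    # Any byte above 0x7F means the data is not 7-bit ASCII at all.
    if any(b > 0x7F for b in data):
        return False
    # For 7-bit data, the ASCII and UTF-8 decodings are the same string.
    return data.decode("ascii") == data.decode("utf-8")


sample = b"<doc>Hello, XML</doc>"
print(is_also_valid_utf8(sample))
```

The reverse does not hold: a UTF-8 file that contains non-ASCII characters uses bytes above 0x7F and so is not an ASCII file.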
> 2) People who do not use ASCII (SJIS, EUC, JOHAB etc.) will
> *ignore* this requirement and implement systems that handle the
> encodings they use every day.
XML isn't intended to be convenient to create by hand with a text
editor. People are mostly expected to use SGML/XML editors to create
documents. Maybe for the next few years, until support for Unicode
becomes widespread, people who want to create documents containing Asian
characters with plain text editors will need to run a filter on their
files after editing them. That doesn't seem like a big deal to me.
The important issue seems to me to be that they can represent the
characters that they need to.
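
The kind of filter described above could look roughly like the following. This is only a sketch, assuming Python 3 and its standard codec names ("shift_jis", "euc_jp"); the function name transcode is hypothetical.

```python
def transcode(data: bytes, source_encoding: str,
              target_encoding: str = "utf-8") -> bytes:
    """Re-encode bytes from a legacy encoding into the target encoding.

    Hypothetical helper: decodes to Unicode text, then encodes again.
    Raises UnicodeDecodeError if data is not valid in source_encoding.
    """
    text = data.decode(source_encoding)
    return text.encode(target_encoding)


# "\u65e5\u672c\u8a9e" is the Japanese word for "Japanese language".
legacy_bytes = "\u65e5\u672c\u8a9e".encode("shift_jis")
utf8_bytes = transcode(legacy_bytes, "shift_jis")
print(utf8_bytes.decode("utf-8") == "\u65e5\u672c\u8a9e")
```

In practice the filter would read the edited file in the editor's native encoding and write the result out as UTF-8 or UTF-16; the file handling is omitted here.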
> As you, and many others, must also be aware, UNICODE doesn't solve the
> world's problems either, just 95% of the more common ones ;-) We have
> to think of XML being used to deliver things like the classical
> buddhist texts, dead languages, etc.
I don't see how allowing multiple encodings is going to help with
things like classical Buddhist texts. Since you've agreed that the
character repertoire should be ISO 10646, you're not going to be able
to represent any characters in your encoding that you can't represent
in UTF-8 or UTF-16. Support for multiple encodings can't be buying
you anything more than convenience.
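
To illustrate why the choice of encoding does not change what can be represented, here is a small sketch, again assuming Python 3's built-in codecs (the function name same_repertoire is hypothetical). The same sequence of code points survives a round trip through either UTF-8 or UTF-16; only the byte layout differs.

```python
def same_repertoire(text: str) -> bool:
    """Check that text round-trips identically through UTF-8 and UTF-16.

    Hypothetical helper for illustration only.
    """
    via_utf8 = text.encode("utf-8").decode("utf-8")
    via_utf16 = text.encode("utf-16").decode("utf-16")
    return via_utf8 == via_utf16 == text


# Mixed sample: ASCII, a Japanese kana, and a Hangul syllable.
sample = "A\u3042\ud55c"
print(same_repertoire(sample))
print(sample.encode("utf-8") != sample.encode("utf-16"))
```

A character missing from the repertoire has no code point to encode, so no choice of encoding can add it.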
James