Re: XML character sets: a proposal
[In response to Paul Papresco]
>>I should note that I have never said that I wish to *require* that all
>>XML parsers be open-ended. I have no problem at all with seeing Latin
>>1 XML systems, and SJIS XML systems, though I expect most will
>>actually be Unicode-based.
>I think that a major goal of XML should be "reliable interoperability". I do
>not think that "support for whatever you need to do" should be a major goal.
>If people start creating standards-compliant SJIS XML documents on the Web
>and standards-compliant XML client software cannot read it, then we have a
>PROBLEM, in my opinion.
I agree, but it is a problem which will not be solved by saying "all
XML documents *must* be in either UTF-8 or UTF-16". Such a statement
*will* be ignored.
>One last question: Isn't it reasonable to expect that most local encodings
>could be translated into XML by the HTTP server on transmission? If so,
>there is no loss of convenience in requiring the on-the-wire format to be
>standardized in the same way that you would expect FTP keywords to always be
>ASCII and IP packets to have a certain byte-ordering.
This is precisely why I don't think it's such a big deal. If you tell
the server that you can only accept UTF-8, ASCII, or Latin 1, and the
server sends you an SJIS document anyway, the server is broken. It
should either convert the document to UTF-8 before sending it to you,
*or* send an error message.
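
To make the "convert or send an error" behaviour concrete, here is a
rough sketch of what the server side could look like. It is only an
illustration under my own assumptions: the function name `respond` and
its arguments are made up, the accepted charsets are assumed to arrive
as a plain list (for example taken from an HTTP Accept-Charset header),
and the status numbers are the ordinary HTTP 200 and 406 codes.

```python
import codecs


def _canon(label):
    # Normalise charset labels so "UTF-8" and "utf8" compare equal.
    return codecs.lookup(label).name


def respond(body, source_charset, accepted):
    """Return (status, charset, payload) for a stored document.

    Hypothetical helper: `body` is the document as stored on the server,
    `source_charset` is its encoding, `accepted` is what the client says
    it can read, in order of preference.
    """
    src = _canon(source_charset)
    targets = [_canon(label) for label in accepted]

    # The client can already read the stored form: send it untouched.
    if src in targets:
        return 200, src, body

    # Otherwise convert to the first accepted charset that can
    # represent every character of the document.
    text = body.decode(src)
    for target in targets:
        try:
            return 200, target, text.encode(target)
        except UnicodeEncodeError:
            continue

    # Nothing acceptable can carry this document: report an error
    # instead of sending bytes the client cannot read.
    return 406, None, b""
```

With this sketch, an SJIS document requested by a client that accepts
UTF-8, ASCII, or Latin 1 goes out as UTF-8, while a client that accepts
only ASCII and Latin 1 gets the error.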
NOTE 1: In the future, languages like Java will also give applications
the option of downloading translators.
NOTE 2: One of the primary reasons for requiring a single document
character set is precisely such cases: the server can
*blindly* (i.e., without parsing) convert the XML data, which
can be made *fast*.
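
As an illustration of what "blind" conversion might look like, here is
a small sketch that re-encodes a byte stream chunk by chunk without
ever looking at the markup. The name `transcode_chunks` is made up, and
I am assuming the charset is communicated outside the document (the
sketch does not rewrite any encoding label that may appear inside the
data).

```python
import codecs


def transcode_chunks(chunks, source_charset, target_charset="utf-8"):
    """Yield `chunks` re-encoded from source_charset to target_charset.

    Hypothetical helper: no XML parsing happens here. The incremental
    decoder keeps partial multi-byte characters between chunks, so the
    input may be split at arbitrary byte boundaries.
    """
    decoder = codecs.getincrementaldecoder(source_charset)()
    encoder = codecs.getincrementalencoder(target_charset)()

    for chunk in chunks:
        text = decoder.decode(chunk)
        if text:
            yield encoder.encode(text)

    # Flush both codecs; a truncated character at the end of the
    # input raises UnicodeDecodeError here.
    tail = decoder.decode(b"", final=True)
    yield encoder.encode(tail, final=True)
```

Since the loop only ever touches bytes and characters, its cost is
linear in the size of the document, which is what makes this kind of
conversion cheap enough to do at transmission time.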
>I don't know what we'll decide in the end, Gavin, but isn't it nice that
>we're talking about Unicode _as a minimum_? "We've come a long way, Baby."
I guess you can actually remember me arguing the hard minimalist path
some time ago... as well as arguing for support for the TEI subset ;-)