- From: Martin J. Duerst <duerst@w3.org>
- Date: Thu, 13 Apr 2000 11:25:59 +0900
- To: Tim Bray <tbray@textuality.com>, Paul Hoffman / IMC <phoffman@imc.org>, John Cowan <cowan@locke.ccil.org>
- Cc: w3c-i18n-ig@w3.org, xml-editor@w3.org, w3c-xml-core-wg@w3.org
At 00/04/12 16:22 -0700, Tim Bray wrote:
>Pardon my lack of imagination, but I just cannot see how a person or
>committee can say that UTF-16BE stands on its own, and is "separated"
>from UTF-16, with a straight face.

Well, imagine an XML processor (actually, pretty much any XML processor
out there, I guess) that gets text/xml; charset='UTF-16BE'. Which
of the two error messages is that processor most probably giving:

1) Unknown character encoding 'UTF-16BE'
2) Missing BOM in UTF-16 encoding

The first one is straightforward. The second one would require
quite some intelligence, a feature that is neither present in nor
expected from XML processors.

Also, imagine UTF-16BE would allow a BOM (an alternative that was
discussed, but rejected because of the double use of the BOM character,
U+FEFF, as a ZWNBSP, i.e. a zero width no-break space). In such a case,
what would you reasonably expect from XML processors when getting
something labeled text/xml; charset='UTF-16BE', with a (correct) BOM
at the start:

1) That this is of course UTF-16, and therefore every XML processor out
   in the field has to accept and process it.
2) That this is an unknown encoding, and will be rejected.

I agree that it's difficult to imagine that a person or committee
can say that UTF-16BE is something completely different from UTF-16.
But what we are dealing with are not persons or committees, but
mechanical software. XML processors out in the field will with
a straight face tell you that UTF-16BE is different from UTF-16.

And because we are writing our specs for mechanical software, and
not for intelligent persons and committees, we have to make ourselves
constantly aware of the cases where machines 'think' differently
(or nothing at all).

Regards, Martin.
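
A rough sketch of the first scenario above, to make the "mechanical" behaviour concrete. It assumes a processor that picks its decoder purely by looking up the charset label in a table of names it knows. The names SUPPORTED and select_decoder are hypothetical and not taken from any real XML processor.

```python
# Hypothetical sketch of a label-driven decoder lookup.
# Assumption: the processor only knows the two encodings that every
# XML processor is required to support.
SUPPORTED = {"utf-8", "utf-16"}


def select_decoder(charset_label: str) -> str:
    """Return the normalized encoding name, or fail on an unknown label."""
    # Normalize the label: drop surrounding whitespace and quotes, ignore case.
    name = charset_label.strip().strip("'\"").lower()
    if name not in SUPPORTED:
        # No reasoning about how 'UTF-16BE' relates to 'UTF-16' happens here;
        # the lookup simply fails.
        raise LookupError(f"Unknown character encoding '{name}'")
    return name
```

With such a table, select_decoder("UTF-16") succeeds, while select_decoder("UTF-16BE") fails with the first of the two error messages, regardless of whether the data starts with a BOM.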
Received on Wednesday, 12 April 2000 22:51:52 UTC