- From: Kasimier Buchcik <kbuchcik@4commerce.de>
- Date: Thu, 05 Feb 2004 14:29:52 +0100
- To: <www-dom@w3.org>
Hi,

on 2/4/2004 11:39 PM Philippe Le Hegaret wrote:
> On Wed, 2004-02-04 at 17:26, jcowan@reutershealth.com wrote:
>
>>Philippe Le Hegaret scripsit:
>>
>>>As indicated in
>>>XML, entities encoded in UTF-16 MUST begin with the Byte Order Mark, so
>>>I see no reason why the value of the XML declaration encoding should
>>>contain "UTF-16BE" or "UTF-16LE", especially since this introduces some
>>>interoperability troubles.
>>
>>That means that entities encoded in the encoding named "UTF-16" must begin
>>with a BOM. Entities in the encodings "UTF-16BE" and "UTF-16LE" must not
>>begin with a BOM, but must have an appropriate encoding declaration.

Yes, I think so.

> Looking again at XML 1.0 3rd, it says that UTF-16 encoded entities MUST
> begin with a BOM. Unless I'm misinterpreting the meaning of "UTF-16
> encoded entities", I would say that it does include UTF16-BE and
> UTF16-LE as well.

RFC 2781 does say:

"Text in the "UTF-16LE" charset MUST be serialized with the octets
which make up a single 16-bit UTF-16 value in little-endian order.
Systems labelling UTF-16LE text MUST NOT prepend a BOM to the text."

But the dilemma I see is that our Delphi implementation needs the
DOMString not to have a BOM. And this is fine according to the Load &
Save candidate recommendation
(http://www.w3.org/TR/2003/CR-DOM-Level-3-LS-20031107/load-save.html):

"When outputting unicode data, whether or not a byte order mark is
serialized, or if the output is big-endian or little-endian, is
implementation dependent."

Maybe it's just my understanding of the DOMString so far, which
seemed to be quite closely bound to the implementing application. If I
get a Node.nodeValue (in our Delphi implementation), I expect the
DOMString to be encoded in UTF-16, little-endian, with no BOM. If I
serialize with LSSerializer.writeToString, I would get UTF-16 with a
BOM, as the XML spec states. This would cause problems with DOMString
operations, since I cannot predict whether a given DOMString came from
the LSSerializer or not.

I assumed that the DOMString was designed to have a consistent
structure within an application, and that LSOutput.characterStream and
LSOutput.byteStream would play the role of a *pure* XML entity.

So is the DOMString really intended to represent an XML entity, or
should it be handled more like an interface to the implementing
programming language?

Thanks and regards,

Kasimier Buchcik
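
To make the concern concrete, here is a minimal sketch in Python (not Delphi, and not any DOM implementation) of how a BOM can end up inside an in-memory string. It assumes a BOM-prefixed "UTF-16" byte sequence that is then decoded as "UTF-16LE"; strip_bom is a hypothetical helper introduced only for this illustration, and Python's str is merely a stand-in for a DOMString.

```python
import codecs

text = "<a/>"

# "UTF-16LE": little-endian code units with no BOM, as RFC 2781 requires.
le_bytes = text.encode("utf-16-le")

# A "UTF-16" entity as the XML spec expects it: BOM first, then the code
# units. The BOM is prepended explicitly so that the result does not
# depend on the byte order of the machine running this sketch.
bom_bytes = codecs.BOM_UTF16_LE + le_bytes

# Decoding as "utf-16" consumes the BOM. Decoding the same bytes as
# "utf-16-le" keeps it as a leading U+FEFF character in the string.
as_utf16 = bom_bytes.decode("utf-16")
as_utf16le = bom_bytes.decode("utf-16-le")


def strip_bom(s: str) -> str:
    """Hypothetical helper: drop a single leading U+FEFF, if present."""
    return s[1:] if s.startswith("\ufeff") else s


print(len(as_utf16), len(as_utf16le))      # the lengths differ by one
print(strip_bom(as_utf16le) == as_utf16)   # equal once the BOM is removed
```

With a helper like this, string operations see the same content regardless of where the string came from, but every consumer has to remember to call it, which is exactly the unpredictability described above.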
Received on Thursday, 5 February 2004 08:26:07 UTC