- From: Philippe Le Hegaret <plh@w3.org>
- Date: Thu, 19 Feb 2004 12:08:03 -0500
- To: Kasimier Buchcik <kbuchcik@4commerce.de>
- Cc: WWW DOM <www-dom@w3.org>
- Message-Id: <1077210482.20249.157.camel@jfouffa.w3.org>
On Mon, 2004-02-09 at 05:45, Kasimier Buchcik wrote:
> Although I might get flamed about repeating a question (I posted it
> further down the thread), I still need to clarify the format of the
> DOMString if using LSSerializer.writeToString. As you wrote, I see that
> the declaration needs to be "UTF-16". But is it required to use a BOM?

The Group rediscussed this issue and came up with the following proposal
(however, I added the "byte-order-mark-needed" warning):

The following sentences were removed:
[[
When outputting XML data, implementations are required to support the
encodings "UTF-8", "UTF-16BE", and "UTF-16LE" to guarantee that data is
serializable in all encodings that are required to be supported by all
XML parsers.
]]
http://www.w3.org/TR/2004/PR-DOM-Level-3-LS-20040205/load-save.html#LS-LSSerializer-write

The following sentence:
[[
"When outputting unicode data, whether or not a byte order mark is
serialized, or if the output is big-endian or little-endian, is
implementation dependent."
]]
http://www.w3.org/TR/2003/CR-DOM-Level-3-LS-20031107/load-save.html

should read:
[[
Implementations are required to support the encodings "UTF-8", "UTF-16",
"UTF-16BE", and "UTF-16LE" to guarantee that data is serializable in all
encodings that are required to be supported by all XML parsers.

When the encoding is UTF-8, whether or not a byte order mark is
serialized, or if the output is big-endian or little-endian, is
implementation dependent.

When the encoding is UTF-16, whether or not the output is big-endian or
little-endian is implementation dependent, but a Byte Order Mark must be
generated for non-character outputs, such as LSOutput.byteStream or
LSOutput.systemId. If the Byte Order Mark is not generated, a
"byte-order-mark-needed" warning is reported.

When the encoding is UTF-16LE or UTF-16BE, the output is big-endian
(UTF-16BE) or little-endian (UTF-16LE) and the Byte Order Mark is not
generated.

In all cases, the encoding declaration, if generated, will correspond to
the encoding used during the serialization (e.g. encoding="UTF-16" will
appear if UTF-16 was requested).
]]

> So, once more: has the DOMString to hold a BOM if serializing with
> LSSerializer.writeToString?

No. A DOMString object never contains a BOM, since it is a
character-oriented output.

Philippe
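
For illustration only, the byte-level effect of the proposed rules can be sketched in Python. This is not DOM Level 3 LS code: encode_for_byte_output is a hypothetical helper, and it relies on Python's own codecs, where "utf-16" writes a BOM in the platform's native byte order while "utf-16-le" and "utf-16-be" write none. The already-serialized markup plays the role of the DOMString.

```python
import codecs

# Hypothetical mapping from the encoding names in the proposal to Python codec names.
CODECS = {
    "UTF-8": "utf-8",
    "UTF-16": "utf-16",       # BOM written, byte order is platform dependent
    "UTF-16LE": "utf-16-le",  # fixed byte order, no BOM
    "UTF-16BE": "utf-16-be",  # fixed byte order, no BOM
}


def encode_for_byte_output(markup: str, encoding: str) -> bytes:
    """Encode already-serialized markup for a byte-oriented output."""
    try:
        codec = CODECS[encoding.upper()]
    except KeyError:
        raise ValueError("unsupported encoding: " + encoding)
    return markup.encode(codec)


def starts_with_utf16_bom(data: bytes) -> bool:
    """Report whether the bytes begin with either UTF-16 byte order mark."""
    return data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))


if __name__ == "__main__":
    # The character string itself (the DOMString analogue) holds no BOM.
    markup = '<?xml version="1.0" encoding="UTF-16"?><doc/>'
    for name in ("UTF-16", "UTF-16LE", "UTF-16BE"):
        data = encode_for_byte_output(markup, name)
        print(name, "BOM present:", starts_with_utf16_bom(data))
```

The string passed in never carries U+FEFF; the mark only appears once the characters are turned into bytes for the generic "UTF-16" case, which matches the distinction between writeToString and byte outputs described above.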
Received on Thursday, 19 February 2004 12:08:04 UTC