Re: "Re: writeToString, write and, UTF-16[BE|LE]"

On Mon, 2004-02-09 at 05:45, Kasimier Buchcik wrote:
> Although I might get flamed about repeating a question (I posted it 
> further down the thread), I still need to clarify the format of the
> DOMString if using LSSerializer.writeToString. As you wrote, I see that 
> the declaration needs to be "UTF-16". But is it required to use a BOM?

The Group discussed this issue again and came up with the following
proposal (the "byte-order-mark-needed" warning is my own addition):
The following sentences were removed:
[[
When outputting XML data, implementations are required to support the
encodings "UTF-8", "UTF-16BE", and "UTF-16LE" to guarantee that data is
serializable in all encodings that are required to be supported by all
XML parsers.
]]
http://www.w3.org/TR/2004/PR-DOM-Level-3-LS-20040205/load-save.html#LS-LSSerializer-write

The following sentences:
[[
"When outputting unicode data, whether or not a byte order mark is
serialized, or if the output is big-endian or little-endian, is
implementation dependent."
]]
http://www.w3.org/TR/2003/CR-DOM-Level-3-LS-20031107/load-save.html

should read:
[[
Implementations are required to support the encodings "UTF-8", "UTF-16",
"UTF-16BE", and "UTF-16LE" to guarantee that data is serializable in all
encodings that are required to be supported by all XML parsers. When the
encoding is UTF-8, whether or not a byte order mark is serialized, or if
the output is big-endian or little-endian, is implementation dependent.
When the encoding is UTF-16, whether or not the output is big-endian or
little-endian is implementation dependent, but a Byte Order Mark must be
generated for non-character outputs, such as LSOutput.byteStream or
LSOutput.systemId. If the Byte Order Mark is not generated, a
"byte-order-mark-needed" warning is reported. When the encoding is
UTF-16BE or UTF-16LE, the output is big-endian (UTF-16BE) or
little-endian (UTF-16LE) and the Byte Order Mark is not generated. In
all cases, the encoding declaration, if generated, will correspond to the
encoding used during the serialization (e.g. encoding="UTF-16" will
appear if UTF-16 was requested).
]]
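
To make the proposed rules concrete, here is a minimal sketch in Python.
It is only an illustration, not part of the DOM Level 3 LS API: the
function name serialize_bytes is hypothetical, picking big-endian for
plain "UTF-16" is an arbitrary choice among the implementation-dependent
options, and omitting the BOM for UTF-8 is likewise just one permitted
choice.

```python
import codecs


def serialize_bytes(text: str, encoding: str) -> bytes:
    """Hypothetical helper: encode `text` for a byte-oriented output
    (the LSOutput.byteStream case) following the proposed BOM rules."""
    name = encoding.upper()
    if name == "UTF-16":
        # Endianness is implementation dependent (big-endian chosen here),
        # but a BOM must be generated for byte outputs.
        return codecs.BOM_UTF16_BE + text.encode("utf-16-be")
    if name == "UTF-16BE":
        # Explicit endianness: no BOM is generated.
        return text.encode("utf-16-be")
    if name == "UTF-16LE":
        # Explicit endianness: no BOM is generated.
        return text.encode("utf-16-le")
    if name == "UTF-8":
        # A BOM is optional for UTF-8; this sketch omits it.
        return text.encode("utf-8")
    raise ValueError(f"unsupported encoding: {encoding}")
```

With this sketch, serialize_bytes("<a/>", "UTF-16") starts with the bytes
FE FF, while the "UTF-16BE" result is the same byte sequence without
those two leading bytes.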

> So, once more: has the DOMString to hold a BOM if serializing with 
> LSSerializer.writeToString?

No. A DOMString object never contains a BOM, since it is a
character-oriented output.
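
A rough analogy in Python, for illustration only: a Python str stands in
for a DOMString here, and the variable names are hypothetical. The
character string carries no BOM; one only appears once the characters
are encoded to bytes.

```python
import codecs

# Stand-in for the result of LSSerializer.writeToString: characters, not bytes.
dom_string = '<?xml version="1.0" encoding="UTF-16"?><a/>'
assert not dom_string.startswith("\ufeff")  # no BOM character in the string

# Only a byte-oriented output needs a BOM. Python's "utf-16" codec prepends
# one in the platform's native byte order.
encoded = dom_string.encode("utf-16")
has_bom = encoded[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)

# Decoding consumes the BOM again, giving back the same BOM-free characters.
round_trip = encoded.decode("utf-16")
```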

Philippe

Received on Thursday, 19 February 2004 12:08:04 UTC