Does UTF-16 require a BOM? (was: Re: SVG 1.2 Comment: image/svg+xml;charset="")

* Chris Lilley wrote:
>AvK> That's not true. You can have UTF-16 or UTF-8 content for that matter
>AvK> without a BOM.
>
>Um, leaving aside UTF-8, and noting that UTF-16 is not the same as
>UTF-16BE and UTF-16LE, please justify this statement with reference to a
>named portion of a specification.

That should be obvious from RFC 2781; for example, section 3.2 notes
"the character 0xFEFF in the first position of a stream MAY be
interpreted as a zero-width non-breaking space, and is not always a
byte-order mark". In XML 1.0, entities encoded in UTF-16 are required
to start with a byte order mark, but omitting it is only an error, not
a fatal error. For example (none of the following examples has a BOM,
and all are UTF-16 encoded):

  Content-Type: application/xml

  <?xml version="1.0"?>

this would be a fatal error ("it is a fatal error [...] for an entity
which begins with neither a Byte Order Mark nor an encoding declaration
to use an encoding other than UTF-8.") but

  Content-Type: application/xml

  <?xml version="1.0" encoding="UTF-16"?>

would not be a fatal error, especially when using big-endian order,
since RFC 2781 says that text labelled UTF-16 without a BOM SHOULD be
interpreted as big-endian.
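
To make the byte-level difference concrete, here is a minimal Python
sketch of how a processor might tell these cases apart from the first
few octets. The function name sniff_utf16 is hypothetical, the byte
patterns follow the non-normative autodetection appendix of XML 1.0,
and the sketch deliberately ignores UCS-4 and every other encoding
family, so treat it as an illustration rather than a conforming
detector.

```python
import codecs

def sniff_utf16(data: bytes):
    """Guess (byte_order, has_bom) for an XML entity assumed to be UTF-16.

    Returns None when the leading octets match none of the four
    patterns checked here. Hypothetical helper, for illustration only.
    """
    # A BOM is U+FEFF serialized in the entity's byte order.
    if data.startswith(codecs.BOM_UTF16_BE):   # FE FF
        return ("big-endian", True)
    if data.startswith(codecs.BOM_UTF16_LE):   # FF FE
        return ("little-endian", True)
    # No BOM: fall back to the octets of "<?" in each byte order.
    if data.startswith(b"\x00<\x00?"):         # 00 3C 00 3F
        return ("big-endian", False)
    if data.startswith(b"<\x00?\x00"):         # 3C 00 3F 00
        return ("little-endian", False)
    return None

decl = '<?xml version="1.0" encoding="UTF-16"?>'

# The second example above: UTF-16, big-endian, no BOM.
print(sniff_utf16(decl.encode("utf-16-be")))

# The same declaration preceded by a little-endian BOM.
print(sniff_utf16(codecs.BOM_UTF16_LE + decl.encode("utf-16-le")))
```

In the BOM-less cases the byte order can only be guessed from the
"<?" pattern, so a processor still has to read the encoding
declaration to learn which encoding is actually in use.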

Received on Thursday, 25 November 2004 13:48:04 UTC