RE: Byte order marks when parse="text"

The next draft will make it clear that BOMs are not included, but all
other characters are.  We note that your assertion that a BOM is not
permitted in an XML document is false - a BOM is U+FEFF, not U+FFFE.  We
keep the prohibition on illegal characters.
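
As a rough illustration of the behaviour described above, here is a minimal
sketch in Python. It is not taken from the draft: the function names
text_include and is_xml_char are hypothetical, and the sketch assumes that
only a single initial U+FEFF is dropped, as the quoted message suggests, and
that "illegal" means anything outside the Char production of XML 1.0.

```python
BOM = "\ufeff"  # the byte order mark; U+FFFE is a different, non-XML code point


def is_xml_char(ch: str) -> bool:
    """Return True if ch matches the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def text_include(raw: bytes, encoding: str = "utf-8") -> str:
    """Hypothetical parse="text" step: decode, drop a leading BOM, validate."""
    text = raw.decode(encoding)
    # Assumption: only a BOM at the very start of the resource is dropped.
    if text.startswith(BOM):
        text = text[1:]
    # Every remaining character is kept, so each one must be XML-legal.
    for offset, ch in enumerate(text):
        if not is_xml_char(ch):
            raise ValueError(
                f"character U+{ord(ch):04X} at offset {offset} "
                "is not permitted in XML"
            )
    return text
```

With these assumptions, text_include(b"\xef\xbb\xbfhello") returns "hello",
while input containing a vertical tab or U+FFFE raises ValueError instead of
being silently dropped.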

Jonathan Marsh

> -----Original Message-----
> From: Elliotte Rusty Harold
> Sent: Sunday, September 02, 2001 7:31 AM
> To:
> Subject: Byte order marks when parse="text"
> This message addresses the question of what to do with a byte order
> mark that is present at the start of a Unicode text document of any kind,
> or otherwise that is included with <xinclude:include href="doc.txt"
> parse="text"/>.
> The first sentence in 4.3 states, "When parse='text', the include
> is dereferenced and the resource is fetched. This resource is treated
> as plain text and converted to a set of character information items
> without attempting to parse the resource as XML." However, what if the first
> character in plain text is a byte order mark? In this case,
> that are not permitted in XML documents also are an error." Thus any
> including any document beginning with a byte order mark would produce
> an error. This is clearly a bad thing.
> It might be argued that the first sentence does not require that *all*
> characters in the included document must be converted to character
> information items. However, if this interpretation is allowed, then
> there's nothing except the implementer's good sense to say which
> characters to convert. For instance, an implementer could plausibly
> choose to ignore all non-XML-legal text characters such as vertical tab
> and form feed rather than raising an error.
> I suggest this be rewritten to make it explicit that all characters in
> an included text document except an initial byte order mark must be
> converted, and that an initial byte order mark must be deleted before
> inclusion.
> --
> +-----------------------+------------------------+-------------------+
> | Elliotte Rusty Harold | | Writer/Programmer |
> +-----------------------+------------------------+-------------------+
> |          The XML Bible, 2nd Edition (Hungry Minds, 2001)           |
> |                  |
> |   |
> +----------------------------------+---------------------------------+
> |  Read Cafe au Lait for Java News:      |
> |  Read Cafe con Leche for XML News:     |
> +----------------------------------+---------------------------------+

Received on Wednesday, 9 January 2002 14:40:09 UTC