Byte order marks when parse="text"

This message addresses the question of what to do with a byte order mark at the start of a Unicode text document of any kind, XML or otherwise, that is included with <xinclude:include href="doc.txt" parse="text"/>.

The first sentence in 4.3 states, "When parse='text', the include location is dereferenced and the resource is fetched. This resource is treated as plain text and converted to a set of character information items without attempting to parse the resource as XML." However, what if the first character in the plain text is a byte order mark? The same section goes on to say, "Characters that are not permitted in XML documents also are an error." Thus including any document that begins with a byte order mark would produce an error. This is clearly a bad thing.

It might be argued that the first sentence does not require that *all* characters in the included document be converted to character information items. However, if this interpretation is allowed, then there's nothing except the implementer's good sense to say which characters to convert. For instance, an implementer could plausibly decide to silently drop all characters that are not legal in XML text, such as vertical tab and form feed, rather than raising an error.

I suggest that 4.3 be rewritten to make it explicit that all characters in the included text document, except an initial byte order mark, must be included, and that an initial byte order mark must be deleted before inclusion.
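
To make the suggested behavior concrete, here is a minimal Python sketch of what an XInclude processor might do with a parse="text" resource under this proposal. It is only an illustration, not taken from the specification or from any existing implementation: the function name text_for_inclusion, the default encoding, and the choice of ValueError to signal the error are all assumptions of mine. It deletes a single leading U+FEFF, keeps every other character, and raises an error rather than silently dropping anything outside the XML 1.0 Char production.

```python
import re

# Anything outside the XML 1.0 Char production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_ILLEGAL_XML_CHAR = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

BOM = "\ufeff"


def text_for_inclusion(raw: bytes, encoding: str = "utf-8") -> str:
    """Hypothetical helper: decode a fetched resource for parse="text".

    Strips one initial byte order mark, includes all other characters,
    and raises ValueError on any character that XML does not permit.
    """
    text = raw.decode(encoding)

    # Delete only an *initial* byte order mark; nothing else is dropped.
    if text.startswith(BOM):
        text = text[1:]

    # Do not quietly skip illegal characters; report the first one found.
    match = _ILLEGAL_XML_CHAR.search(text)
    if match:
        raise ValueError(
            "character U+%04X at offset %d is not permitted in XML"
            % (ord(match.group()), match.start())
        )
    return text
```

For example, text_for_inclusion(b"\xef\xbb\xbfhello") returns "hello", while a document containing a vertical tab raises ValueError instead of being included with the character quietly removed.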
-- 

+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo@metalab.unc.edu | Writer/Programmer |
+-----------------------+------------------------+-------------------+ 
|          The XML Bible, 2nd Edition (Hungry Minds, 2001)           |
|              http://www.ibiblio.org/xml/books/bible2/              |
|   http://www.amazon.com/exec/obidos/ISBN=0764547607/cafeaulaitA/   |
+----------------------------------+---------------------------------+
|  Read Cafe au Lait for Java News:  http://www.cafeaulait.org/      | 
|  Read Cafe con Leche for XML News: http://www.ibiblio.org/xml/     |
+----------------------------------+---------------------------------+

Received on Sunday, 2 September 2001 11:15:07 UTC