- From: Jonathan Marsh <jmarsh@microsoft.com>
- Date: Wed, 9 Jan 2002 10:49:23 -0800
- To: "Elliotte Rusty Harold" <elharo@metalab.unc.edu>
- Cc: <www-xml-xinclude-comments@w3.org>
The next draft will make it clear that BOMs are not included, but all
other characters are. We note that your assertion that a BOM is not
permitted in an XML document is false - a BOM is U+FEFF, not U+FFFE. We
keep the prohibition on illegal characters.

Thanks,
Jonathan Marsh

> -----Original Message-----
> From: Elliotte Rusty Harold [mailto:elharo@metalab.unc.edu]
> Sent: Sunday, September 02, 2001 7:31 AM
> To: www-xml-xinclude-comments@w3.org
> Subject: Byte order marks when parse="text"
>
> This message addresses the question of what to do with a byte order mark
> that is present at the start of a Unicode text document of any kind, XML
> or otherwise that is included with <xinclude:include href="doc.txt"
> parse="text"/>.
>
> The first sentence in 4.3 states, "When parse='text', the include location
> is dereferenced and the resource is fetched. This resource is treated as
> plain text and converted to a set of character information items without
> attempting to parse the resource as XML." However, what if the first
> character in plain text is a byte order mark? In this case, "Characters
> that are not permitted in XML documents also are an error." Thus
> including any document beginning with a byte order mark would produce an
> error. This is clearly a bad thing.
>
> It might be argued that the first sentence does not require that *all*
> characters in the included document must be converted to character
> information items. However, if this interpretation is allowed, then
> there's nothing except the implementer's good sense to say which
> characters to convert. For instance, an implementer could plausibly decide
> to ignore all non-XML-legal text characters such as vertical tab and form
> feed rather than raising an error.
>
> I suggest this be rewritten to make it explicit that all characters in the
> included text document except an initial byte order mark must be included
> and that an initial byte order mark must be deleted before inclusion.
> --
>
> +-----------------------+------------------------+-------------------+
> | Elliotte Rusty Harold | elharo@metalab.unc.edu | Writer/Programmer |
> +-----------------------+------------------------+-------------------+
> |          The XML Bible, 2nd Edition (Hungry Minds, 2001)           |
> |              http://www.ibiblio.org/xml/books/bible2/              |
> |   http://www.amazon.com/exec/obidos/ISBN=0764547607/cafeaulaitA/   |
> +----------------------------------+---------------------------------+
> |  Read Cafe au Lait for Java News:  http://www.cafeaulait.org/      |
> |  Read Cafe con Leche for XML News: http://www.ibiblio.org/xml/     |
> +----------------------------------+---------------------------------+
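
The behaviour discussed in this thread (drop an initial byte order mark,
include every other character, and treat characters that are not permitted
in XML as an error) could be sketched roughly as follows. This is only an
illustration under stated assumptions, not the specification text or any
processor's actual implementation: the function names `is_xml_char` and
`text_for_inclusion` are hypothetical, the input is assumed to be already
decoded to a string, and the legality check follows the Char production of
XML 1.0.

```python
BOM = "\ufeff"  # U+FEFF, the byte order mark (U+FFFE is a different code point)


def is_xml_char(ch: str) -> bool:
    """Return True if ch matches the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def text_for_inclusion(text: str) -> str:
    """Hypothetical helper: prepare decoded text for parse="text" inclusion.

    Assumption: only a BOM at the very start is removed; every other
    character is kept, and any character not permitted in XML is an error.
    """
    if text.startswith(BOM):
        text = text[1:]
    for index, ch in enumerate(text):
        if not is_xml_char(ch):
            raise ValueError(
                f"character U+{ord(ch):04X} at offset {index} is not permitted in XML"
            )
    return text
```

Under this sketch U+FEFF itself passes the legality check while U+FFFE does
not, and vertical tab or form feed raise an error rather than being
silently dropped.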
Received on Wednesday, 9 January 2002 14:40:09 UTC