Re: Normalize newlines when parse="text"?

From: Daniel Veillard <daniel@veillard.com>
Date: Sat, 22 Jan 2005 00:12:03 +0100
To: Mike Brown <mike@skew.org>
Cc: daniel@veillard.com, www-xml-xinclude-comments@w3.org
Message-ID: <20050121231203.GF2727@daniel.veillard.com>

On Fri, Jan 21, 2005 at 12:00:56PM -0700, Mike Brown wrote:
> Daniel Veillard wrote:
> > XInclude states:
> > 
> >   http://www.w3.org/TR/xinclude/#text-included-items
> > ------
> >   Each character obtained from the transformation of the resource is
> >   represented in the top-level included items as a character information
> >   item with the character code set to the character code in ISO 10646
> >   encoding, and the element content whitespace set to false.
> > ------
> > 
> > Both characters, code points 0xa and 0xd, are in the range allowed by
> > the Char production of the XML spec and won't raise errors.
> Thanks; I saw the same sections you did, but I also saw in the Infoset spec:
> -  appendix B "XML Reporting Requirements (informative)"
>        item 3 "An XML processor must normalize line-ends to LF 
>                before passing them to the application (2.11)."
> -  appendix D "What is not in the Information Set"
>        item 9 "The difference between CR, CR-LF, and LF line termination."
> So it seems that the intent is for the Information Set to be constrained to 
> XML's restrictions w.r.t. newlines -- an XML parser must report normalized 

  Hum, no, you can have any sequence of 0xa and 0xd in an infoset, for example
if you parse an XML instance with numeric character references. Those won't
be normalized by the parser. There are no a priori restrictions on the
infoset for those characters, just a rule at the parser level which may build
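
  A quick way to check the point about numeric character references, as a
minimal sketch assuming Python's standard xml.etree.ElementTree (expat-based)
rather than any particular XInclude implementation: a literal CR-LF in the
document is normalized to LF, while the same pair written as character
references reaches the application untouched. The element name "a" is only
an illustration.

```python
import xml.etree.ElementTree as ET

# Literal CR-LF in the source text: the parser applies the XML 1.0
# line-end normalization (section 2.11) before reporting the content.
literal = ET.fromstring("<a>x\r\ny</a>").text

# The same two characters written as numeric character references:
# they are resolved after normalization, so the CR is preserved.
referenced = ET.fromstring("<a>x&#13;&#10;y</a>").text

print(repr(literal))
print(repr(referenced))
```

The first value contains only a line feed between "x" and "y"; the second
still carries both the carriage return and the line feed.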

> newlines to the application, and the infoset is a model of what the parser 
> reports to the application.
> If that is the case, then an infoset 'transformation' like XInclude, while not 
> explicitly requiring newline normalization, might be expected to normalize 
> newlines anyway.

  XInclude operates on the infoset; it is not a parser. The only limitations
are those defined in the XInclude spec: the range of characters allowed and
character model normalization (if any). And there is no rule there requiring
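
  To illustrate what text inclusion without a parser step can look like, here
is a small sketch using Python's xml.etree.ElementInclude. It shows how that
one implementation behaves and is not a statement about conformance. The
loader function fake_loader and the href "note.txt" are hypothetical
stand-ins for a real resource; the loader simply returns already-decoded
text that contains a CR-LF pair.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

DOC = (
    '<doc xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include href="note.txt" parse="text"/>'
    '</doc>'
)

def fake_loader(href, parse, encoding=None):
    # Hypothetical loader: pretends "note.txt" decoded to this string.
    # No file is read, so no file-level newline handling is involved.
    return "line one\r\nline two"

root = ET.fromstring(DOC)

# Replace the xi:include element in place with the loaded text.
ElementInclude.include(root, loader=fake_loader)

included = root.text
print(repr(included))
```

A custom loader is used so that the only thing being observed is the
inclusion step itself: the characters handed over by the loader end up in
the tree as they are, carriage return included.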


Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
daniel@veillard.com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | 
Received on Friday, 21 January 2005 23:12:11 UTC
