Re: new line handling (DOMBuilder and DOMSerializer) from Johnny Stenback on 2003-09-17 (www-dom@w3.org from July to September 2003)

From: Johnny Stenback <jst@w3c.jstenback.com>
Date: Wed, 17 Sep 2003 15:38:58 -0700
To: Christian Parpart <cparpart@surakware.net>
Cc: www-dom@w3.org
Message-ID: <3F68E282.8080300@w3c.jstenback.com>

Christian Parpart wrote:
> Hi,
> 
> we ran into a serious problem on de.comp.text.xml concerning newline handling 
> inside XSLT.
> 
> <xsl:text>
> </xsl:text>
> <xsl:text>&#10;</xsl:text>
> <xsl:text>&#10;&#13;</xsl:text>
> <xsl:text>&#13;</xsl:text>
> <xsl:value-of select="'&#10;'"/>
> <xsl:value-of select="'&#10;&#13;'"/>
> <xsl:value-of select="'&#13;'"/>
> 
> These are seven ways to create a newline in the XSLT result tree.
> 
> The reason I am asking here is that I want to know how the DOMParser 
> (DOMBuilder) should handle these character references inside text nodes and 
> inside attribute nodes, as well as the literal newline shown first.
> 
> The XML recommendation says that a newline shall always be represented as a 
> literal 0x0A (decimal 10) and thus be passed from the DOMBuilder to the 
> application as 0x0A. But will all of the versions above really work?

Newline normalization is always done before character references are 
expanded, and whatever is left after both steps is what you'll see in 
the DOM. So a literal CR LF or CR in the source becomes a single LF, 
while a &#13; reference still yields a CR.
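
To make the ordering concrete, here is a minimal sketch in Python. It 
assumes the standard library's xml.dom.minidom (an expat-based parser) 
as a stand-in for a DOMBuilder; text_of and attr_of are hypothetical 
helper names, not part of any DOM or LS API.

```python
from xml.dom import minidom


def text_of(xml_bytes):
    """Parse xml_bytes and return the concatenated text children of the root."""
    root = minidom.parseString(xml_bytes).documentElement
    return "".join(n.data for n in root.childNodes if n.nodeType == n.TEXT_NODE)


def attr_of(xml_bytes, name="a"):
    """Parse xml_bytes and return the value of attribute `name` on the root."""
    return minidom.parseString(xml_bytes).documentElement.getAttribute(name)


# Literal line breaks are normalized to LF before anything else happens.
print(repr(text_of(b"<r>x\r\ny</r>")))       # literal CR LF in the source
print(repr(text_of(b"<r>x\ry</r>")))         # literal CR in the source

# Character references are expanded afterwards, so they are not normalized.
print(repr(text_of(b"<r>x&#13;y</r>")))      # CR via reference
print(repr(text_of(b"<r>x&#10;&#13;y</r>"))) # LF CR via references

# Attribute values: a literal newline is additionally normalized to a space,
# while a referenced newline is kept as is.
print(repr(attr_of(b'<r a="x\ny"/>')))
print(repr(attr_of(b'<r a="x&#10;y"/>')))
```

Comparing the printed values shows which of the variants reach the DOM as 
a plain LF and which carry a CR through to the application.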

> 
> Someone tested version 2, 3, and 4 with msxml, saxon, and libxml2/libxslt and 
> got very different results.
> 
> Is newline normalization part of character normalization and thus optional, 
> or should it *ALWAYS* be performed?
> 
> Should *ANY* newline variant be interpreted as the UNIX newline variant?
> Or is it the DOMSerializer's job to normalize newlines into the 
> environment-specific newline form?
> 
> The XSLT spec doesn't mention the cases above, and neither does the XML rec. 
> So I hope it is part of DOM3 LS to specify how to build/serialize newline 
> characters ;)

Unfortunately this is beyond the scope of the LS spec. The LS spec 
simply defines how to parse a document into a DOM structure and how to 
serialize that structure back out to a sequence of bytes, in one form or 
another. Some information is lost in that process (due to XML 1.0 
processing and the like), and that will remain the case. Newlines, as in 
some of the cases above, are only part of what's lost, and this is not a 
problem the DOM WG is chartered to solve.
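
As a rough illustration of that loss, the sketch below again assumes 
Python's xml.dom.minidom in place of a real LS implementation; root_text 
and the sample documents are made up for the example. Four different 
source spellings of a line break end up as the same DOM text, so no 
serializer working from the DOM alone can tell which one was in the 
original.

```python
from xml.dom import minidom


def root_text(xml_bytes):
    """Parse xml_bytes and return the concatenated text children of the root."""
    root = minidom.parseString(xml_bytes).documentElement
    return "".join(n.data for n in root.childNodes if n.nodeType == n.TEXT_NODE)


# Four source documents that differ only in how the line break is written.
sources = {
    "literal LF": b"<r>a\nb</r>",
    "literal CR LF": b"<r>a\r\nb</r>",
    "literal CR": b"<r>a\rb</r>",
    "reference &#10;": b"<r>a&#10;b</r>",
}

texts = {label: root_text(src) for label, src in sources.items()}
for label, text in texts.items():
    print(f"{label:16} -> {text!r}")

# If only one distinct value remains, the original spelling is gone.
distinct = set(texts.values())
print("distinct DOM values:", len(distinct))
```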

> 
> Many many thanks,
> Christian Parpart.
> 

-- 
jst

Received on Wednesday, 17 September 2003 18:39:29 UTC