Re: DOM XPath: Who's responsible for merging text nodes? User or implementation? from Ray Whitmer on 2005-05-10 (www-dom@w3.org from April to June 2005)

From: Ray Whitmer <ray@xmission.com>
Date: Mon, 9 May 2005 21:28:22 -0600 (MDT)
To: Frans Englich <frans.englich@telia.com>
cc: www-dom@w3.org
Message-ID: <Pine.LNX.4.63.0505092106540.8964@xmission.xmission.com>

On Tue, 10 May 2005, Frans Englich wrote:

>
>
> Hello,
>
> Is the XPath implementation or the user of the implementation responsible for
> ensuring that no logically-adjacent text nodes exist? The specification[1]
> says this in section 1.2.4, Text Nodes:
>
> "Applications using XPath in an environment with fragmented text nodes must
> manually gather the text of a single logical text node possibly from multiple
> nodes beginning with the first Text node or CDATASection node returned by the
> implementation."
>
> I interpret that as meaning the implementation can "expect" only separate
> text nodes to exist, but that still leaves the somewhat unclear situation
> where the user code is simply buggy; that is, the user forgets to manually
> merge text nodes (think web browsers).
>
>
> Regards,
>
> 		Frans
>
>
> Frans Englich
> KDE Developer

It has been a while, but let me give it a shot.

The intent was that an XPath implementation does not mutate an existing
DOM hierarchy.  If a DOM hierarchy has been mutated such that there is
fragmentation, i.e. multiple DOM text nodes representing a single XPath
text, then DOM will still try to return one item for each item that the
XPath specification calls for.  If one XPath text is represented by
several DOM nodes, the implementation returns only the first one and
leaves it up to the caller to notice that the full XPath text has to be
gathered from the adjacent nodes as well.

Otherwise, there would not be a 1:1 relationship between DOM XPath returns and
XPath-defined returns, and the caller would be confused one way or the other.

There are two things that can cause logically-adjacent text nodes to occur:

1.  Fragmentation during mutation: for example, removing an element that had
a text node on either side leaves the two text nodes in the DOM until they
are joined/normalized.

2.  If a document was loaded with entity references preserved, then part of
the text might be inside and part outside of the reference, yet from the
XPath perspective, a single text exists due to complete entity expansion.

So, the user of the XPath API can take several approaches, but it is not 
valid to expect DOM to mutate the text nodes back together just to satisfy 
the XPath call:

1.  Retrieve the document with entity references completely expanded and
eliminated, which guarantees no logically-adjacent text nodes.

2.  Do not mutate, or if you do mutate, normalize the document before making
XPath calls.

In this case the document will not have a fragmented representation of what
XPath considers a whole text.
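
The normalize-before-querying approach can be sketched roughly as follows.
This uses Python's xml.dom.minidom purely as a stand-in DOM; it has no XPath
support, so the XPath call itself is not shown, and the sample markup and
variable names are made up for illustration.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# Hypothetical sample document: one text node on each side of <b>.
doc = parseString("<p>Hello <b>big</b>world</p>")
p = doc.documentElement

# Mutation: removing <b> leaves the two text nodes as separate siblings.
p.removeChild(p.getElementsByTagName("b")[0])
fragments_before = [n.data for n in p.childNodes
                    if n.nodeType == Node.TEXT_NODE]

# normalize() merges adjacent text nodes, so a later query would see
# a single DOM text node for the single logical text.
p.normalize()
fragments_after = [n.data for n in p.childNodes
                   if n.nodeType == Node.TEXT_NODE]

print(fragments_before)
print(fragments_after)
```

Note that normalizing is itself a mutation performed by the caller; the
point is only that it happens before the query rather than inside it.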

Or:

Whenever retrieving a text node, gather up logically-adjacent text nodes
and consider that instead of the single returned value.  You can use the
attribute wholeText (new in Level 3) to do this automatically.
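
Where wholeText is not available, the gathering can be done by hand.  The
following is only a sketch under some assumptions: it again uses Python's
xml.dom.minidom as a stand-in DOM, the helper name gather_whole_text is
hypothetical, the "first node returned" is simulated with firstChild because
minidom has no XPath support, and it only walks following siblings, so it
does not handle text split across preserved entity references.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

TEXT_TYPES = (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)

def gather_whole_text(first):
    """Concatenate `first` and the text/CDATA siblings that directly follow it.

    Hypothetical helper; assumes `first` is the first node of the run.
    """
    parts = []
    node = first
    while node is not None and node.nodeType in TEXT_TYPES:
        parts.append(node.data)
        node = node.nextSibling
    return "".join(parts)

# Hypothetical sample: fragment the text by removing the <b> element.
doc = parseString("<p>Hello <b>big</b>world</p>")
p = doc.documentElement
p.removeChild(p.getElementsByTagName("b")[0])

# Stand-in for the single node an XPath query would return.
first = p.firstChild
whole = gather_whole_text(first)

print(repr(first.data))
print(repr(whole))
```

The document is left untouched here; only the caller's view of the text is
assembled from the fragments.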

In a nutshell, the XPath interface will only return the first node of a set
of logically-adjacent nodes, and it is up to you to account for the nodes
that follow it.

Ray

Received on Tuesday, 10 May 2005 03:28:27 UTC