- From: Ray Whitmer <ray@xmission.com>
- Date: Mon, 9 May 2005 21:28:22 -0600 (MDT)
- To: Frans Englich <frans.englich@telia.com>
- cc: www-dom@w3.org
On Tue, 10 May 2005, Frans Englich wrote:

> Hello,
>
> Is the XPath implementation or the user of the implementation responsible for
> ensuring that no logically-adjacent text nodes exists? The specification[1]
> says this in section 1.2.4, Text Nodes:
>
> "Applications using XPath in an environment with fragmented text nodes must
> manually gather the text of a single logical text node possibly from multiple
> nodes beginning with the first Text node or CDATASection node returned by the
> implementation."
>
> I interpret that as the implementation can "expect" only separate text nodes
> to be exist, but that still leaves the a bit unsure sitation of when the user
> code simply is buggy; that the user forgets to manually merge text
> nodes(think web browsers).
>
> Regards,
>
> Frans
>
> Frans Englich
> KDE Developer

It has been a while, but let me give it a shot.

The intent was that an XPath implementation does not mutate an existing DOM
hierarchy. If a DOM hierarchy has been mutated such that there is
fragmentation, i.e. multiple DOM text nodes representing a single XPath text,
then DOM will try to return 1 item for each item called for as a return by the
XPath specification. If the one XPath text is represented as several DOM
nodes, then the implementation will only return the first one and leave it up
to the caller to notice that he has to gather up the full XPath text of the
return from some adjacent nodes as well. Otherwise, there would not be a 1:1
relationship between DOM XPath returns and XPath-defined returns, and the
caller would be confused one way or the other.

There are two things that can cause logically-adjacent text nodes to occur:

1. Fragmentation during mutation, for example, remove an element where there
   was a text node on either side, and the two text nodes remain in DOM until
   joined/normalized.

2. If a document was loaded with entity references preserved, then part of the
   text might be inside and part outside of the reference, yet from the XPath
   perspective, a single text exists due to complete entity expansion.

So, the user of the XPath API can take several approaches, but it is not valid
to expect DOM to mutate the text nodes back together just to satisfy the XPath
call:

1. Retrieve the document with entity references completely expanded and
   eliminated, guaranteeing no logically-adjacent text nodes.

2. Do not mutate, or if you do mutate, normalize the document before making
   XPath calls.

In this case the document will not have a fragmented representation of what
XPath considers a whole text.

Or: Whenever retrieving a text node, gather up logically-adjacent text nodes
and consider that instead of the single returned value. You can use the
attribute wholeText (new in Level 3) to do this automatically.

In a nutshell, the XPath interface will only return the first node of a set of
logically-adjacent nodes, and you get to assume the following nodes.

Ray
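To make the two caller-side approaches above concrete, here is a minimal sketch in Python using the standard library's xml.dom.minidom. It is not a DOM XPath implementation. It only reproduces the fragmentation-by-mutation case and shows the manual gathering and the normalize-first alternative. The helper name gather_logical_text and the sample document are assumptions made for illustration, and the sketch does not cover the preserved-entity-reference case.

```python
from xml.dom import Node, minidom

# Node types that can form one logical XPath text node.
TEXT_TYPES = (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)


def gather_logical_text(first):
    """Join `first` with the Text/CDATASection siblings directly after it.

    `first` is assumed to be the first node of the logically-adjacent run,
    which is what the message says the XPath interface returns.
    (Hypothetical helper, not part of any DOM API.)
    """
    parts = []
    node = first
    while node is not None and node.nodeType in TEXT_TYPES:
        parts.append(node.data)
        node = node.nextSibling
    return "".join(parts)


# Cause 1: remove an element that had a text node on either side.
doc = minidom.parseString("<p>Hello <b>bold</b> world</p>")
p = doc.documentElement
p.removeChild(p.getElementsByTagName("b")[0])

fragment_count = len(p.childNodes)   # two adjacent Text nodes remain
first = p.firstChild                 # only holds "Hello "

# Approach "Or": gather the adjacent nodes starting from the first one.
gathered = gather_logical_text(first)

# Approach 2: normalize before querying, so no fragmentation is left.
p.normalize()
normalized_count = len(p.childNodes)
normalized_text = p.firstChild.data
```

Gathering leaves the tree untouched, which matches the intent that a query should not mutate the DOM. Normalizing changes the tree once, up front, so later queries see a single text node.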
Received on Tuesday, 10 May 2005 03:28:27 UTC