Re: DOM Level 3 XPath Specification text node questions from Ray Whitmer on 2001-09-20 (www-dom@w3.org from July to September 2001)

From: Ray Whitmer <rayw@netscape.com>
Date: Thu, 20 Sep 2001 11:33:38 -0700
To: Chris Mennie <cam@decisionsoft.com>
CC: www-dom@w3.org
Message-ID: <3BAA3682.7040306@netscape.com>

Chris Mennie wrote:

>        I'm a little unclear about section 1.2.1 of the DOM Level 3 XPath
>spec. The line "Instead of returning multiple nodes where XPath sees a
>single logical text node, only the first non-empty DOM Text node of any
>logical XPath text will be returned in the node set" seems to indicate
>that the first DOM Text node (as opposed to DOM CDATA or DOM Entity
>Reference nodes) is returned. Starting from this Text node, it's up to the
>application to then determine what the entire set of nodes corresponding
>to that string actually is. For example, say we have:
>
><first>
>  <![CDATA[ one ]]>
>  two
>  <![CDATA[ three ]]>
>  four 
>
>  <second/>
>
>  five  
></first>
>
>        Then would the expression "/first/text()[1]" return the Text node
>containing "two"? This means that if I wanted to get the string "one two
>three four" I'd have to look at both previous and subsequent sibling nodes
>to determine the node set I wanted. It'd make more sense for the cdata
>node containing "one" to be returned.
>
By one interpretation, CDATASection is a subclass of Text.  Therefore, a 
CDATASection is a Text,
so the first text to be returned would be " one  ".  This was certainly 
the intent of the specification in this
case, although it is not clear to me that the specification is always 
careful that when it says Text, it can
mean Text or CDATASection.  It might be clearer to refer to these by 
node type instead of by
interface, but I am not sure what impact that approach would have on the 
clarity of the specification if
it were applied throughout the specification.

Ray Whitmer
rayw@netscape.com

Received on Thursday, 20 September 2001 14:27:54 UTC