Re: Question for implementors: XPath model and CDATA sections from Joseph Reagle on 2002-02-12 (w3c-ietf-xmldsig@w3.org from January to March 2002)

From: Joseph Reagle <reagle@w3.org>
Date: Tue, 12 Feb 2002 16:10:02 -0500
To: Philippe Le Hegaret <plh@w3.org>, Christian Geuer-Pollmann <geuer-pollmann@nue.et-inf.uni-siegen.de>
Cc: w3c-ietf-xmldsig@w3.org, Martin Duerst <duerst@w3.org>
Message-Id: <200202122110.QAB19812@tux.w3.org>

On Tuesday 12 February 2002 11:52, Philippe Le Hegaret wrote:
> and Xalan is right on that. They are following the DOM Level 3 Xpath
> working draft:
> [[
> The XPath model relies on the XML Information Set [XML Information set]
> ands represents Character Information Items in a single logical text
> node where DOM may have multiple fragmented Text nodes due to cdata
> sections, entity references, etc. Instead of returning multiple nodes
> where XPath sees a single logical text node, only the first non-empty
> DOM Text or CDATASection node of any logical XPath text will be returned
> in the node set.
> ]]
> http://www.w3.org/TR/2002/WD-DOM-Level-3-XPath-20020208/xpath.html#TextNo
>des

I'd tweak this text to remove the ambiguity that XPath relies on the XML 
Information Set. They are very similar, but there are differences and this 
is the source of our problem.

> You'll need to use the wholeText property, introduced in DOM Level 3
> Core to retrieve the text. Or, in the meantime, look for the text
> yourself.

Ok, to review the example that we put on the whiteboard in my office. Given 
the instance:

<foo>
  <a>it <![CDATA[is]]> sunny</a>
  <a>clouds</a>
</foo>

For the first element a:
  o In XPath there is one text node of "it is sunny"
  o In Infoset there are 11 character information items.
  o In DOM there is 1 text node of "it", 1 CDATA of "is", and 
    1 text node of "sunny"

Now the level-3 XPath text states, "Instead of returning multiple nodes 
where XPath sees a single logical text node, only the first non-empty DOM 
Text or CDATASection node of any logical XPath text will be returned in the 
node set." And to me, this doesn't seem very XPath'ish at all. However, you 
explained that this is done *to be* compatible with XPath with respect to 
ordering. For example 
  1 = a.getFirstChild()
  1.text() -> "it"
  1 = a.getNextSibling()
  1.text() -> "cloudy"  # We don't want to get "sunny here".

Ok, so I understand that now, and that after the first assignment to "1":
  1.wholetext() ->"it is sunny"
And lacking this, folks will have to manually suck up text and CDATA 
sections until they hit a boundary ...?

So perhaps in your text you could point out that your motivation of being 
"compatible with XPath" is not with respect to its datamodel, but with 
respect to ordering?

-- 

Joseph Reagle Jr.                 http://www.w3.org/People/Reagle/
W3C Policy Analyst                mailto:reagle@w3.org
IETF/W3C XML-Signature Co-Chair   http://www.w3.org/Signature/
W3C XML Encryption Chair          http://www.w3.org/Encryption/2001/

Received on Tuesday, 12 February 2002 16:10:08 UTC