RE: XPTr bare names and XPATH id() from John Boyer on 2000-06-08 (w3c-ietf-xmldsig@w3.org from April to June 2000)

From: John Boyer <jboyer@PureEdge.com>
Date: Thu, 8 Jun 2000 10:13:52 -0700
To: "Henry S. Thompson" <ht@cogsci.ed.ac.uk>
Cc: "Joseph M. Reagle Jr." <reagle@w3.org>, "Dan Connolly" <connolly@w3.org>, "Daniel Veillard" <veillard@w3.org>, <w3t-tech@w3.org>, <w3c-ietf-xmldsig@w3.org>
Message-ID: <BFEDKCINEPLBDLODCODKGEDACDAA.jboyer@PureEdge.com>
Hi Henry,

Actually, there isn't much doom and gloom going on, and we seem to be doing
better than being close to a resolution.  We already have a number of
resolutions.  All of this began very simply when I pointed out a necessary
correction to the dsig spec because there was a difference between what the
xpath expression id("E") returns versus what our spec claimed would be the
serialized result.

Most of what you wrote below explains the basic notions of XPath, which I
have been actively working with for almost a year now.  There is no
confusion on my part regarding these basic issues, and I have indeed written
much of the same material in recent emails.

However, I would disagree on one minor point: the definition of string-value
does not provide a definition of subtree that includes namespace and
attribute nodes.  Please look again at the definition of string value,
especially the string-value description for element nodes [1], which states
that the string value of an element is the concatenation of the string
values of its *text node descendants*.  In other words, no visitation of
namespace and attribute nodes is performed by the string-value DFS
algorithm.  This entire thread would have been very small if namespace and
attribute nodes were included in descendancy because we could've just
serialized the node-set result of

id("E")/descendant-or-self::node()

Despite this minor point, I am glad that you (and Daniel Veillard [2]) agree
that applications are free to use node-sets in any way that is convenient,
since I believe it makes the most sense for the purpose of serializing or
canonicalizing XML to treat a node-set as a set of nodes, where inclusion in
the set means inclusion in the textual output, and exclusion from the set
means exclusion from the textual output.  The major source of our prior
disagreement seems to have been the assertion that a node-set should always
indicate subtree roots, which was not justified by subsequent comments that
Xpointers are pointers taht indicate what is accessible (since everything is
accessible from any point in the tree).

Furthermore, as I pointed out to the dsig group yesterday, and with which
you indicate agreement below, we are free to define the XPointer URI="#E" as
indicating E plus all nodes that have E as an ancestor.  In other words,
subtree(id("E")).

This segues nicely to my final question.  As I pointed out in prior emails,
the actual XPath expression that corresponds to the above definition of
URI="#E" is somewhat clumsy, but it is possible to say what we mean to say
in an XPath expression.  This is why I claim that we do not, in fact, have a
doom-and-gloom problem. However, you made the statement that defining a
subtree() function would "be counterproductive, as that function flattens
the tree and loses information in doing so."  Notwithstanding Dan Veillard's
comments to the contrary [2], I do not think this is the case.  When we
create a node-set containing the entire subtree, the document order of the
nodes retains the information necessary for us to create a canonical form or
a serialization that can be passed to a digest algorithm (where document
order for namespace and attribute nodes is defined by sorting [3, 4]).
Could you please explain if this does not adequately address your concern
about a flattened tree?

In conclusion, then, I really do not think we disagree on much if anything
now that we have both familiarized ourselves with the other's position.

[1] http://www.w3.org/TR/xpath#element-nodes
[2]
http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2000AprJun/0231.html
[3] http://www.w3.org/TR/xml-c14n
[4] http://www.w3.org/TR/xmldsig-core/#sec-XPath

John Boyer
Software Development Manager
PureEdge Solutions Inc. (formerly UWI.Com)
Creating Binding E-Commerce
jboyer@PureEdge.com


-----Original Message-----
From: Henry S. Thompson [mailto:ht@cogsci.ed.ac.uk]
Sent: Thursday, June 08, 2000 2:57 AM
To: John Boyer
Cc: Joseph M. Reagle Jr.; Dan Connolly; Daniel Veillard;
w3t-tech@w3.org; w3c-ietf-xmldsig@w3.org
Subject: Re: XPTr bare names and XPATH id()


"John Boyer" <jboyer@PureEdge.com> writes:

Let's back up and try again -- I feel like I've arrived late to a
party where everyone is already deep in a doom-and-gloom-laden
discussion about the awful consequences of something which hasn't
actually happened.  I think we're not actually very far from a
resolution, with your messages on another thread pointing the way, if
we reconceptualise a bit.

1) Nodesets.  Nodesets are introduced into the XPath data model to
represent the semantics of XPath expressions.  The intrinsic semantics
of a node set is a _disjunction_.   The interpretation of a nodeset as
the value of an XPath expression is:  "Each of the members of this set
satisfies the expression".

This is _really_ important: the value of the XPath expression '//foo'
is the set of all element nodes in a document whose name is 'foo' --
each of them, independently, is an answer to the instruction "Find me
something be starting from the root and looking for a 'foo' element".
Similarly, the value of the XPath expression 'id(baz)' is the set of
all element nodes in a document whose declared-as-type-id attribute
has the value 'foo'.  Now XML validity constraints (which must be in
play or we wouldn't know which attributes are of type id) require that
at most one such element exists, so the nodeset is either empty or
singleton.

It would be seriously incompatible with this fundamental aspect of
XPath to include in result nodesets nodes which did _not_ satisfy the
expression.

2) Nodes as results.  XPath expressions have values, which are
nodesets.  XPointer uses XPath expressions to pick out one or more
points in XML documents.  This is made clear as early as the
introduction [1], which uses words such as 'locate', 'address',
'choice', 'reference', 'target', etc.  It is entirely up to
applications what semantics they attach to such locations: "The
structures located with XPointer can be used as link targets or for
any other application-specific purpose".

In particular, it is clearly open to applications to specify that the
location picked out by an XPointer, whether short-frag, explicit-id or
other XPointer, is used by DSig as the root of a document subtree.
XML Schema, XInclude and XPath itself take this approach, so there's
plenty of precedent.

3) What does an XPointer point to?  The XPath REC introduces a data
model to answer this question [2].  It is clear from this exposition,
in particular from the definition of 'string-value', that the document
subtree rooted in a particular node is well-defined.

Net: XPointer is just fine for your purposes, all that is required is
to state that the DSig interpretation of an XPointer is as picking out
the entire document subtree rooted at the node identified by that
pointer.  You do _not_ need to do this by defining a further XPath
function such as your suggested 'subtree' -- indeed to do so what be
counterproductive, as that function flattens the tree and loses
information in doing so.

Hope this helps -- if not I suspect a 'phone call is the next step.

ht

[1] http://www.w3.org/TR/xptr#N598
[2] http://www.w3.org/TR/xpath#data-model
--
  Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
          W3C Fellow 1999--2001, part-time member of W3C Team
     2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
	    Fax: (44) 131 650-4587, e-mail: ht@cogsci.ed.ac.uk
		     URL: http://www.ltg.ed.ac.uk/~ht/
Received on Thursday, 8 June 2000 13:14:11 UTC