Comments on the XPath data model, from a DOM perspective. from Ray Whitmer on 2002-04-01 (www-xml-query-comments@w3.org from April 2002)

From: Ray Whitmer <rayw@netscape.com>
Date: Mon, 01 Apr 2002 13:26:26 -0800
To: www-xml-query-comments@w3.org
Message-ID: <3CA8D082.4070108@netscape.com>
This was just a quick scan, again.  I have not looked at XPath 2.0 itself,
but only the data model.

* It seems clear that the XPath 2.0 specification has no type comparable to
the node set or the other built-in types of XPath 1.0.  The concept of a
typeless sequence does not seem to work as effectively.  In many languages,
arrays of objects are typed.  Although some people use untyped languages,
those who rely on a certain level of typing are likely to complain when they
lose it, as they do in this case.  It is distressing to have to worry that
your array of matching nodes might have strings interspersed in it, and
applications that relied on XPath 1.0 returning sets containing only nodes
will not be able to work compatibly with a model that can no longer give
that guarantee.

* XPath 1.0 was based on explicitly unordered sets of nodes that could be
accessed in order.  XPath 2.0 claims that every sequence is ordered, but
does not sufficiently discuss what that means, which has caused significant
confusion.  One might conclude that it refers to document order, which is
the only order it seems to define and was the order of XPath 1.0, but this
makes no sense for the non-node items that can now appear in result
sequences.  The incompatible treatment of duplicates is also confusing: if
the sets are now ordered rather than unordered, it seems pointless not to
eliminate the duplicates, though something has probably been lost between
the different versions of the specification.

Based upon recent discussions, it seems that the XPath 2.0 specification
may not be comparable or compatible with the XPath 1.0 specification in its
use of these terms, but in that case the specification needs a better
treatment of the concepts and an explanation of the impact on backwards
compatibility.  Duplicate elimination also seems like a significant
compatibility problem, since 1.0 implementations went to great lengths to
accomplish it.
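For reference, the XPath 1.0 behaviour that implementations worked to provide can be sketched as: union the results, drop duplicates by node identity, and sort in document order. This is a minimal Python sketch using `xml.dom.minidom`; it only considers nodes reachable through `childNodes` (no attribute or namespace nodes), and the helper names are hypothetical. The sort key is defined only for nodes, which is why document order says nothing about where a string item would go.

```python
from xml.dom import minidom


def preorder_positions(doc):
    """Map id(node) -> document-order position (childNodes only, no attributes)."""
    positions = {}
    stack = [doc]
    while stack:
        node = stack.pop()
        positions[id(node)] = len(positions)
        # Push children reversed so the first child is visited next (preorder).
        stack.extend(reversed(node.childNodes))
    return positions


def union_in_document_order(doc, *sequences):
    """XPath 1.0-style union: drop duplicate nodes by identity, then sort."""
    positions = preorder_positions(doc)
    unique = {}
    for seq in sequences:
        for node in seq:
            unique[id(node)] = node  # the same node seen twice keeps one entry
    return sorted(unique.values(), key=lambda n: positions[id(n)])


doc = minidom.parseString("<r><x/><y/><z/></r>")
x, y, z = (doc.getElementsByTagName(t)[0] for t in "xyz")
result = union_in_document_order(doc, [z, x], [y, x])
print([n.tagName for n in result])  # ['x', 'y', 'z']
```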

* The copy semantics of node constructors seem wrong, even if they were the
only way to model the Lisp semantics that the authors of XPath 2.0 seem
to be using throughout the specification.  It would seem that a constructed
node should not lose its identity when inserted into a hierarchy, but
XPath 2.0 seems to mandate that.
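The difference can be shown with DOM itself, sketched here in Python with `xml.dom.minidom`. Under DOM, a node created by a factory method keeps its identity when it is appended to a parent. Copy-on-insert is modelled below with `cloneNode`, which is only an analogy for what the draft appears to require, not its actual mechanism.

```python
from xml.dom import minidom

doc = minidom.Document()
parent = doc.createElement("parent")
doc.appendChild(parent)

# DOM: a constructed node keeps its identity when inserted into a hierarchy.
child = doc.createElement("child")
parent.appendChild(child)
assert parent.childNodes[0] is child
assert child.parentNode is parent

# Copy-on-insert, modelled with cloneNode as an analogy: the tree receives a
# new node, and the node the caller constructed stays outside the tree.
other = doc.createElement("other")
parent.appendChild(other.cloneNode(True))
assert parent.childNodes[-1] is not other
assert parent.childNodes[-1].tagName == "other"
assert other.parentNode is None
```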

* Section 4.1, collapse-text-node: what is the parent of the text node
resulting from the collapse operation?  What if the nodes passed to the
operation have different parent elements, or belong to different
documents?  The example given using sequence-map claims to construct a new
sequence of child nodes.  Children of what?  When it "collapses nodes",
does this mutate the original node?  If not, then a complete parallel
hierarchy is required to accommodate this new node, because neither it nor
its ancestors can become a child of any existing node.  In any case, the
wording of the specification is internally inconsistent in describing this
function.
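DOM answers the equivalent question by mutation: `Node.normalize()` merges adjacent text nodes in place, so the merged node's parent is simply the element it was already in. A Python sketch with `xml.dom.minidom`; which node survives the merge (the first one, in minidom) should be treated as an implementation detail.

```python
from xml.dom import minidom

doc = minidom.parseString("<p/>")
p = doc.documentElement
p.appendChild(doc.createTextNode("Hello, "))
p.appendChild(doc.createTextNode("world"))
first = p.childNodes[0]

# DOM collapses adjacent text nodes by mutating the tree in place.
p.normalize()

# One text node remains, still a child of the same element, so its parent is
# never in question. In minidom the first node survives and absorbs the second.
assert len(p.childNodes) == 1
assert p.childNodes[0] is first
assert first.data == "Hello, world"
assert first.parentNode is p
```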

* "Descendant nodes" is used but not defined.  Because XPath's confused use
of parent relationships contradicts the Infoset and other models such as
DOM, this matters: where the term is used, it can be unclear whether it
includes attributes, namespaces, etc.
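In DOM the answer is unambiguous: attributes are not children, so a descendant walk over `childNodes` never reaches them. XPath 1.0 makes an element the parent of its attribute and namespace nodes while still excluding them from its children, and hence from the descendant axis. The Python sketch below (using `xml.dom.minidom`, with a hypothetical `descendants` helper) shows the DOM reading that a definition of "descendant nodes" would need to confirm or reject.

```python
from xml.dom import minidom


def descendants(node):
    """Yield descendants the DOM way: children, recursively, via childNodes."""
    for child in node.childNodes:
        yield child
        yield from descendants(child)


doc = minidom.parseString('<a id="1"><b>t</b></a>')
a = doc.documentElement
found = list(descendants(a))
print([n.nodeName for n in found])  # ['b', '#text']

# The attribute belongs to the element but is never reached as a descendant.
attr = a.getAttributeNode("id")
assert all(n is not attr for n in found)
```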

* It seems strange that there should be a document order between documents.
This makes the ordering of namespace nodes all the more bizarre, because
they belong to no document and presumably may be shared between documents,
so placing them at the start of a document and (I can't say I follow the
logic of this one) placing them after every other node in the document both
seem impossible and broken.  In every other case, there is some
relationship between the objects being ordered (excluding, again,
namespaces, which now seem to be global across documents).  Requiring
document order between documents to be stable requires much better
document identification than we have today: if a document is persisted and
brought back into memory, which can happen at any time during processing,
you need something stable to go back to in order to reestablish the same
sort.
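To make the persistence problem concrete, here is a Python sketch of a cross-document order key built from a document identifier plus a position within the document. The `document_uri` callable and the URIs are hypothetical. The identifier must survive the document being written out and reloaded, which in-memory object identity does not.

```python
from xml.dom import minidom


def position_path(node):
    """Child-index path from the document root; tuples compare in document order."""
    path = []
    while node.parentNode is not None:
        path.append(node.parentNode.childNodes.index(node))
        node = node.parentNode
    return tuple(reversed(path))


def cross_document_order(nodes, document_uri):
    """Sort nodes from several documents by (stable document URI, position).

    document_uri is an assumed mapping from a document to a persistent
    identifier. id(doc) alone would not work: a document that is persisted
    and reloaded comes back as a different object.
    """
    return sorted(nodes, key=lambda n: (document_uri(n.ownerDocument), position_path(n)))


d1 = minidom.parseString("<one><leaf/></one>")
d2 = minidom.parseString("<two/>")
# id() is used here only for an in-memory lookup; the URI is what must persist.
uris = {id(d1): "urn:example:doc-b", id(d2): "urn:example:doc-a"}
nodes = [d1.getElementsByTagName("leaf")[0], d1.documentElement, d2.documentElement]
ordered = cross_document_order(nodes, lambda doc: uris[id(doc)])
print([n.tagName for n in ordered])  # ['two', 'one', 'leaf']
```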

* The model claims: "The data model does not support XML documents that are
not supported by the XML Information Set, for example, non-well-formed
documents and documents that don't conform to XML Namespaces."  But the
constructors seem perfectly able to construct objects that are not
well-formed, for example by putting "--" into the text of a comment node,
or by putting other illegal characters almost anywhere.
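The constraints being bypassed come from XML 1.0 itself: its `Comment` production forbids `--` inside a comment and forbids the content ending in `-`, and its `Char` production excludes most control characters and unpaired surrogates. A hypothetical Python validation helper, of the kind a constructor would need (or for which the data model would need to define an error), might look like this:

```python
import re

# Characters allowed by the XML 1.0 Char production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_NOT_XML_CHAR = re.compile(r"[^\t\n\r\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]")


def comment_problems(text):
    """Hypothetical check: reasons text cannot be the content of an XML 1.0 comment."""
    problems = []
    if "--" in text:
        problems.append("contains '--'")
    if text.endswith("-"):
        # '-' must be followed by a non-'-' character inside the comment body.
        problems.append("ends with '-'")
    if _NOT_XML_CHAR.search(text):
        problems.append("contains a character outside the XML Char production")
    return problems


print(comment_problems("a perfectly ordinary comment"))  # []
print(comment_problems("x--y"))                          # ["contains '--'"]
```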

* The model appears to make it possible to construct text nodes that have
empty strings, elements with multiple adjacent text nodes, and other
non-normalized result trees.  There needs to be a section on what happens
in those cases, since XPath is inventing its own programming model here
that is different from the Infoset and all other models such as DOM.
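As a sketch of the cases such a section would have to cover, the Python helper below (hypothetical, using `xml.dom.minidom`) reports the two non-normal forms mentioned here among an element's children: empty text nodes and adjacent text nodes. minidom's `normalize()` removes both.

```python
from xml.dom import Node, minidom


def normalization_problems(element):
    """Report empty or adjacent Text children of element (direct children only)."""
    problems = []
    previous_was_text = False
    for i, child in enumerate(element.childNodes):
        is_text = child.nodeType == Node.TEXT_NODE
        if is_text and child.data == "":
            problems.append(f"child {i}: empty text node")
        if is_text and previous_was_text:
            problems.append(f"child {i}: adjacent to previous text node")
        previous_was_text = is_text
    return problems


doc = minidom.parseString("<e/>")
e = doc.documentElement
for s in ("a", "", "b"):
    e.appendChild(doc.createTextNode(s))

print(normalization_problems(e))
e.normalize()  # drops the empty node and merges "a" and "b" into one node
print(normalization_problems(e))
```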

* The model appears to make it possible to construct hierarchies that are
not namespace-well-formed, but makes no mention of how processing will
occur in those cases.  At the very least, an attribute fragment is not
namespace-well-formed if it uses namespaces.  The whole concept of how to
construct elements properly with namespace nodes also seems quite muddy.
Specifying a list of namespaces that is consistent with all of an element's
ancestors would seem to require complete knowledge of those ancestors,
because it would seem to be an error ever to pass a child to the
constructor of a parent when the child does not already contain all the
namespace nodes of the parent: XML has no ability to undefine namespaces,
so this would represent an impossible infoset.  Yet the spec is silent on
this issue.  It would seem that the ancestor should be created first,
rather than the child as the current API dictates, or else convenience
methods are required to construct the hierarchy correctly, because this
problem will arise whenever an element is constructed as the child of
another element.
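The consistency condition described above can be stated as a small check. In this Python sketch, an element's in-scope namespaces are represented as a hypothetical dict from prefix to URI, with `None` for the default namespace. A child built before its parent must already carry every prefixed binding of that parent; it may rebind a prefix but may not omit one. The default namespace is left out of the check, since Namespaces in XML 1.0 does allow it to be undeclared with `xmlns=""`; prefixed bindings cannot be undeclared.

```python
def missing_parent_bindings(parent_scope, child_scope):
    """Return prefixed bindings in scope on the parent that the child lacks.

    Scopes are hypothetical dicts mapping prefix -> namespace URI, with None
    as the key for the default namespace. Namespaces in XML 1.0 cannot
    undeclare a prefix, so a child missing one of these bindings has no
    consistent infoset once placed under the parent. Rebinding a prefix to
    a different URI is allowed and is not reported.
    """
    return {
        prefix: uri
        for prefix, uri in parent_scope.items()
        if prefix is not None and prefix not in child_scope
    }


parent = {None: "urn:example:default", "p": "urn:example:p"}
child = {"q": "urn:example:q"}  # built first, before its parent was known
print(missing_parent_bindings(parent, child))  # {'p': 'urn:example:p'}
```

A constructor taking pre-built children would have to reject, repair, or define the meaning of any non-empty result here, which is the silence the comment above points to.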

* In general, it is not clear what is constructed if the constructors are
called in such a way as to produce non-well-formed results or results that
cannot be expressed as XML.

* The namespace node issues of ownership, order, identity, and backwards
compatibility have not been resolved, nor has a complete solution been
proposed.

* The list was longer, but a number of my items duplicate your existing
issues.  It is hard enough to get a good feeling for what the model looks
like without getting many of these resolved.  I would suggest that you
study the DOM specification thoroughly; you will find many more border
cases that you have missed.  Construction of a hierarchy using an API is
the same problem that DOM solves.

Certainly more to come when we can get some of the basics satisfactorily
resolved.

Ray Whitmer
rayw@netscape.com
Received on Monday, 1 April 2002 16:26:12 UTC