Comments on XQuery 1.0 and XPath 2.0 Data Model, dated 7 June 2001

Another round of comments on this steadily-improving document:

1. The abstract and the first sentence of the introduction state (if you
follow the link) that this is the data model of XSLT 1.0; it isn't.

2. Section 3.2 states that in the document ordering, the namespace nodes of
an element follow the element but precede its attributes. This is
inconsistent with the idea, suggested but not spelled out in 4.4, that a
namespace node can be shared by several elements. In fact, the question of
namespace node identity is not really tackled. My view is that namespace
node identity should be determined by the combination of (document identity,
namespace prefix, namespace URI), that the parent of a namespace node should
be the document node, and that namespace nodes should be ordered after every
other node in the document. (This is easier for implementations than placing
them at the start of the document, because the number of namespace nodes is
not known until parsing is complete.)
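
A rough illustration of the proposal above, as a minimal Python sketch. The
names NamespaceNode, doc_id and order_key are invented here and appear
nowhere in the draft; ordering namespace nodes among themselves by
(prefix, URI) is likewise only an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NamespaceNode:
    """Hypothetical namespace node identified by the triple below."""
    doc_id: int   # stands in for document identity (assumption)
    prefix: str
    uri: str


def order_key(position, node):
    """Sort key placing namespace nodes after every other node.

    `position` is the ordinary document-order index of a non-namespace
    node; it is ignored for namespace nodes.
    """
    if isinstance(node, NamespaceNode):
        # Assumption: namespace nodes are ordered by (prefix, uri).
        return (1, (node.prefix, node.uri))
    return (0, position)


xs1 = NamespaceNode(1, "xs", "http://www.w3.org/2001/XMLSchema")
xs2 = NamespaceNode(1, "xs", "http://www.w3.org/2001/XMLSchema")
other_doc = NamespaceNode(2, "xs", "http://www.w3.org/2001/XMLSchema")
```

Here xs1 and xs2 compare equal, so several elements can share one namespace
node, while other_doc is a distinct node because its document differs.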

2a. Section 3.2: the second paragraph contains two sentences, the second of
which starts "In other words". But the two sentences seem to be making quite
separate points, both of them valid.

2b. Section 3.2: does the concept of document order apply to nodes that are
not part of a document, i.e. nodes that belong to a tree whose root is not a
document node? How can document order be stable in such cases, when the
constructor functions allow a node to be added to a tree as a separate
operation from creating the node?

3. Section 3.3 states that the data model does not support non-well-formed
documents, but section 4.1 states "the data model is more permissive: it
permits more than one element node as a child and also permits text nodes as
children".

4. In section 4, I think the note that attempts to explain the difference
between XPath 2.0 document nodes and XPath 1.0 root nodes is spurious. There
may turn out to be differences in usage, but at the level of the data model,
they are identical. A more important difference to highlight, and one that
justifies the change in terminology, is that XPath 2.0 trees may have a node
other than a document node as their root. (Though I question whether this is
actually a good idea...)

5. In section 4, it is stated that an attribute contains "a sequence of
simple-typed values", whereas an element may contain either a simple-typed
value or a sequence of simple-typed values. This appears to make a
distinction that doesn't actually exist: in both cases, a singleton is a
special case of a sequence.

6. In section 4, it is stated that an expanded QName contains a namespace
URI. It may contain no namespace URI. No accessor functions for obtaining
the two parts of an expanded QName are provided.
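
To show what is being asked for, here is a minimal Python sketch of an
expanded QName whose namespace URI may be absent, with one accessor for each
part. The names ExpandedQName, qname_namespace_uri and qname_local_name are
hypothetical; the draft defines no such functions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExpandedQName:
    namespace_uri: Optional[str]  # None models "no namespace URI"
    local_name: str


def qname_namespace_uri(q: ExpandedQName) -> Optional[str]:
    """Hypothetical accessor: the namespace URI, or None if there is none."""
    return q.namespace_uri


def qname_local_name(q: ExpandedQName) -> str:
    """Hypothetical accessor: the local part of the name."""
    return q.local_name


in_namespace = ExpandedQName("http://www.w3.org/1999/XSL/Transform", "template")
no_namespace = ExpandedQName(None, "para")
```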

7. In section 4.2 Elements, the notion that the constructor makes a copy of
the supplied child nodes seems strange. It's hard to square this with the
definition of node identity. Also, I don't see why the provision is needed
here, but not for the document node constructor. Wouldn't it be cleaner to
define a precondition that all the child nodes supplied to the constructor
must be parentless?
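
The suggested precondition could look something like the following Python
sketch. Node and element_node are illustrative names only, not the draft's
constructors; the point is that the children are attached as they are,
without copying, so their identity is preserved.

```python
class Node:
    """Illustrative tree node; not the draft's definition."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []


def element_node(name, children):
    """Build an element from existing nodes without copying them.

    Precondition: every supplied child is parentless, and no child is
    supplied twice.
    """
    if len({id(child) for child in children}) != len(children):
        raise ValueError("the same child was supplied more than once")
    for child in children:
        if child.parent is not None:
            raise ValueError("child %r already has a parent" % child.name)
    elem = Node(name)
    for child in children:
        child.parent = elem
        elem.children.append(child)
    return elem


a = Node("a")
e = element_node("e", [a])
```

After this, e.children[0] is the very node a, and a second attempt to pass a
to element_node fails the precondition.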

8. In section 5.1, the notion that you can get from an ID or anyURI value to
an Element node seems to assume that the ID or anyURI primitive value
carries information about what document it came from. I'm not sure this is
realistic. Does it mean, for example, that the ID "X123" in one document is
not equal to the ID "X123" in a different document? Related to this, this
section uses the phrase "a document that is not contained in the data
model". This seems to imply some kind of closed-world (or "database")
assumption, namely that there exists some finite collection of documents
associated with the data model (and even that it's a containment
relationship, which means a document cannot be associated with two different
data models). But hang on, surely there is only one data model, the one
defined by this specification?

9. Section 5.2, on "derived simple values", contains statements which seem
to apply to all simple values, not only derived ones.

10. It would be useful if section 6 (Sequences) established terminology for
describing the members of the sequence. My preference would be "members".
There is also a need for a term that is generic over nodes and
simple-values; the document uses "unit values", which doesn't seem very
nice. I'd suggest: "An item is a node or a simple-value. The items contained
in a sequence are referred to as the members of the sequence".

11. In section 6, head would appear to be a partial function: it does not
apply to empty sequences. If we follow the same conventions as elsewhere,
that means head returns a Sequence(0,1)<item>, which perhaps raises the
question of how you extract the first member of this sequence...
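
The circularity can be seen in a small Python sketch, assuming sequences are
modelled as tuples and head follows the "return a sequence of zero or one
items" convention. This is my own modelling, not the draft's definition.

```python
def head(seq):
    """Total version of head: a tuple holding the first member, if any."""
    return tuple(seq[:1])


empty = head(())                  # nothing to return, so the empty sequence
first = head((1, 2, 3))           # a one-member sequence, not a bare item
again = head(head((1, 2, 3)))     # applying head again gets us no further
```

Under this convention head never yields a bare item, only another sequence.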

12. If the string-value of a sequence is the concatenation of the
string-values of its members, then the string value of an empty sequence is
an empty string; which I like, but which violates the general rule that any
function applied to an empty sequence returns an empty sequence. The
consequences of this depend on how the query semantics make use of the
concept of a sequence having a string-value. It might be worth pointing this
out.
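
A minimal Python sketch of the point, again modelling sequences as tuples
and using str() as a stand-in for the string-value of a member (both are
assumptions made for the example):

```python
def string_value(seq):
    """Concatenate the string-values of the members of a sequence."""
    return "".join(str(item) for item in seq)


joined = string_value(("a", 1, "b"))
of_empty = string_value(())       # an empty string, not an empty sequence
```

The second result is the empty string "", whereas the general rule would
have produced the empty sequence ().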

13. In section 8, the accessor "parent" returns a sequence of zero or one
SchemaComponents. But a SchemaComponent is not a node or a simple-value, so
it cannot appear in a sequence.

14. Section 9 states "we assume that equality over simple-values is
defined". This seems an optimistic assumption.

Keep up the good work!

Mike Kay

Received on Friday, 15 June 2001 11:01:31 UTC