- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Thu, 22 Apr 2010 20:14:14 -0600
- To: www-xpath-comments@w3.org
- Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
In separate mail [1] I have noted some opportunities for improving the
definition of the data model in the XPath 1.0 spec. This note
presents a set of concrete wording proposals to seize some of those
opportunities. They are gathered together in groups according to
purpose.
[1] http://lists.w3.org/Archives/Public/www-xpath-comments/2010AprJun/0000.html
(1) One change is a simple typo correction.
In 5.7 Text Nodes, in para 2, for
Thus, <![CDATA[<]]> in the source document will treated the same
as <.
insert "be" before "treated", to read
Thus, <![CDATA[<]]> in the source document will be treated the
same as <.
(2) Several changes are intended to create a clean separation between
discussion of XML-to-datamodel mapping issues on the one hand, and on
the other the definition of the data model in the abstract. They all
involve changes to passages in the current text of the form "There is
an X node for every X in the XML document."
The problem is that sentences like
There is an element node for every element in the document.
have no useful specification-defined meaning. It may or may not have
been a good decision, but the decision not to attempt an explicit
definition of the nature of things like "document", "element",
etc. was certainly a conscious decision on the part of at least some
members of the responsible working group. The XML spec does not
prescribe an answer to the question of how many 'b' elements occur in a
document which, after entity expansion, reads
<a><b/><c><b/><b/></c></a>
For this case, the answers 1, 2, and 3 are all compatible with the XML
specification and with coherent positions on the nature of
element-hood. The rest of the XPath 1.0 spec relies on there being
three element nodes for 'b' elements in the data model instance
corresponding to this document, so the definition of the data model
needs to guarantee that result. At present, it does not.
All of the change proposals below are thus careful to define the XML
serial-form analogues of the XDM nodes as occurrences of strings in
the XML character sequence, and not to appeal to the XML spec for
non-existent rules concerning identity criteria for elements,
processing instructions, and comments.
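To make the intended count concrete, here is a minimal sketch in Python. It is an illustration only, not part of the proposed wording: the standard-library xml.etree.ElementTree parser stands in for a data model builder (an assumption of this example), and the helper name count_element_occurrences is hypothetical. It counts one node per occurrence of a string matching the "element" production, which is the reading the rest of the XPath 1.0 spec relies on.

```python
import xml.etree.ElementTree as ET

# The example document from the text, after entity expansion.
DOC = "<a><b/><c><b/><b/></c></a>"

def count_element_occurrences(xml_text, name):
    """Count element nodes named `name`, one per occurrence in the text."""
    root = ET.fromstring(xml_text)
    # Element.iter() visits the element itself and all of its descendants,
    # yielding each occurrence as a separate object.
    return sum(1 for _ in root.iter(name))

b_count = count_element_occurrences(DOC, "b")
print(b_count)
```

Under the occurrence-based reading, the count for 'b' is three, not one or two.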
(2a) In 5.2 Element Nodes, in the first paragraph, replace
There is an element node for every element in the document.
with
When a tree is constructed from an XML document, the tree contains
one element node for every occurrence in the document of any
string of characters matching the "element" production of [XML].
(2b) In 5.5 Processing Instruction Nodes, replace the first paragraph
There is a processing instruction node for every processing
instruction, except for any processing instruction that occurs
within the document type declaration.
with
When a tree is constructed from an XML document, the tree contains
one processing instruction node for every occurrence in the
document of any string of characters matching the "PI" (processing
instruction) production of [XML], except for those that occur
within the document type declaration.
(2c) In 5.6 Comment Nodes, for paragraph 1
There is a comment node for every comment, except for any comment
that occurs within the document type declaration.
substitute
When a tree is constructed from an XML document, the tree contains
one comment node for every occurrence in the document of any
string of characters matching the "Comment" production of [XML],
except for those occurring within the document type declaration.
(3) A few changes specify explicit rules for data model instances
which guarantee that they have legal XML serializations.
(3a) In 5.1 Root Node, for the first paragraph
The root node is the root of the tree. A root node does not occur
except as the root of the tree. The element node for the document
element is a child of the root node. The root node also has as
children processing instruction and comment nodes for processing
instructions and comments that occur in the prolog and after the
end of the document element.
substitute
The root node is the root of the tree; alone among the nodes of
the tree it has no parent. A root node does not occur except as
the root of the tree. The element node for the document element is
a child of the root node; it is the only element node among the
root node's children. The root node also has as children
processing instruction and comment nodes for processing
instructions and comments that occur in the prolog and after the
end of the document element.
That is: insert "; alone among the nodes of the tree it has no parent"
at the end of the first sentence, and "; it is the only element node
among the root node's children" at the end of the third sentence. The
first makes explicit a necessary property of the parent relation in
XDM instances; the second guarantees that the serialization of the
root node will match production [1] of the XML specification.
(3b) In 5.3 Attribute Nodes, in paragraph 5
An attribute node has an expanded-name and a string-value. The
expanded-name is computed by expanding the QName specified in the
tag in the XML document in accordance with the XML Namespaces
Recommendation [XML Names]. The namespace URI of the attribute's
name will be null if the QName of the attribute does not have a
prefix.
insert at the end of the paragraph
Any two distinct attribute nodes of the same parent element must
have different expanded names.
(3c) In 5.4 Namespace Nodes, in paragraph 2
A namespace node has an expanded-name: the local part is the
namespace prefix (this is empty if the namespace node is for the
default namespace); the namespace URI is always null.
append
Any two distinct namespace nodes of the same parent element must
have different expanded names.
Changes 3b and 3c together guarantee that the serialization of an
element will not violate the WF constraint "Unique Att Spec".
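The uniqueness rules proposed above for attribute nodes and namespace nodes can be checked mechanically. The following Python sketch is only an illustration under stated assumptions: an expanded name is modeled as a (namespace URI, local part) pair, None stands for the null namespace URI, and the function name duplicate_expanded_names is hypothetical.

```python
from collections import Counter

def duplicate_expanded_names(names):
    """Return the expanded names carried by more than one node.

    `names` holds one (namespace_uri, local_part) pair per attribute node
    (or per namespace node) of a single parent element. A namespace URI
    of None models the null URI.
    """
    counts = Counter(names)
    dupes = [name for name, n in counts.items() if n > 1]
    # Sort with None treated as "" so the result order is deterministic.
    return sorted(dupes, key=lambda name: (name[0] or "", name[1]))

# Attribute nodes: same local part, different namespace URIs, so no clash.
attrs_ok = [(None, "id"), ("http://example.org/ns", "id")]
# Two attribute nodes with the same expanded name violate the rule.
attrs_bad = [(None, "id"), (None, "id"), (None, "class")]
# Namespace nodes: the local part is the prefix, the URI is always null.
ns_ok = [(None, ""), (None, "xml"), (None, "p")]

print(duplicate_expanded_names(attrs_ok))
print(duplicate_expanded_names(attrs_bad))
print(duplicate_expanded_names(ns_ok))
```

An empty result means the element's attribute nodes (or namespace nodes) can be serialized without repeating a name in the start-tag.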
(4) One change is intended to set the stage rhetorically for the
slightly more formal definition of the data model presented in change
proposal 5 below.
In section 5, change the first paragraph, which currently reads
XPath operates on an XML document as a tree. This section
describes how XPath models an XML document as a tree. This model
is conceptual only and does not mandate any particular
implementation. The relationship of this model to the XML
Information Set [XML Infoset] is described in [B XML Information
Set Mapping].
to read
XPath operates on an XML document as a tree. This section
describes how XPath models an XML document as a tree ^by defining
a set of constraints on the nodes of the tree and on the relations
holding between nodes^. This model is conceptual only and does not
mandate any particular implementation. The relationship of this
model to the XML Information Set [XML Infoset] is described in [B
XML Information Set Mapping].
The ^...^ mark the only change, an insertion.
(5) Finally, one larger change is intended to make the definition of
the data model be wholly independent of the XML specification, and to
ensure that the definition of the data model guarantees that all
instances of the data model have the properties relied upon elsewhere
in the XPath 1.0 spec.
In section 5, delete the two paragraphs
There is an ordering, document order, defined on all the nodes in
the document corresponding to the order in which the first
character of the XML representation of each node occurs in the XML
representation of the document after expansion of general
entities. Thus, the root node will be the first node. Element
nodes occur before their children. Thus, document order orders
element nodes in order of the occurrence of their start-tag in the
XML (after expansion of entities). The attribute nodes and
namespace nodes of an element occur before the children of the
element. The namespace nodes are defined to occur before the
attribute nodes. The relative order of namespace nodes is
implementation-dependent. The relative order of attribute nodes is
implementation-dependent. Reverse document order is the reverse of
document order.
Root nodes and element nodes have an ordered list of child
nodes. Nodes never share children: if one node is not the same
node as another node, then none of the children of the one node
will be the same node as any of the children of another
node. Every node other than the root node has exactly one parent,
which is either an element node or the root node. A root node or
an element node is the parent of each of its child nodes. The
descendants of a node are the children of the node and the
descendants of the children of the node.
and insert the following:
Each tree consists of a set of nodes and two binary relations on
those nodes, named parent and next-sibling, which satisfy the
following constraints:
- The parent relation is a function from nodes to element
nodes or root nodes: that is, each node has at most one
parent, which is either an element node or the root node.
The parent of an attribute node or a namespace node is an
element node (not a root node).
- The next-sibling relation is likewise a function, from (and
to) nodes other than attribute nodes and namespace nodes.
Each node which is neither an attribute node nor a namespace
node has at most one next sibling, which is also neither an
attribute node nor a namespace node.
If (and only if) any two nodes are related by the next-sibling
relation (that is, if it is possible to start at one node and
reach the other by traversing the next-sibling relation one or
more times), then the two nodes are *siblings*.
- The two relations are acyclic: it is never possible, by
following the parent or next-sibling relation repeatedly from
node to node, to return to a node already visited.
- Each tree contains exactly one root node.
- Each node other than the root node has a parent; the root
node has none.
Note: it follows that from any node in the tree it is
possible to reach the root by traversing the parent
relation repeatedly (zero or more times).
- If any two nodes are siblings, then those two nodes have the
same parent.
- Conversely, if any two nodes other than attribute or
namespace nodes have the same parent, then they are siblings.
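As an informal check on the constraints just listed, here is a minimal Python sketch of a tree given as nodes with parent and next-sibling links, plus a validator. Everything in it (the Node class, the kind strings, check_tree, the error messages) is hypothetical and chosen for illustration; the constraints themselves are the ones stated above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(eq=False)  # eq=False keeps identity-based equality and hashing
class Node:
    kind: str                               # "root", "element", "attribute", ...
    name: str = ""
    parent: Optional["Node"] = None         # the parent function
    next_sibling: Optional["Node"] = None   # the next-sibling function

def is_attr_or_ns(node):
    return node.kind in ("attribute", "namespace")

def has_cycle(start, step):
    """True if repeatedly applying `step` from `start` revisits a node."""
    seen, node = set(), start
    while node is not None:
        if node in seen:
            return True
        seen.add(node)
        node = step(node)
    return False

def check_tree(nodes):
    """Return a list of constraint violations (empty if there are none)."""
    problems = []
    if sum(1 for n in nodes if n.kind == "root") != 1:
        problems.append("tree must contain exactly one root node")
    for n in nodes:
        p, s = n.parent, n.next_sibling
        if n.kind == "root" and p is not None:
            problems.append("root node has a parent")
        if n.kind != "root" and p is None:
            problems.append(f"{n.name}: non-root node has no parent")
        if p is not None and p.kind not in ("root", "element"):
            problems.append(f"{n.name}: parent is not an element or root node")
        if is_attr_or_ns(n) and p is not None and p.kind != "element":
            problems.append(f"{n.name}: parent must be an element node")
        if s is not None and (is_attr_or_ns(n) or is_attr_or_ns(s)):
            problems.append(f"{n.name}: next-sibling on attribute/namespace node")
        if s is not None and s.parent is not p:
            problems.append(f"{n.name}: siblings have different parents")
        if has_cycle(n, lambda x: x.parent) or has_cycle(n, lambda x: x.next_sibling):
            problems.append(f"{n.name}: cycle in parent or next-sibling")
    # Converse rule: the children of one parent must form one sibling chain,
    # i.e. exactly one child is not the next sibling of another child.
    children = {}
    for n in nodes:
        if n.parent is not None and not is_attr_or_ns(n):
            children.setdefault(n.parent, []).append(n)
    for parent, kids in children.items():
        targets = {k.next_sibling for k in kids}
        if len([k for k in kids if k not in targets]) != 1:
            problems.append(f"{parent.name}: children are not all siblings")
    return problems

# Tree for <a x="..."><b/><c/></a>
root = Node("root", "/")
a = Node("element", "a", parent=root)
c = Node("element", "c", parent=a)
b = Node("element", "b", parent=a, next_sibling=c)
x = Node("attribute", "x", parent=a)
nodes = [root, a, b, c, x]

valid_report = check_tree(nodes)
b.next_sibling = None            # b and c now share a parent but are not siblings
broken_report = check_tree(nodes)
print(valid_report, broken_report)
```

The second call shows why the last constraint is needed: without it, two children of the same element could be left unordered with respect to each other.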
Several other terms and relations on the nodes of a tree can be
defined in terms of the parent and next-sibling relations.
The attribute nodes and namespace nodes which have an element node
as their parent are called the attribute nodes, or the namespace
nodes, of that element node. All other nodes which have a node as
their parent are called the children of that node. Formally,
these facts can be represented by relations called attributes-of,
namespace-nodes-of, and child. The union of the attributes-of,
namespace-nodes-of, and child relations is the inverse of the
parent relation.
The positive transitive closure of the parent relation is the
ancestor relation. The positive transitive closure of the child
relation is the descendant relation. That is, for any nodes A and
B, if the pair (A -> B) is a member of the ancestor relation, then B
is an ancestor of A. Similarly, if the descendant relation includes
(A -> B), then B is a descendant of A.
The inverse of the next-sibling relation is the previous-sibling
relation. The positive transitive closures of the next-sibling
and previous-sibling relations are the following-sibling and
preceding-sibling relations, respectively.
Note: it follows from the definitions given that the
next-sibling relation defines a total order on the children of
any node.
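The derived relations can be sketched in a few lines of Python. This is an illustration only: the two base relations are modeled as dictionaries keyed by hypothetical node labels, the example tree has no attribute or namespace nodes (so the child relation is simply the inverse of parent), and the helper names are assumptions of the example.

```python
# Tree for <a><b/><c><d/></c><e/></a>, with "root" as the root node.
PARENT = {"a": "root", "b": "a", "c": "a", "d": "c", "e": "a"}
NEXT_SIBLING = {"b": "c", "c": "e"}

# previous-sibling is the inverse of next-sibling.
PREVIOUS_SIBLING = {v: k for k, v in NEXT_SIBLING.items()}

def reachable(start, relation):
    """Positive transitive closure of a functional relation, from `start`.

    Returns the nodes reached by applying `relation` one or more times.
    """
    out = []
    node = relation.get(start)
    while node is not None:
        out.append(node)
        node = relation.get(node)
    return out

def ancestors(node):
    return reachable(node, PARENT)

def descendants(node):
    # B is a descendant of A exactly when A is an ancestor of B
    # (valid here because the example has no attribute or namespace nodes).
    return {n for n in PARENT if node in ancestors(n)}

def following_siblings(node):
    return reachable(node, NEXT_SIBLING)

def preceding_siblings(node):
    return reachable(node, PREVIOUS_SIBLING)

print(ancestors("d"))
print(descendants("a"))
print(following_siblings("b"))
print(preceding_siblings("e"))
```

Because parent and next-sibling are functions, each closure is a simple walk along a chain; no general graph search is needed.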
A total order, called *document order*, is defined on all nodes of
the tree, as described below. It is convenient to write A << B
for two nodes A and B, if A precedes B in document order, or
A >> B if B precedes A in document order.
1 Parents precede their namespace nodes, attributes, and
children in document order.
2 For any two nodes A and B, where B is the next-sibling of A, A
itself, every descendant of A, and every attribute node or
namespace node of A or of any descendant of A precedes B in
document order.
3 The attribute nodes and namespace nodes of an element precede
the children of that element in document order. The namespace
nodes of an element precede the attribute nodes of that element
in document order; otherwise, the relative order of the
attribute nodes and namespace nodes of a given element is
implementation-dependent.
Reverse document order is the inverse of document order.
Note: from the definition of document order as a total order, it
follows that document order is:
transitive: for any nodes A, B, and C, if A << B and B << C,
then A << C
irreflexive: for no node A is it true that A << A.
antisymmetric: for any nodes A and B, if A << B then it is not
true that B << A.
complete: for any two nodes A and B, either A << B or B << A.
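One way to see that rules 1 to 3 yield a total order is to compute it. The Python sketch below is an illustration under stated assumptions: the tree is written as nested tuples of hypothetical labels, and the relative order of attribute nodes (which the rules leave implementation-dependent) is simply taken from the order of the list, which is one permitted choice among several.

```python
# Tree for <a x=".." y=".."><b/><c z=".."><d/></c></a>.
# Each node is (label, namespace_nodes, attribute_nodes, children).
TREE = ("root", [], [], [
    ("a", ["a.ns-xml"], ["a.@x", "a.@y"], [
        ("b", ["b.ns-xml"], [], []),
        ("c", ["c.ns-xml"], ["c.@z"], [
            ("d", ["d.ns-xml"], [], []),
        ]),
    ]),
])

def document_order(node):
    """List all node labels of the subtree at `node` in document order."""
    label, namespaces, attributes, children = node
    order = [label]          # rule 1: a parent precedes everything below it
    order += namespaces      # rule 3: namespace nodes come first ...
    order += attributes      # ... then attribute nodes, before any child
    for child in children:   # rule 2: a whole subtree precedes the next sibling
        order += document_order(child)
    return order

ORDER = document_order(TREE)
POSITION = {label: i for i, label in enumerate(ORDER)}
REVERSE_ORDER = list(reversed(ORDER))

def precedes(a, b):
    """A << B: node A comes before node B in document order."""
    return POSITION[a] < POSITION[b]

print(ORDER)
```

Since every node receives exactly one position in the list, transitivity, irreflexivity, and the other properties in the note follow from the corresponding properties of integer comparison.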
Any tree defined by the relations thus described has several
properties which may be usefully mentioned here:
- The root node is the first node in document order and
precedes all other nodes.
- The relative order of nodes other than attribute nodes and
namespace nodes corresponds to the relative order in which the
first character of the XML representation of each node occurs
in the XML representation of the document.
Note: This assumes a one-to-one correspondence between the
nodes of the tree and the individual occurrences, in an
XML document in which all general entity references have
been expanded, of strings matching the corresponding
grammatical rules of [XML] -- one element node for each
occurrence of a string matching the 'element' production,
and vice versa, etc.
- Nodes never share children, attributes, or namespace nodes.
For any nodes A, B, and C, if A is both a child of B and a
child of C, then B and C are the same node. And similarly for
attribute nodes and namespace nodes.
- No node is its own ancestor, descendant, or sibling.
- The root node has no siblings.
Some additional constraints on instances of the data model are
given in the following sub-sections.
--
****************************************************************
* C. M. Sperberg-McQueen, Black Mesa Technologies LLC
* http://www.blackmesatech.com
* http://cmsmcq.com/mib
* http://balisage.net
****************************************************************
Received on Friday, 23 April 2010 02:14:47 UTC