- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Thu, 22 Apr 2010 20:14:14 -0600
- To: www-xpath-comments@w3.org
- Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
In separate mail [1] I have noted some opportunities for improving the definition of the data model in the XPath 1.0 spec. This note presents a set of concrete wording proposals to seize some of those opportunities. They are gathered together in groups according to purpose.

[1] http://lists.w3.org/Archives/Public/www-xpath-comments/2010AprJun/0000.html

(1) One change is a simple typo correction.

In 5.7 Text Nodes, in para 2, for

    Thus, <![CDATA[<]]> in the source document will treated the same as &lt;.

insert "be" before "treated", to read

    Thus, <![CDATA[<]]> in the source document will be treated the same as &lt;.

(2) Several changes are intended to create a clean separation between discussion of XML-to-datamodel mapping issues on the one hand, and on the other the definition of the data model in the abstract. They all involve changes to passages in the current text of the form "There is an X node for every X in the XML document."

The problem is that sentences like

    There is an element node for every element in the document.

have no useful specification-defined meaning. It may or may not have been a good decision, but the decision not to attempt an explicit definition of the nature of things like "document", "element", etc. was certainly a conscious decision on the part of at least some members of the responsible working group. The XML spec does not prescribe an answer to the question of how many 'b' elements occur in a document which, after entity expansion, reads

    <a><b/><c><b/><b/></c></a>

For this case, the answers 1, 2, and 3 are all compatible with the XML specification and with coherent positions on the nature of element-hood. The rest of the XPath 1.0 spec relies on there being three element nodes for 'b' elements in the data model instance corresponding to this document, so the definition of the data model needs to guarantee that result. At present, it does not.
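
The count of three that the rest of the spec relies on can be illustrated with a minimal sketch. It assumes Python's standard xml.etree.ElementTree as the parser, which is an illustrative choice only and not something the XPath or XML specs refer to; the helper name count_element_nodes is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# The example document from the text, after entity expansion.
DOC = "<a><b/><c><b/><b/></c></a>"


def count_element_nodes(xml_text, name):
    """Count one node per occurrence of an element called `name`.

    Hypothetical helper for illustration: each occurrence in the
    character sequence yields its own node, which is the reading
    the proposed wording is meant to guarantee.
    """
    root = ET.fromstring(xml_text)
    # Element.iter(tag) visits the root and all of its descendants
    # and yields only the elements whose tag equals `tag`.
    return sum(1 for _ in root.iter(name))


b_count = count_element_nodes(DOC, "b")
print(b_count)
```

Under the occurrence-based reading, the three 'b' start-tags give three distinct nodes even though the three elements are textually identical.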
All of the change proposals below are thus careful to define the XML serial-form analogues of the XDM nodes as occurrences of strings in the XML character sequence, and not to appeal to the XML spec for non-existent rules concerning identity criteria for elements, processing instructions, and comments.

(2a) In 5.2 Element Nodes, in the first paragraph, replace

    There is an element node for every element in the document.

with

    When a tree is constructed from an XML document, the tree contains one element node for every occurrence in the document of any string of characters matching the "element" production of [XML].

(2b) In 5.5 Processing Instruction Nodes, replace the first paragraph

    There is a processing instruction node for every processing instruction, except for any processing instruction that occurs within the document type declaration.

with

    When a tree is constructed from an XML document, the tree contains one processing instruction node for every occurrence in the document of any string of characters matching the "PI" (processing instruction) production of [XML], except for those that occur within the document type declaration.

(2c) In 5.6 Comment Nodes, for paragraph 1

    There is a comment node for every comment, except for any comment that occurs within the document type declaration.

substitute

    When a tree is constructed from an XML document, the tree contains one comment node for every occurrence in the document of any string of characters matching the "Comment" production of [XML], except for those occurring within the document type declaration.

(3) A few changes specify explicit rules for data model instances which guarantee that they have legal XML serializations.

(3a) In 5.1 Root Node, for the first paragraph

    The root node is the root of the tree. A root node does not occur except as the root of the tree. The element node for the document element is a child of the root node.
    The root node also has as children processing instruction and comment nodes for processing instructions and comments that occur in the prolog and after the end of the document element.

substitute

    The root node is the root of the tree; alone among the nodes of the tree it has no parent. A root node does not occur except as the root of the tree. The element node for the document element is a child of the root node; it is the only element node among the root node's children. The root node also has as children processing instruction and comment nodes for processing instructions and comments that occur in the prolog and after the end of the document element.

That is: insert "; alone among the nodes of the tree it has no parent" at the end of the first sentence, and "; it is the only element node among the root node's children" at the end of the third sentence. The first makes explicit a necessary property of the parent relation in XDM instances; the second guarantees that the serialization of the document node will match production [1] of the XML specification.

(3b) In 5.3 Attribute Nodes, in paragraph 5

    An attribute node has an expanded-name and a string-value. The expanded-name is computed by expanding the QName specified in the tag in the XML document in accordance with the XML Namespaces Recommendation [XML Names]. The namespace URI of the attribute's name will be null if the QName of the attribute does not have a prefix.

insert at the end of the paragraph

    Any two distinct attribute nodes of the same parent element must have different expanded names.

(3c) In 5.4 Namespace Nodes, in paragraph 2

    A namespace node has an expanded-name: the local part is the namespace prefix (this is empty if the namespace node is for the default namespace); the namespace URI is always null.

append

    Any two distinct namespace nodes of the same parent element must have different expanded names.
Changes 3b and 3c together guarantee that the serialization of an element will not violate the WF constraint "Unique Att Spec".

(4) One change is intended to set the stage rhetorically for the slightly more formal definition of the data model presented in change proposal 5 below.

In section 5, change the first paragraph, which currently reads

    XPath operates on an XML document as a tree. This section describes how XPath models an XML document as a tree. This model is conceptual only and does not mandate any particular implementation. The relationship of this model to the XML Information Set [XML Infoset] is described in [B XML Information Set Mapping].

to read

    XPath operates on an XML document as a tree. This section describes how XPath models an XML document as a tree ^by defining a set of constraints on the nodes of the tree and on the relations holding between nodes^. This model is conceptual only and does not mandate any particular implementation. The relationship of this model to the XML Information Set [XML Infoset] is described in [B XML Information Set Mapping].

The ^...^ mark the only change, an insertion.

(5) Finally, one larger change is intended to make the definition of the data model wholly independent of the XML specification, and to ensure that the definition of the data model guarantees that all instances of the data model have the properties relied upon elsewhere in the XPath 1.0 spec.

In section 5, delete the two paragraphs

    There is an ordering, document order, defined on all the nodes in the document corresponding to the order in which the first character of the XML representation of each node occurs in the XML representation of the document after expansion of general entities. Thus, the root node will be the first node. Element nodes occur before their children. Thus, document order orders element nodes in order of the occurrence of their start-tag in the XML (after expansion of entities).
    The attribute nodes and namespace nodes of an element occur before the children of the element. The namespace nodes are defined to occur before the attribute nodes. The relative order of namespace nodes is implementation-dependent. The relative order of attribute nodes is implementation-dependent. Reverse document order is the reverse of document order.

    Root nodes and element nodes have an ordered list of child nodes. Nodes never share children: if one node is not the same node as another node, then none of the children of the one node will be the same node as any of the children of another node. Every node other than the root node has exactly one parent, which is either an element node or the root node. A root node or an element node is the parent of each of its child nodes. The descendants of a node are the children of the node and the descendants of the children of the node.

and insert the following:

Each tree consists of a set of nodes and two binary relations on those nodes, named parent and next-sibling, which satisfy the following constraints:

- The parent relation is a function from nodes to element nodes or root nodes: that is, each node has at most one parent, which is either an element node or the root node. The parent of an attribute node or a namespace node is an element node (not a root node).
- The next-sibling relation is likewise a function, from (and to) nodes other than attribute nodes and namespace nodes. Each node which is neither an attribute node nor a namespace node has at most one next sibling, which is also neither an attribute node nor a namespace node. If (and only if) any two nodes are related by the next-sibling relation (that is, if it is possible to start at one node and reach the other by traversing the next-sibling relation one or more times), then the two nodes are *siblings*.
- The two relations are acyclic: it is never possible, by following the parent or next-sibling relation repeatedly from node to node, to return to a node already visited.
- Each tree contains exactly one root node.
- Each node other than the root node has a parent; the root node has none.
  Note: it follows that from any node in the tree it is possible to reach the root by traversing the parent relation repeatedly (zero or more times).
- If any two nodes are siblings, then those two nodes have the same parent.
- Conversely, if any two nodes other than attribute or namespace nodes have the same parent, then they are siblings.

Several other terms and relations on the nodes of a tree can be defined in terms of the parent and next-sibling relations.

The attribute nodes and namespace nodes which have an element node as their parent are called the attribute nodes, or the namespace nodes, of that element node. All other nodes which have a node as their parent are called the children of that node. Formally, these facts can be represented by relations called attributes-of, namespace-nodes-of, and child. The union of the attributes-of, namespace-nodes-of, and child relations is the inverse of the parent relation.

The positive transitive closure of the parent relation is the ancestor relation. The positive transitive closure of the child relation is the descendant relation. That is, for any nodes A and B, if the pair (A -> B) is a member of the ancestor relation, then B is an ancestor of A. Similarly, if the descendant relation includes (A -> B), then B is a descendant of A.

The inverse of the next-sibling relation is the previous-sibling relation. The positive transitive closures of the next-sibling and previous-sibling relations are the following-sibling and preceding-sibling relations, respectively.

Note: it follows from the definitions given that the next-sibling relation defines a total order on the children of any node.
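
(An aside, not part of the proposed wording.) The way the derived relations fall out of parent and next-sibling can be sketched in a few lines. This is only a minimal illustration, assuming Python and representing the two relations as dictionaries; the names Node, ancestors, following_siblings, and children are hypothetical, and the attribute node on 'a' is added to the earlier example document purely to show that attributes are not children.

```python
from dataclasses import dataclass


# eq=False keeps identity-based equality and hashing, so two 'b'
# elements with the same name remain distinct nodes.
@dataclass(eq=False)
class Node:
    kind: str   # "root", "element", "attribute", "namespace", ...
    name: str


def ancestors(node, parent):
    """Positive transitive closure of parent, nearest ancestor first."""
    result = []
    while node in parent:
        node = parent[node]
        result.append(node)
    return result


def following_siblings(node, next_sibling):
    """Positive transitive closure of next-sibling, nearest first."""
    result = []
    while node in next_sibling:
        node = next_sibling[node]
        result.append(node)
    return result


def children(node, parent, next_sibling):
    """Children of `node` in next-sibling order.

    Attribute and namespace nodes have `node` as parent but are
    excluded, as in the definition of the child relation.
    """
    kids = [n for n, p in parent.items()
            if p is node and n.kind not in ("attribute", "namespace")]
    if not kids:
        return []
    # The first child is the one that is nobody's next sibling.
    targets = {id(next_sibling[k]) for k in kids if k in next_sibling}
    first = next(k for k in kids if id(k) not in targets)
    return [first] + following_siblings(first, next_sibling)


# Tree for <a><b/><c><b/><b/></c></a>, plus one illustrative attribute.
root = Node("root", "")
a, c = Node("element", "a"), Node("element", "c")
b1, b2, b3 = (Node("element", "b") for _ in range(3))
attr = Node("attribute", "id")

parent = {a: root, b1: a, c: a, b2: c, b3: c, attr: a}
next_sibling = {b1: c, b2: b3}

print([n.name for n in children(a, parent, next_sibling)])
print([n.name or "(root)" for n in ancestors(b3, parent)])
```

Because each relation is a function, a plain dictionary suffices for each; the total order on the children of a node is recovered simply by walking next-sibling from the one child that is no other child's next sibling.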

A total order, called *document order*, is defined on all nodes of the tree, as described below. It is convenient to write A << B for two nodes A and B, if A precedes B in document order, or A >> B if B precedes A in document order.

1 Parents precede their namespace nodes, attributes, and children in document order.
2 For any two nodes A and B, where B is the next-sibling of A, A itself, every descendant of A, and every attribute node or namespace node of A or of any descendant of A precedes B in document order.
3 The attribute nodes and namespace nodes of an element precede the children of that element in document order. The namespace nodes of an element precede the attribute nodes of that element in document order; otherwise, the relative order of the attribute nodes and namespace nodes of a given element is implementation-dependent.

Reverse document order is the inverse of document order.

Note: from the definition of document order as a total order, it follows that document order is:

    transitive: for any nodes A, B, and C, if A << B and B << C, then A << C.
    irreflexive: for no node A is it true that A << A.
    antisymmetric: for any nodes A and B, if A << B then it is not true that B << A.
    complete: for any two nodes A and B, either A << B or B << A.

Any tree defined by the relations thus described has several properties which may be usefully mentioned here:

- The root node is the first node in document order and precedes all other nodes.
- The relative order of nodes other than attribute nodes and namespace nodes corresponds to the relative order in which the first character of the XML representation of each node occurs in the XML representation of the document.
  Note: This assumes a one-to-one correspondence between the nodes of the tree and the individual occurrences, in an XML document in which all general entity references have been expanded, of strings matching the corresponding grammatical rules of [XML] -- one element node for each occurrence of a string matching the 'element' production, and vice versa, etc.
- Nodes never share children, attributes, or namespace nodes. For any nodes A, B, and C, if A is both a child of B and a child of C, then B and C are the same node. And similarly for attribute nodes and namespace nodes.
- No node is its own ancestor, descendant, or sibling.
- The root node has no siblings.

Some additional constraints on instances of the data model are given in the following sub-sections.

--
****************************************************************
* C. M. Sperberg-McQueen, Black Mesa Technologies LLC
* http://www.blackmesatech.com
* http://cmsmcq.com/mib
* http://balisage.net
****************************************************************
Received on Friday, 23 April 2010 02:14:47 UTC