- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Thu, 22 Apr 2010 20:10:39 -0600
- To: www-xpath-comments@w3.org
- Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
The definition of the XPath 1.0 data model in section 5 of the specification at

  http://www.w3.org/TR/xpath/#data-model
  http://www.w3.org/TR/1999/REC-xpath-19991116/#data-model

seems to me to offer some small opportunities for improvement. This mail identifies some of them.

Tools now available make it somewhat easier today than it was in 1999 to check the logical consequences of sets of definitions and axioms, and in applying one of those tools to the XPath 1.0 data model it becomes clear that the current definition of the data model in section 5 has some gaps in places where it would probably be better not to have gaps.

(1) It would be desirable, I think, for the definition of the data model to be formulated in terms of nodes and their relations, without reference to the XML spec. It should be easy to see that every instance of the data model corresponds to an XML document, and every XML document to an instance of the data model, but the two should be defined independently of each other. Large parts of the current text appear to aspire to this goal of defining the data model independently of the XML spec, but the goal is not fully achieved.

(2) The current text refers to properties of XML serializations in describing document order, e.g. in

  There is an ordering, document order, defined on all the nodes in the document corresponding to the order in which the first character of the XML representation of each node occurs in the XML representation of the document after expansion of general entities.

The reference may be taken in either of two ways: (a) as a non-normative observation about the properties of document order as defined by normative rules elsewhere, or (b) as a normative appeal to properties of the XML serialization. Neither seems a wholly satisfactory reading.

In the former case (a), the non-normative observation is false: the properties in question are not in fact guaranteed by the normative rules elsewhere in section 5.
In the latter case (b), the normative appeal to XML undercuts the independence of the definition of the data model, and it also requires a clear mapping from data model constructs to parse trees for XML documents. The current text of the spec does not provide such an explicit mapping. In case (b), also, some rules explicitly stated in section 5 would become redundant. (If document order is normatively defined as ordering elements in the order of their start-tags, for example, then it is unnecessary to say, as the same paragraph does, that parents precede their children.)

Whatever the originally intended rhetorical function of the sentence quoted, I think it would be better to distinguish clearly between essential normative statements and informative observations about the logical consequences of those normative statements. That means doing one of two things: either define document order for data model instances completely without appeal to the properties of an XML serialization (or source), with a clearly non-normative statement about the relation between document order and serial-XML order; or else define document order entirely in terms of XML order, and make clear that remarks about ancestor nodes preceding descendant nodes are not themselves normative but are informative statements describing some individual consequences of the normative definition. (I would favor the former, not the latter, approach, since it makes clearer that XSLT can work on any instance of the data model, not only on those generated by parsing an XML character stream.)

(3) The rest of the XPath 1.0 spec relies on the proposition that document order is a total order, not a partial order, on the nodes of the document. It would be helpful if that proposition followed logically from the definition of the data model; in the current text it does not.
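
To illustrate the first approach in point (2) together with the total-order requirement in point (3), document order can be derived from node relations alone. What follows is a minimal Python sketch, not text from the specification and not a transcription of any existing model; the `Node` class and the functions `document_order`, `is_total`, and `precedes` are hypothetical names introduced for the example. It assumes a pre-order walk in which an element's namespace nodes and then its attribute nodes come after the element itself and before its children.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)  # eq=False: nodes are compared by identity, not by content
class Node:
    """Hypothetical node type; no XML serialization is involved."""
    name: str
    namespaces: List["Node"] = field(default_factory=list)
    attributes: List["Node"] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def document_order(root: Node) -> List[Node]:
    """Pre-order walk: a node, its namespace nodes, its attributes, its children."""
    order: List[Node] = []

    def visit(node: Node) -> None:
        order.append(node)
        order.extend(node.namespaces)
        order.extend(node.attributes)
        for child in node.children:
            visit(child)

    visit(root)
    return order


def is_total(order: List[Node]) -> bool:
    """The walk yields a total order only if every node occurs exactly once."""
    return len({id(node) for node in order}) == len(order)


def precedes(order: List[Node], first: Node, second: Node) -> bool:
    """True if `first` comes before `second` in the given document order."""
    return order.index(first) < order.index(second)


# Example instance, built directly as nodes rather than parsed from XML.
x, b, c = Node("x"), Node("b"), Node("c")
a = Node("a", attributes=[x], children=[b, c])
root = Node("/", children=[a])
order = document_order(root)
```

In this sketch the parent-before-child property and the ordering of siblings are consequences of the walk rather than separate rules. If some node occurs twice in a child list, the walk visits it twice and `is_total` returns `False`, so totality here depends on child lists being free of duplicates.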

(4) This reader's intuitive understanding of the XPath 1.0 spec as a whole (and, I believe, most readers' intuitive understandings) is that the "ordered list of child nodes" possessed by the root node and by element nodes is related to document order in that for any elements E and F, if E precedes F in some ordered list of child nodes, then E precedes F in document order. It would be helpful if this relation were specified in the definition of the data model.

(5) The expected behavior of the axes (as I understand them) relies on the proposition that no node's ordered list of children contains duplicates. Several equivalent formulations of this rule are possible:

  - Each node has at most one immediately following sibling and at most one immediately preceding sibling.
  - The binary node -> node relations underlying the following-sibling and preceding-sibling axes are acyclic.
  - No node is its own sibling.
  - The number of a parent's children is equal to the length of that parent's ordered list of child nodes.

It would be good, I think, if this proposition were clearly entailed by the definition of the data model. In the current state of the spec, this is not the case.

Some readers point to section 5.2 and the sentence "There is an element node for every element in the document" as entailing the proposition stated above, but either this is an appeal to the XML specification or it is not. If it is not, then it appears to be a circular argument. If it is, then it is ineffective because the XML specification does not define 'element' in a way that allows one to say with certainty whether different occurrences of the same sequence of character types count as different elements or as different occurrences of the same element.

(6) Sentences of the form "There is one node of type N for every N construct" appear not only where N is "element" but also for other constructs (e.g. processing instructions and comments).
I take these statements as a partial description of a mapping from XML documents or information sets to XPath 1.0 data model instances. It would be desirable, I think, to separate discussion of XML-to-datamodel mapping from definition of the data model in the abstract.

Also, as noted above, these statements appeal to the XML spec for a conception of identity of elements, processing instructions, comments, etc. which the XML spec does not in fact provide. I believe the principle underlying these statements is, roughly, that for element nodes, PI nodes, and comment nodes, there should be one node of appropriate type in the data model instance for each occurrence in the XML document of any string of characters matching the corresponding production in the XML grammar. That principle can, and probably should, be stated without having to assume some particular view on whether elements, processing instructions, and comments are by nature sequences of character types, sequences of character tokens, occurrences of sequences of character types, or something else.

(7) The parent relation is, I think, generally thought to be the inverse of the union of the child, attribute, and namespace relations; it would be good, I think, if the definition of the data model said this explicitly.

(8) Some statements in section 5 which appear to be normative are redundant: they are consequences of other normative statements. In some cases, of course, the converse is also true. For example, the proposition that nodes never share children follows logically from the proposition that every node other than the root has exactly one parent. Similarly for the proposition that elements never share attributes (or namespace nodes). It would be slightly better, perhaps, if it were feasible to make a clear distinction between normative statements and non-normative mentions of the logical consequences of the normative statements.
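
The relationship described in points (7) and (8) can be sketched concretely. The Python below is an illustration only, not a rendering of the specification: `Node`, `parent_links`, and `violations` are hypothetical names, and the sketch makes one assumption that should be stated loudly, namely that parent links are counted once per occurrence in a child, attribute, or namespace list. Under that assumption, the single rule "every node other than the root has exactly one parent" rejects shared children, shared attributes, and duplicate entries in a child list alike.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(eq=False)  # identity-based equality; two nodes may carry the same name
class Node:
    """Hypothetical node type for the example."""
    name: str
    namespaces: List["Node"] = field(default_factory=list)
    attributes: List["Node"] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def parent_links(nodes: List[Node]) -> Dict[int, List[Node]]:
    """Invert the union of the child, attribute, and namespace relations.

    Assumption: one link is recorded per occurrence, so a node listed twice
    by the same owner ends up with two links to that owner.
    """
    links: Dict[int, List[Node]] = defaultdict(list)
    for owner in nodes:
        for owned in owner.namespaces + owner.attributes + owner.children:
            links[id(owned)].append(owner)
    return links


def violations(root: Node, nodes: List[Node]) -> List[str]:
    """Report every node whose number of parent links is not the expected one."""
    links = parent_links(nodes)
    problems: List[str] = []
    for node in nodes:
        found = len(links.get(id(node), []))
        expected = 0 if node is root else 1
        if found != expected:
            problems.append(f"{node.name}: {found} parent links, expected {expected}")
    return problems


# A well-formed instance: every non-root node has exactly one parent link.
x, b, c = Node("x"), Node("b"), Node("c")
a = Node("a", attributes=[x], children=[b, c])
root = Node("/", children=[a])
good = violations(root, [root, a, x, b, c])

# A malformed instance: the same node occurs twice in one child list.
d = Node("d")
e = Node("e", children=[d, d])
bad_root = Node("/", children=[e])
bad = violations(bad_root, [bad_root, e, d])
```

If the links were instead treated as a set of (parent, child) pairs, the duplicate entry in the second instance would go unnoticed, which is one way to see why the absence of duplicates needs to be stated or entailed separately.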

(9) In the spirit of keeping data model instances close to XML and ensuring that each legal instance is serializable in XML, it would probably be a good idea to specify in the definition of the data model that no two attributes on the same element share an expanded name.

(10) In a tree, the parent relation is acyclic. It would probably be a good idea if the definition of the data model said explicitly that the parent relation in the data model is acyclic. (It may seem to follow from the analogy to human family relations that parenthood and ancestry are naturally acyclic, but some readers, at least, of the XPath specification will be familiar with the humorous song "I'm my own grandpa", which exhibits a counter-example to that general rule.)

Interested or skeptical readers may wish to consult the following for further discussion; the first few are postings in my blog and the last two are Alloy models formalizing various aspects of the XPath 1.0 data model.

  "An XPath 1.0 Puzzle"
  http://cmsmcq.com/mib/?p=947

  "Tell me, Captain Yossarian, how many elements do you see?"
  http://cmsmcq.com/mib/?p=955

  "Two, four, three, who's counting?"
  http://cmsmcq.com/mib/?p=966

  "How formal can you get?"
  http://cmsmcq.com/mib/?p=980

  http://www.blackmesatech.com/2010/01/xpath10.als
  http://www.blackmesatech.com/2010/03/otrees.als

None of the opportunities for improvement mentioned above is particularly difficult to seize and exploit. In separate email I will propose some specific changes to the text of XPath 1.0 which illustrate that the changes needed are mostly rather minor.

-- 
****************************************************************
* C. M. Sperberg-McQueen, Black Mesa Technologies LLC
* http://www.blackmesatech.com
* http://cmsmcq.com/mib
* http://balisage.net
****************************************************************
Received on Friday, 23 April 2010 02:11:11 UTC