- From: Paul Cotton <pcotton@microsoft.com>
- Date: Wed, 1 May 2002 04:21:46 -0400
- To: "Ray Whitmer" <rayw@netscape.com>
- Cc: <www-xml-query-comments@w3.org>, <mhkay@iclway.co.uk>
> I don't know how this posting wound up here

I am sorry Ray but you originally posted your comments to this list not
the xpath comments list. See [1].

> Or am I expected to subscribe to this comments list?

I apologize that Michael Kay did not copy you on his reply. It is better
form for responders to copy the original correspondent and I try to get
responders from the XML Query WG to do this but this one slipped thru.

/paulc

[1] http://lists.w3.org/Archives/Public/www-xml-query-comments/2002Apr/0000.html

Paul Cotton, Microsoft Canada
17 Eleanor Drive, Nepean, Ontario K2E 6A3
Tel: (613) 225-5445 Fax: (425) 936-7329
<mailto:pcotton@microsoft.com>

> -----Original Message-----
> From: Ray Whitmer [mailto:rayw@netscape.com]
> Sent: Monday, April 29, 2002 4:43 PM
> To: www-xml-query-comments@w3.org
> Subject: Re: Comments on the XPath data model, from a DOM perspective.
>
> Sorry, I don't know how this posting wound up here, when I thought I
> posted to www-xpath-comments. Somehow I expected to be copied on a
> response. Or am I expected to subscribe to this comments list?
>
> >Sections marked ">" are from RayWhitmer:
> >
> >>* It seems clear that the XPath 2.0 specification has no type
> >>comparable to the node set or other built-in types of XPath 1.0. The
> >>concept of a typeless sequence does not seem to work as effectively.
> >>In many languages, arrays of objects are typed.
> >
> >In the published December drafts, the type system is not very well
> >developed. A lot of work has been done on this in the last few months,
> >some of which is visible in the recent Formal Semantics draft. It has
> >always been intended that XQuery should offer strong typing. In
> >practice it will usually be possible to detect statically that a
> >sequence is of a particular type, e.g. a sequence of nodes or a
> >sequence of integers, though arbitrary heterogeneous sequences are
> >permitted as the most general case.
>
> We are more worried about the common XPath 1.0 case of a set of nodes,
> which appears to require an incompatible degradation of the API to
> support XPath 2.0. It may be acceptable in Lisp to do this where there
> is no typing and which we get the idea must have heavily influenced
> XPath 2.0 because of the choices it makes, but in other languages lists
> have types and are not equally useful if the typing is disabled as is
> done for sequences in XPath 2.0.
>
> This answer does not seem to answer the question. An API can claim to
> never break anyone by just using the most abstract object type
> everywhere, but that is simply not useful, which is why most
> programming languages use types, and why a node list is more useful
> than a list. There are many things, including ordering, that apply to
> nodes that do not apply to untyped objects. Just saying the new spec
> uses untyped everywhere does not solve compatibility with the old.
>
> >>* XPath 1.0 was based on explicitly unordered sets of nodes that
> >>could be accessed in order. XPath 2.0 claims that every sequence is
> >>ordered, but there is not sufficient discussion of what that means,
> >>which has caused significant confusion. The logical conclusion could
> >>be drawn that it is referring to document order, which is the only
> >>order it seems to define and was the order of XPath 1.0, but this
> >>makes no sense when considering non-node items now possible in the
> >>result sets. Also, the incompatible treatment of duplicates is
> >>confusing, if the sets are now ordered, rather than unordered, it
> >>seems pointless to not eliminate the duplicates, but there is
> >>probably something lost between the different versions of the
> >>specification.
> >
> >Essentially, those expressions which in XPath 1.0 returned a
> >"node-set" have been redefined in XPath 2.0 to return an "ordered
> >sequence of nodes in document order without duplicates".
> >Since there is a one-to-one correspondence between unordered
> >node-sets and ordered node-sequences in document order, compatibility
> >is preserved. However, XPath 2.0 can also return sequences in an
> >order other than document order (important when the user of a Query
> >wants to specify an application-oriented ordering of the results).
>
> I thought that these, and all, return a sequence, not of nodes, but
> untyped objects. While the writer of the expression may believe that
> the return only contains nodes, that does not help at all in a formal
> type system, and it confuses greatly the concept of ordering.
>
> This is not compatible at all, unless Lisp is your language and you
> always disregarded types anyway.
>
> >Basically, a sequence can contain items in any order. The order of
> >the result is determined by the semantics of the expression that
> >created the sequence. Path expressions produce results in document
> >order, but other expressions may produce results in a different
> >order.
>
> But in XPath 1.0 a node set could always be accessed in document order
> and with guaranteed uniqueness of results. In XPath 2.0, document order
> makes less sense, because your items may not even be nodes. This seems
> to require different semantics than accessing the items of a result in
> document order.
>
> >>Based upon recent discussions, it seems that the XPath 2.0
> >>specification may not be comparable or compatible with the XPath 1.0
> >>specification in its use of these terms, but the specification needs
> >>better treatment of the concepts, and explanation of the impact on
> >>backwards compatibility. Elimination of duplicates also seems like a
> >>significant compatibility problem since 1.0 implementations went to
> >>great lengths to accomplish this.
> >
> >We think we have solved all the important backwards compatibility
> >issues, but you are right that there is a significant change in
> >terminology and that we could do a lot more to explain the
> >relationship between XPath 2.0 terms and their XPath 1.0 equivalents.
>
> And I am still looking for evidence of a solution to the compatibility
> issues that were raised at the beginning such as the ordering, typing,
> and returns which seem to be incompatible, except for Lisp programmers
> in some cases, let alone all the compatibility issues with the extended
> Lisp DOM APIs being created by the XPath group.
>
> >>* The copy semantics of node constructors seems wrong even if it was
> >>the only way to model the lisp semantics that the authors of XPath
> >>2.0 seem to be using throughout the specification. It would seem
> >>that a constructed node should not lose its identity when inserted
> >>into a hierarchy, but XPath 2.0 seems to mandate that.
> >
> >In XSLT, we never make a node available for manipulation until it is
> >inserted into its hierarchy, so this problem does not appear. It is
> >potentially a problem for XQuery, where I think the semantics of
> >element construction still require some further work. The reason it
> >is specified the way it is, I think, is to ensure that nodes are
> >immutable: you can't have the parent() accessor on the same node
> >giving different results at different times.
>
> Then how are the constructor arguments passed if there is no reference
> to them?
>
> I think there is a reference to them before they are passed to the
> constructor, so the id of the copy will be different from the id of the
> passed object.
>
> >The model on namespace nodes is certainly broken in the current
> >draft. We are still debating how best to fix it.
> >We know that we want to relax the XPath 1.0 rules to allow namespace
> >nodes to be shared between elements, and we know this has inevitable
> >side-effects on the parentage and ordering of namespace nodes. But we
> >haven't yet decided exactly what the new rules should be. All the
> >proposals currently on the table still have namespace nodes belonging
> >exclusively to a single document.
>
> If they belong to a document, then you will have to add an
> ownerDocument attribute, which the infoset does not have, to allow that
> ordering and identity checking to occur.
>
> It is hard without resolution on the issue.
>
> >>Requiring document order between documents to be stable requires
> >>much better document identification than we have today, because if a
> >>document is persisted and brought back into memory, which can happen
> >>at any time during processing, you need to be able to go back to
> >>something to reestablish the sort in the same way.
> >
> >The stability of ordering across documents is only required within
> >the scope of a single query or transformation (though I don't know if
> >we currently say this very well). Given that document node identity
> >must also be stable within this scope, I don't think it's difficult
> >to devise implementation strategies that work, e.g. basing document
> >order on the order of the internal identifiers of the document nodes.
>
> If you make the requirement of adding internal identifiers to the DOM
> implementation. In a Java implementation, for example, there is no id
> available that is guaranteed to be unique for any object.
>
> And while you may be able to wave away the issue of lifetimes, those
> working with a model such as DOM may not be able to.
>
> >>* The model claims: "The data model does not support XML documents
> >>that are not supported by the XML Information Set, for example,
> >>non-well-formed documents and documents that don't conform to XML
> >>Namespaces."
> >>But the constructors seem perfectly able to construct objects which
> >>are not well-formed, for example, by putting "--" into the text of a
> >>comment node or other illegal characters generally anywhere.
> >
> >I suspect you are right: there are probably quite a few error
> >conditions that still need to be documented. The intention is to
> >disallow operations that create an inconsistent structure, e.g.
> >multiple attributes with the same name.
>
> But what if these conditions do not match between DOM and XPath and you
> then try to build XPath on top of DOM?
>
> >>* The model appears to make it possible to construct text nodes that
> >>have empty strings, elements with multiple adjacent text nodes, and
> >>other non-normalized result trees.
> >
> >Same comment applies.
>
> But what if these conditions do not match between DOM and XPath and you
> then try to build XPath on top of DOM?
>
> >At present we have a set of rules for this in the XSLT specification,
> >and we have a documented issue that we would like to move these rules
> >into the Data Model instead. The XSLT rules go under the name of
> >"namespace fixup", and are described essentially as a set of rules to
> >be followed on element construction to make sure that a valid infoset
> >results.
>
> If it were to rely on a fixup, then why pass namespace nodes to the
> element constructor at all? Also, how do copy semantics work with the
> namespace nodes if there is only one per document of a particular type?
>
> Also, there is likely to be confusion with the DOM notion of namespace
> fixup, which is apparently not very similar in what it will fix and
> what it will not.
>
> When reading these sections, I have a lot of questions created by the
> over-simple description, naturally because you are redefining a
> document object model. I guess I just need to create a much longer
> issue list.
>
> >>I might suggest that you thoroughly study the DOM specification and
> >>you will find many more border cases you have missed. Construction
> >>of a hierarchy using an API is the same problem that DOM solves.
> >
> >I would hope that our problem is simpler, because the set of update
> >operations is much smaller. But I fear you may have put your finger
> >on a problem, namely that the set of operations provided by the data
> >model actually permits sequences of operations that neither XSLT nor
> >XQuery intends to use, and we need to either explicitly disallow such
> >sequences of operations, or define their effect precisely.
> >Personally, I've never been all that happy with the construction side
> >of the data model, because it has a very procedural feel to it, which
> >seems wrong as it is designed to underpin a declarative language.
> >XPath 1.0 got round this, of course, by not describing data model
> >construction at all, describing only the valid states of the model.
>
> I doubt that the constructors are simpler. XPath constructors seem
> quite a bit more complex due to copy constraints. You require lots of
> arguments, copying, etc. and so XPath has lots of failures that in DOM
> occur later during manipulation because it does all of its construction
> through arguments.
>
> You are recreating DOM in many ways, but incompatibly. In many cases,
> DOM has solved the issues and XPath 2.0 has not. NIH should have no
> place at W3C. We need resolution of the many issues now, as compatibly
> as possible with DOM, or they will be issues for last call and beyond.
>
> Ray Whitmer
> rayw@netscape.com
Received on Wednesday, 1 May 2002 04:21:50 UTC
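
Editorial illustration (not part of the original message): the thread above turns on the claim that an XPath 1.0 node-set corresponds one-to-one to an XPath 2.0 sequence of nodes "in document order without duplicates", and on the suggestion that ordering across documents could be based on internal document identifiers. The sketch below is one possible reading of that idea, not anything either specification prescribes. The `Node` class and its `doc_id` and `position` fields are hypothetical stand-ins for whatever identity and ordering an implementation keeps within a single query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    # Hypothetical fields, assumed for illustration only.
    doc_id: int     # internal identifier of the owning document
    position: int   # pre-order index of the node within that document
    name: str


def document_order(nodes):
    """Return the given nodes in document order with duplicates removed.

    Duplicates are detected by node identity (here: equal field values);
    cross-document order falls back to the internal document identifier.
    """
    unique = set(nodes)
    return sorted(unique, key=lambda n: (n.doc_id, n.position))


a = Node(doc_id=1, position=0, name="a")
b = Node(doc_id=1, position=1, name="b")
c = Node(doc_id=2, position=0, name="c")

# An arbitrary sequence with a duplicate and no particular order.
result = document_order([c, b, a, b])
names = [n.name for n in result]
```

Under these assumptions, any two sequences holding the same set of nodes normalize to the same ordered sequence, which is the correspondence described in the reply. The sketch says nothing about sequences containing non-node items, which is the case the objection in the thread is concerned with.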