- From: Jonathan Robie <jonathan.robie@softwareag.com>
- Date: Tue, 22 Jan 2002 07:48:44 -0500
- To: Mike Schilling <mschilling@edgility.com>, www-xpath-comments@w3.org, www-xquery-comments@w3.org
At 05:10 PM 1/21/2002 -0500, Mike Schilling wrote:
>The following message is a courtesy copy of an article
>that has been posted to edgility.eng.xml as well.
>
>I am writing as someone whose software uses XPath 1.0 as a syntax for
>recognizing and extracting information in XML documents.
>
>I. Incompatibilities:
>Incompatibilities introduced in XPath 2.0 will cause difficulties for us,
>in all of the obvious areas:
>
>   Existing XPath strings no longer working correctly.
>
>   Need to retrain customers in XPath 2.0 concepts where they differ from
>   XPath 1.0 concepts.
>
>   Fragility of XPath implementations where they attempt to retain 1.0
>   compatibility in problematical ways.

Obviously, compatibility matters. Unfortunately, XPath 1.0 was never designed to be a subset of a type-safe query language, and we felt it was very important that XPath 2.0 and XQuery be deeply integrated, and that XQuery be integrated with the type system of XML Schema.

In general, I think we have managed to keep the number of incompatibilities small despite the rather large change in functionality and type safety. Also note that XPath 2.0 provides fallback conversions for the very purpose of maintaining compatibility where possible.

>While appendix D listing incompatibilities is extensive, it leaves out two
>of the most important:
>
>1. Requiring path elements which match keywords to be escaped.
>This is unacceptable, since it makes an unbounded set of existing XPath
>expressions invalid. It is also unacceptable since it leaves open the
>possibility that, as future XPath versions will introduce new keywords,
>yet more XPath expressions become invalid. In fact, it amounts to a
>requirement that, for safety, every name in every path expression be
>escaped. This is a fundamental change to XPath syntax.

In the current Working Drafts, XPath 2.0 does not reserve its keywords, but XQuery does. So the syntax of XPath actually has not been changed in this regard.
Moreover, since XPath must be unambiguous without reserved keywords, and both XPath and XQuery are generated from the same grammar, the productions they have in common do not require reserved keywords to be parsed unambiguously.

That said, I am quite certain we are not done debating reserved keywords. There are strong feelings on both sides of the issue.

>If keywords are required, a much better choice is to make them
>distinguishable from path elements, i.e. forbid them from matching the XML
>Name production. Requiring them to begin with a colon (like the proposed,
>unacceptable syntax for path element escaping) is one possibility.

This is one of many designs that have been discussed. There are at least some people who feel this would make XQuery more awkward to use. Also note that this would introduce an incompatibility with XPath 1.0, which has no such requirement.

>2. The introduction of the for statement
>This changes XPath from an expression-matching language to a
>pseudo-procedural one. It's quite unclear why "for" and "return" are
>included, but not "if" and "while".

Actually, XPath does include if:

<snip href="http://www.w3.org/TR/xpath20/#id-conditionals">
2.9 Conditional Expressions

XPath supports a conditional expression based on the keywords if, then, and else.

[7]   IfExpr ::= "if" "(" Expr ")" "then" Expr "else" Expr
</snip>

Neither XPath nor XQuery has a while construct.

> It's also unclear how XPath benefits from implementing half of the
> XQuery FLWR statement.

Personally, I agree with you. I don't think we're done discussing this particular question, but there was not consensus to put the whole FLWR expression in the current Working Drafts, and there was a strong feeling from the XSL Working Group that at least "for" and "return" were needed.

>The examples given for "for" in the spec are quite unconvincing, since
>they describe the sort of transformation which is the province of XSLT and
>XQuery, not XPath.
>
>This will make user training in XPath far more difficult, since it breaks
>the existing user view of XPath as a pure pattern-matching language.

We have spent a lot of time agonizing over the proper dividing line between XPath and XQuery. There were a number of people who would have preferred that XPath 2.0 retain roughly the same functionality as XPath 1.0. However, we have seen good examples of stylesheets that could be written in a simpler and more straightforward manner by moving more functionality into XPath.

>Note that every incompatibility introduced increases the likelihood either
>that XPath will split into dialects or that XPath 2.0 will simply be
>rejected. The history of the SQL 3.0 standard is a lesson about the limited
>ability of a standards effort to make fundamental changes to an existing
>language.

I see a number of reasonable positions. At one end of the continuum, we could have said that XPath 2.0 would simply be XPath 1.0, with no changes whatsoever. That would mean no support for XML Schema types, no additional functionality, and incompatibility with XQuery for expressions that look the same in both languages. XQuery would then support schema types, provide full query functionality and type safety, etc.

Advantages: XPath remains quite simple and completely compatible with older versions, XQuery can provide full functionality.

Disadvantages: incompatibilities between XQuery and XPath would be hard for users to master.

However, there were requirements for XPath to add significant functionality in XPath 2.0 (see http://www.w3.org/TR/xpath20req), and once this functionality was added, the differences between XPath and XQuery started to fade significantly. It did not seem to make sense to release two new languages this similar with very minor differences that lead to incompatibilities.

>II. Missing functionality.
>
>Member-wise operations on sequences are both natural and extremely
>useful. Take the requirement
>
>   1. Given an XML document containing a purchase order and its
>      line <item> elements, calculate the total amount of the
>      purchase order by summing the price times the quantity of each
>      item. The nodeset is identified by item, and the expression to
>      sum would be price * quantity.

This can be done as follows:

    sum( for $i in items return $i/quantity * $i/USPrice )

So I do not believe this functionality is actually missing. Incidentally, I find the plural "items" confusing here; I assume that "items" would contain a sequence of "item" elements, in which case the query would be:

    sum( for $i in items/item return $i/quantity * $i/USPrice )

>A natural way to express this, which does not require the for statement, is
>
>   sum(items/quantity * items/USPrice)
>
>In fact, when my co-workers and I were first learning XPath, we had to
>read the spec carefully to convince ourselves that this wasn't correct.

Consider the following:

    sum(items/item/quantity * items/item/USPrice)

If there can be many "item" elements, we have two sequences of elements with a multiply operator between them. How should this be defined? At one point, we considered multiplying each item on the left of the operator by each item on the right, but we finally decided it should simply be an error. You suggest the following semantics:

>The semantics are simple: an n-ary operation on n sequences is allowed if
>all sequences have the same length m, and is interpreted as being done
>memberwise and resulting in a sequence of m members. Type exceptions are
>generated as if the operations are done from first member to m'th member.

The for expression seems to be a more general solution to this problem. There are several reasonable ways that multiplication of two sequences might be defined, and it is not obvious to me that any one of them is "the" right definition. In general, this is a pretty good indication that we should let the user define what is actually intended. And that's what "for" and "return" do very well.
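
To make the competing readings concrete, here is a rough sketch in Python rather than XPath. It is only an illustration, not anything taken from the Working Drafts: the item data and all function names are hypothetical, and each <item> is reduced to a (quantity, USPrice) pair.

```python
from itertools import product

# Hypothetical purchase order: one (quantity, USPrice) pair per <item>.
items = [(2, 10.0), (1, 4.5), (3, 1.0)]
quantities = [q for q, _ in items]
prices = [p for _, p in items]


def first_node_product(left, right):
    """XPath 1.0 reading: only the first node of each node-set is used."""
    return left[0] * right[0]


def memberwise_product(left, right):
    """The proposed reading: pair members by position; lengths must match."""
    if len(left) != len(right):
        raise ValueError("sequences differ in length")
    return [l * r for l, r in zip(left, right)]


def cross_product(left, right):
    """The reading considered and dropped: every left member times every right member."""
    return [l * r for l, r in product(left, right)]


def for_return_total(order_items):
    """Analogue of sum(for $i in items/item return $i/quantity * $i/USPrice)."""
    return sum(q * p for q, p in order_items)


if __name__ == "__main__":
    print(first_node_product(quantities, prices))       # first nodes only
    print(sum(memberwise_product(quantities, prices)))  # position-wise pairing
    print(sum(cross_product(quantities, prices)))       # all pairs
    print(for_return_total(items))                      # explicit iteration
```

The three operator readings give three different totals for the same data, while the explicit loop states which pairing is meant. Note also that the member-wise reading pairs values by position only, so it relies on every item having exactly one quantity and one USPrice.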
Personally, I would like to be able to use "let" and "where" in XPath 2.0 as well.

> (The restriction is understandable in XPath 1.0, where nodesets cannot
> be freely constructed. It is not understandable in 2.0, where sequence
> construction is explicitly supported.) It's clear that the current
> XPath definition of
>
>   items/quantity * items/USPrice
>
>the product of the first node of each nodeset, is useless. A good XPath
>expression construction tool will warn the user that it almost certainly
>is not what is desired. But why redefine it as an error (as XPath 2.0
>does) when it has an obvious and useful meaning?

The XPath 1.0 definition was not considered acceptable for a strongly typed language like XQuery, and we wanted XQuery and XPath to handle this in the same way. Any redefinition results in an incompatibility with XPath 1.0. Silently changing the interpretation and reporting no error is more dangerous than reporting an error.

Jonathan

These are my opinions right now. They may be quite different from the opinions of Software AG, the W3C XML Query Working Group, or the opinions that I will have after reading and considering your response.
Received on Tuesday, 22 January 2002 08:41:26 UTC