- From: Kay, Michael <Michael.Kay@softwareag.com>
- Date: Tue, 14 Oct 2003 17:52:03 +0200
- To: Guido Moerkotte <moerkotte@informatik.uni-mannheim.de>, public-qt-comments@w3.org
- Cc: mrys@microsoft.com, moer@pi3.informatik.uni-mannheim.de
- Message-ID: <DFF2AC9E3583D511A21F0008C7E62106073DD1C3@daemsg02.software-ag.de>
> at VLDB in Berlin, Michael Rys convinced me to send another
> email to the XQuery comment list. Since I don't expect my
> email to have any influence, I don't bother writing down all
> points of XQuery which I think should be corrected.

Please keep sending the comments in, and bear in mind that it's much easier for us to respond to small comments than to large ones. No-one at this stage wants to go back to the drawing board on major aspects of the language unless things are really badly broken (it would be better for someone to invent a different language and compete in the marketplace - two query languages out there would be better than none). But there are hundreds of bugs still there to be fixed, and they won't be fixed unless you tell us about them.

> Instead I concentrate on the points I think are really bad.
>
> Here they are:
>
> 1) Runtime Exceptions: A query language should not have
> runtime exceptions.
> This may not always be achievable but at least type-errors
> should all be discovered at
> compilation time.
> This is not true for XQuery.

I don't think it's possible for all type errors to be discovered at compile-time. Of course, with static typing enabled, XQuery attempts to do this; but what actually happens is that people have to make assertions using "treat as", which means they get a run-time failure instead. Basically you have to do something if you encounter bad data (and bad data cannot always be detected by schema validation).

XSLT 1.0 tried to design on the principle that there are no run-time errors. There are cases where this is the right design principle, but there are also cases where it is absolutely wrong. If programmers make mistakes, and they do, then they will only be corrected if the programmer finds out about them, and they are more likely to correct them (and to be able to correct them more easily) if they get an error message than if they simply get a wrong answer.

So you're making a valid point, but I think there are arguments both ways, and although the decision that we've made won't suit everyone, I don't think we're likely to change on this. I also find it difficult to reconcile this comment with your comment below that XQuery has too much implicit casting. One of the ways that XPath 1.0 avoided run-time errors was by doing lots of implicit casting.

> 2) A query language should be deterministic.
> This is not true for XQuery.
>
> Essentially, these are the points why I write this email.
> Here is my motivating scenario:
>
> In XQuery, "p and q" and "q and p" may give a different
> result due to runtime exceptions.
> Why is this bad?
> Assume the following scenario: ... ...

This is a matter that has exercised us considerably. The examples of non-determinism that we put in the specification were expressly designed to make sure that everyone could see quite how deep the non-determinism in the language ran, because we are well aware of the consequences.

However, the choices are stark. Given a predicate such as exists(//item[@a div @b > 0]) would you really want us to require that the processor evaluates the predicate for every item (or processes all the items in the same order) in order to make sure that the error behavior is deterministic? I don't think any of us would find this acceptable (by which I mean, we don't think that our users would be prepared to pay the cost).

The alternative, which is the direction that XSLT 1.0 and XPath 1.0 took, is to try to eliminate run-time errors (so dividing by zero gives NaN, for example). This improves the determinism, but are "wrong answers" (failure to detect programming errors or bad data) a price worth paying for that?

>
> Now, what should be done to correct XQuery?
> Many things:
> 1) introduce NULL-values and three valued logic
> (Remember that OQL had the same design flaw as
> XQuery---although not that bad---and
> that they introduced NULL-values in later versions after
> having tried to correct
> things by introducing "andthen" and "orelse" (similar to
> "if-then-else").)

I think that null values are totally inappropriate to the XML data model. They were introduced in SQL because SQL's tabular data model has no other way of representing absent data. XML has plenty of good ways of representing missing data already, it doesn't need any new ways imported from a different and less flexible data model.

Three-valued logic would still be possible using an empty sequence as the third value, and we did consider that seriously for a while. But it turns out that three-valued logic has just as many surprises for the unwary programmer, and we eventually abandoned it. I don't see how it would help solve either of the real issues that you identify above.

> 2) Don't let empty sequences partially play the role of NULL-values.
> (Remember: is_null(empty-sequence) is not true
> is_empty(NULL) is not true)
> These things are too different to be identified.

Why? In XML, an absent attribute plays exactly the same role as a null value in SQL. When you access an absent attribute, you get an empty sequence as the result. Would you like to change XML syntax so that an attribute can now be present, absent, or absent-and-null? You seem to forget that XQuery is first and foremost a query language for XML documents, and that the data model for XML was invented long before XQuery came along.

> 3) Do not identify single items with singleton-sequences that
> contain that single item.
> Even in the most flexible type systems of real and used
> programming/query languages
> they are distinguished.

But in XML they are not. Sequences are present in XPath and XQuery primarily to support list-valued elements and attributes; and in XML Schema, an integer and a list of one integer are the same thing. Again, XQuery is not inventing a new data model, it is there to allow query of XML documents.

>
> Other points I don't like are:
> 1) too much implicit casting

I spend most of my time responding to a community that complains bitterly that XPath 2.0 has too little implicit casting. As long as we get equal flak from both sides, I will be happy that we have got the balance about right.

> 2) no explicit grouping
> (grouping has to be expressed by nested queries. these are
> difficult to unnest.
> unnesting is not always possible and is an error-prone
> process due to its complexity.)
> This is also a mistake that was made by the OQL designers.
> (Not exactly the same, since they have an explicit grouping, but
> a nested query had to be used to work on the "partition"
> attribute.
> They subsequently corrected things half-way by
> introducing some syntactic sugar for
> common cases. But you wouldn't call that a perfect
> solution nor would you call it
> elegant.)

We are aware that grouping facilities in XQuery are currently rather limited. One of the difficulties has been in identifying the requirements clearly. XSLT 2.0 includes a grouping capability that was well informed by a study of use cases derived from XSLT 1.0 experience, and without this, I think the designers would almost certainly have got it wrong. A lot of the requests for grouping in XQuery seem to come from people with more experience in SQL than in XML, and in my view the requirements are very different (for example, because order is significant in XML). Please take a look at the grouping facilities in XSLT 2.0 and tell us whether they offer the kind of functionality you think XQuery is missing.

Michael Kay
Received on Tuesday, 14 October 2003 11:52:33 UTC
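The evaluation-order argument in the reply on determinism (the exists(//item[@a div @b > 0]) predicate, and "p and q" versus "q and p") can be illustrated with a small Python sketch. It is only an analogy, not XQuery and not how any particular processor works. The functions `predicate`, `exists_lazy`, `exists_eager`, `p`, `q`, `outcome` and the sample items are all hypothetical. Each item is an `(a, b)` pair standing in for `@a` and `@b`, and the sketch assumes a zero divisor raises an error rather than yielding INF or NaN.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical stand-in for an //item element: (value of @a, value of @b).
Item = Tuple[int, int]


def predicate(item: Item) -> bool:
    """Analogue of @a div @b > 0; raises ZeroDivisionError when b == 0."""
    a, b = item
    return a / b > 0


def exists_lazy(items: Iterable[Item]) -> bool:
    """Stop at the first item that satisfies the predicate (any() short-circuits)."""
    return any(predicate(it) for it in items)


def exists_eager(items: Iterable[Item]) -> bool:
    """Evaluate the predicate for every item before answering."""
    results = [predicate(it) for it in items]
    return any(results)


def p() -> bool:
    return False


def q() -> bool:
    divisor = 0  # hypothetical bad data
    return 1 / divisor > 0


def outcome(f: Callable[[], bool]) -> str:
    """Report either the boolean result or that a run-time error occurred."""
    try:
        return str(f())
    except ZeroDivisionError:
        return "error"


items = [(1, 1), (1, 0)]  # first item matches; second divides by zero

print(outcome(lambda: exists_lazy(items)))            # True
print(outcome(lambda: exists_lazy(reversed(items))))  # error
print(outcome(lambda: exists_eager(items)))           # error
print(outcome(lambda: p() and q()))                   # False
print(outcome(lambda: q() and p()))                   # error
```

With short-circuiting, the answer (`True` or an error) depends only on the order in which items or operands are visited. The eager version is deterministic here, but only because it evaluates the predicate for every item, which is the cost the reply argues users would not accept.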