RE: XQuery

> at VLDB in Berlin, Michael Rys convinced me to send another 
> email to the XQuery comment list. Since I don't expect my 
> email to have any influence, I don't bother writing down all 
> points of XQuery which I think should be corrected. 

Please keep sending the comments in, and bear in mind that it's much easier
for us to respond to small comments than to large ones. No-one at this stage
wants to go back to the drawing board on major aspects of the language
unless things are really badly broken (it would be better for someone to
invent a different language and compete in the marketplace; two query
languages out there would be better than none). But there are hundreds of
bugs still there to be fixed, and they won't be fixed unless you tell us
about them.

> Instead I concentrate on the points I think are really bad.
> 
> Here they are:
> 
> 1) Runtime Exceptions: A query language should not have runtime exceptions.
>    This may not always be achievable but at least type-errors should all be
>    discovered at compilation time.
>    This is not true for XQuery.

I don't think it's possible for all type errors to be discovered at
compile-time. Of course, with static typing enabled, XQuery attempts to do
this; but what actually happens is that people have to make assertions using
"treat as", which means they get a run-time failure instead. Basically you
have to do something if you encounter bad data (and bad data cannot always
be detected by schema validation).
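A minimal Python sketch of the point about "treat as", assuming a hypothetical `treat_as` helper that stands in for the XQuery expression. The assertion lets a static checker accept the code, but the actual check can only run against real data, so bad input still surfaces as a run-time failure. The names `treat_as`, `DynamicTypeError` and `total_price` are illustrative only, not part of any XQuery API.

```python
from typing import Any


class DynamicTypeError(Exception):
    """Stands in for the dynamic error raised when a 'treat as' check fails."""


def treat_as(value: Any, expected: type) -> Any:
    # A static checker may take this assertion on trust; the real check
    # can only happen here, at run time, against the actual data.
    if not isinstance(value, expected):
        raise DynamicTypeError(
            f"expected {expected.__name__}, got {type(value).__name__}")
    return value


def total_price(raw_prices: list[Any]) -> float:
    # The values come from data whose type cannot be proved at compile time,
    # so each one is asserted to be a float.
    return sum(treat_as(p, float) for p in raw_prices)


print(total_price([1.5, 2.25]))  # 3.75
try:
    total_price([1.5, "n/a"])    # bad data that validation did not catch
except DynamicTypeError as e:
    print("run-time failure:", e)
```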

XSLT 1.0 was designed on the principle that there are no run-time errors.
There are cases where this is the right design principle, but there are also
cases where it is absolutely wrong. If programmers make mistakes, and they
do, then they will only be corrected if the programmer finds out about them,
and they are more likely to correct them (and to be able to correct them
more easily) if they get an error message than if they simply get a wrong
answer.

So you're making a valid point, but I think there are arguments both ways,
and although the decision that we've made won't suit everyone, I don't think
we're likely to change on this.

I also find it difficult to reconcile this comment with your comment below
that XQuery has too much implicit casting. One of the ways that XPath 1.0
avoided run-time errors was by doing lots of implicit casting.
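To make the XPath 1.0 approach concrete, here is a rough Python model (the helper names `xpath1_number` and `xpath1_div` are hypothetical) of how implicit conversion plus IEEE 754 division lets expressions that would otherwise fail return NaN or Infinity instead. It simplifies XPath 1.0's number() rules to strings, booleans and numbers.

```python
import math
import re

# Lexical form of a number in XPath 1.0: optional minus sign, digits with an
# optional decimal point, surrounding whitespace; no exponent, no leading '+'.
_NUMBER = re.compile(r"\s*-?(\d+(\.\d*)?|\.\d+)\s*")


def xpath1_number(value) -> float:
    """Simplified XPath 1.0 number() for strings, booleans and numbers."""
    if isinstance(value, bool):
        return 1.0 if value else 0.0
    if isinstance(value, (int, float)):
        return float(value)
    if _NUMBER.fullmatch(value):
        return float(value)
    return math.nan  # no error: a non-numeric string silently becomes NaN


def xpath1_div(a, b) -> float:
    """IEEE 754 division after implicit conversion; never raises."""
    x, y = xpath1_number(a), xpath1_number(b)
    if y == 0:
        if x == 0 or math.isnan(x):
            return math.nan
        # The result is an infinity whose sign combines the signs of x and y
        # (including negative zero).
        return math.copysign(math.inf, x) * math.copysign(1.0, y)
    return x / y


print(xpath1_div("10", "4"))     # 2.5
print(xpath1_div(1, 0))          # inf
print(xpath1_div(0, 0))          # nan
print(xpath1_number("abc") + 1)  # nan
print(xpath1_div("abc", 2) > 0)  # False
```

Because NaN compares false with everything, a predicate such as `@a div @b > 0` over such values simply yields false: no error, but possibly a wrong answer.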


> 2) A query language should be deterministic.
>    This is not true for XQuery.
> 
> Essentially, these are the points why I write this email.
> Here is my motivating scenario:
> 
>    In XQuery, "p and q" and "q and p" may give a different result due to
>    runtime exceptions.
>    Why is this bad?
>    Assume the following scenario: ...
...

This is a matter that has exercised us considerably. The examples of
non-determinism that we put in the specification were expressly designed to
make sure that everyone could see quite how deep the non-determinism in the
language ran, because we are well aware of the consequences. However, the
choices are stark. Given a predicate such as 

exists(//item[@a div @b > 0])

would you really want us to require that the processor evaluates the
predicate for every item (or always processes the items in the same fixed order) in
order to make sure that the error behavior is deterministic? I don't think
any of us would find this acceptable (by which I mean, we don't think that
our users would be prepared to pay the cost). The alternative, which is the
direction that XSLT 1.0 and XPath 1.0 took, is to try to eliminate run-time
errors (so, for example, dividing by zero gives Infinity, or NaN for 0 div 0,
rather than an error). This improves the
determinism, but are "wrong answers" (failure to detect programming errors
or bad data) a price worth paying for that?
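The trade-off can be sketched in Python under two assumptions: the attributes are integer-typed, so `@a div @b` raises an error when `@b` is zero (rather than yielding Infinity), and a processor is free to choose the order in which it examines items or operands. All helper names are hypothetical. A short-circuiting `exists` returns a result or an error depending on the order it happens to use, which is the same effect the quoted "p and q" versus "q and p" example describes; the exhaustive version is deterministic but must look at every item.

```python
from typing import Callable, Iterable


class DynamicError(Exception):
    """Stands in for a dynamic error such as integer division by zero."""


def ratio_positive(item: dict) -> bool:
    # Models the predicate [@a div @b > 0] for integer-typed attributes,
    # where dividing by zero raises an error instead of returning Infinity.
    if item["b"] == 0:
        raise DynamicError(f"division by zero in item {item['id']}")
    return item["a"] / item["b"] > 0


def exists_short_circuit(items: Iterable[dict], pred: Callable[[dict], bool]) -> bool:
    # Stops at the first match, so later items (and their errors) are never seen.
    return any(pred(i) for i in items)


def exists_exhaustive(items: Iterable[dict], pred: Callable[[dict], bool]) -> bool:
    # Evaluates the predicate for every item, so any error always surfaces.
    results = [pred(i) for i in items]
    return any(results)


def left_to_right_and(p: Callable[[], bool], q: Callable[[], bool]) -> bool:
    # Short-circuit "and": q is evaluated only if p() is true.
    return p() and q()


def fails() -> bool:
    raise DynamicError("operand failed")


def false_() -> bool:
    return False


def outcome(thunk: Callable[[], bool]) -> str:
    # Reports either the result or the error, so the strategies can be compared.
    try:
        return str(thunk())
    except DynamicError as e:
        return f"error: {e}"


items = [{"id": 1, "a": 3, "b": 2}, {"id": 2, "a": 5, "b": 0}]

print(outcome(lambda: exists_short_circuit(items, ratio_positive)))        # True
print(outcome(lambda: exists_short_circuit(items[::-1], ratio_positive)))  # error: division by zero in item 2
print(outcome(lambda: exists_exhaustive(items, ratio_positive)))           # error: division by zero in item 2
print(outcome(lambda: left_to_right_and(false_, fails)))  # "q and p": False
print(outcome(lambda: left_to_right_and(fails, false_)))  # "p and q": error: operand failed
```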

> 
> Now, what should be done to correct XQuery?
> Many things:
> 1) introduce NULL-values and three valued logic
>    (Remember that OQL had the same design flaw as XQuery---although not that
>    bad---and that they introduced NULL-values in later versions after having
>    tried to correct things by introducing "andthen" and "orelse" (similar to
>    "if-then-else").)

I think that null values are totally inappropriate to the XML data model.
They were introduced in SQL because SQL's tabular data model has no other
way of representing absent data. XML has plenty of good ways of representing
missing data already; it doesn't need any new ways imported from a different
and less flexible data model.

Three-valued logic would still be possible using an empty sequence as the
third value, and we did consider that seriously for a while. But it turns
out that three-valued logic has just as many surprises for the unwary
programmer, and we eventually abandoned it. I don't see how it would help
solve either of the real issues that you identify above.
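One example of those surprises, as a hypothetical Python sketch in which `None` plays the part of the empty sequence as a third truth value (Kleene logic; XQuery did not adopt this). The law of the excluded middle no longer holds, and an item with an absent value falls through both a filter and its negation. For comparison, in XQuery as specified `$p = 10` is false when `$p` is the empty sequence, so `not($p = 10)` is true and the two filters together would return all three items.

```python
from typing import Optional

# Hypothetical three-valued logic: None stands for "unknown" (the empty sequence).
Tri = Optional[bool]


def tri_not(p: Tri) -> Tri:
    return None if p is None else not p


def tri_and(p: Tri, q: Tri) -> Tri:
    if p is False or q is False:
        return False
    if p is None or q is None:
        return None
    return True


def tri_or(p: Tri, q: Tri) -> Tri:
    if p is True or q is True:
        return True
    if p is None or q is None:
        return None
    return False


def tri_eq(x: Optional[int], y: int) -> Tri:
    # Comparing an absent value yields "unknown" rather than false.
    return None if x is None else x == y


price = None  # e.g. an absent @price attribute

# Surprise 1: "p or not p" is not always true.
print(tri_or(tri_eq(price, 10), tri_not(tri_eq(price, 10))))  # None

# Surprise 2: a filter keeps only items whose condition is True, so an item
# with no price satisfies neither the filter nor its negation.
items = [{"price": 10}, {"price": 20}, {"price": None}]
cheap = [i for i in items if tri_eq(i["price"], 10) is True]
not_cheap = [i for i in items if tri_not(tri_eq(i["price"], 10)) is True]
print(len(cheap) + len(not_cheap))  # 2, not 3
```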

> 2) Don't let empty sequences partially play the role of NULL-values.
>    (Remember: is_null(empty-sequence) is not true
>               is_empty(NULL) is not true)
>    These things are too different to be identified.

Why? In XML, an absent attribute plays exactly the same role as a null value
in SQL. When you access an absent attribute, you get an empty sequence as
the result. Would you like to change XML syntax so that an attribute can now
be present, absent, or absent-and-null? You seem to forget that XQuery is
first and foremost a query language for XML documents, and that the data
model for XML was invented long before XQuery came along. 

> 3) Do not identify single items with singleton-sequences that contain that
>    single item.
>    Even in the most flexible type systems of real and used programming/query
>    languages they are distinguished.

But in XML they are not. Sequences are present in XPath and XQuery primarily
to support list-valued elements and attributes; and in XML Schema, an
integer and a list of one integer are the same thing. Again, XQuery is not
inventing a new data model; it is there to allow XML documents to be queried.
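A hypothetical Python sketch of the data-model convention described above: if every value is represented as a sequence (here a tuple) and a single item is simply a sequence of length one, then an attribute typed as an integer and one typed as a whitespace-separated list of integers (as with `xs:list`) give the same value for the lexical form `42`. The function `typed_value` is illustrative only.

```python
def typed_value(lexical: str, is_list: bool) -> tuple[int, ...]:
    # Hypothetical model: every value is a sequence (a tuple here), and a
    # single item is just a sequence of length one.
    if is_list:
        # xs:list-style value: whitespace-separated tokens.
        return tuple(int(tok) for tok in lexical.split())
    return (int(lexical.strip()),)


print(typed_value("42", is_list=False))    # (42,)
print(typed_value("42", is_list=True))     # (42,)
print(typed_value("1 2 3", is_list=True))  # (1, 2, 3)
```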
> 
> Other points I don't like are:
> 1) too much implicit casting

I spend most of my time responding to a community that complains bitterly
that XPath 2.0 has too little implicit casting. As long as we get equal flak
from both sides, I will be happy that we have got the balance about right.

> 2) no explicit grouping
>    (grouping has to be expressed by nested queries. these are difficult to
>    unnest. unnesting is not always possible and is an error-prone process due
>    to its complexity.)
>    This is also a mistake that was made by the OQL designers.
>    (Not exactly the same, since they have an explicit grouping, but a nested
>    query had to be used to work on the "partition" attribute.
>    They subsequently corrected things half-way by introducing some syntactic
>    sugar for common cases. But you wouldn't call that a perfect solution nor
>    would you call it elegant.)

We are aware that grouping facilities in XQuery are currently rather
limited. One of the difficulties has been in identifying the requirements
clearly. XSLT 2.0 includes a grouping capability that was well informed by a
study of use cases derived from XSLT 1.0 experience, and without this, I
think the designers would almost certainly have got it wrong. A lot of the
requests for grouping in XQuery seem to come from people with more
experience in SQL than in XML, and in my view the requirements are very
different (for example, because order is significant in XML). Please take a
look at the grouping facilities in XSLT 2.0 and tell us whether they offer
the kind of functionality you think XQuery is missing.
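For readers comparing with SQL, a rough Python sketch of two of the grouping modes offered by XSLT 2.0's `xsl:for-each-group` (`group-by` and `group-adjacent`; there are also `group-starting-with` and `group-ending-with`). The helper functions are hypothetical approximations, not a port of the XSLT semantics. They show why order matters in XML: the same key can form one group or several depending on adjacency, which SQL's GROUP BY has no direct way to express.

```python
from itertools import groupby
from typing import Callable, Hashable, Iterable, TypeVar

T = TypeVar("T")
K = TypeVar("K", bound=Hashable)


def group_by_first_appearance(items: Iterable[T], key: Callable[[T], K]) -> list[tuple[K, list[T]]]:
    # Roughly like group-by: one group per distinct key, groups ordered by
    # where each key first appears, members kept in document order.
    groups: dict[K, list[T]] = {}
    for item in items:
        groups.setdefault(key(item), []).append(item)
    return list(groups.items())


def group_adjacent(items: Iterable[T], key: Callable[[T], K]) -> list[tuple[K, list[T]]]:
    # Roughly like group-adjacent: a new group starts whenever the key changes,
    # so the same key can produce several groups.
    return [(k, list(g)) for k, g in groupby(items, key=key)]


# A flattened document: (element name, text content) in document order.
doc = [("h1", "Intro"), ("p", "a"), ("p", "b"), ("h1", "Body"), ("p", "c")]


def element_name(e: tuple[str, str]) -> str:
    return e[0]


print([k for k, _ in group_by_first_appearance(doc, element_name)])  # ['h1', 'p']
print([k for k, _ in group_adjacent(doc, element_name)])             # ['h1', 'p', 'h1', 'p']
```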

Michael Kay

Received on Tuesday, 14 October 2003 11:52:33 UTC