XQuery

Hello,

at VLDB in Berlin, Michael Rys convinced me to send another email to the XQuery
comment list.
Since I don't expect my email to have any influence, I won't bother writing down every
point of XQuery that I think should be corrected. Instead I concentrate on the
points I think are really bad.

Here they are:

1) Runtime Exceptions: A query language should not have runtime exceptions.
   This may not always be achievable, but at least all type errors should be discovered at
   compilation time.
   This is not true for XQuery.
2) A query language should be deterministic.
   This is not true for XQuery.

Essentially, these two points are why I am writing this email.
Here is my motivating scenario:

   In XQuery, "p and q" and "q and p" may give different results due to runtime exceptions.
   Why is this bad?
   Assume the following scenario:
   A big company develops a web site based on millions of XML documents.
   In all these documents XQueries are embedded to give dynamic up-to-date information.
   Millions of the queries embedded in the documents use conjunctions and after
   thorough testing all queries work and the web site is put into operation.
   After several years, the XML document base grows and queries slow down.
   The sysadmin decides to gather new statistics and let the queries be reoptimized.
   The query optimizer decides that changing the order of conjunctions will result in
   better plans for about half a million queries. Unfortunately, due to the indeterminism,
   all queries crash. The web site is down for about three months, costing the company
   millions of dollars(!).
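
The effect can be modelled outside XQuery. The following is only a sketch in Python,
assuming an engine that evaluates a conjunction left to right and skips the second
operand once the first is false; to_number, conj and run are hypothetical helper
names, not XQuery functions.

```python
# Illustrative model only: Python stands in for a query engine.
# to_number, conj and run are hypothetical names, not part of XQuery.

def to_number(value):
    """Cast a string to a number; raises ValueError on bad input (the runtime exception)."""
    return float(value)

def conj(left, right):
    """Evaluate a conjunction left to right, skipping `right` once `left` is false."""
    return left() and right()

row = {"price": "n/a"}

p = lambda: row["price"] != "n/a"          # guard: false for this row
q = lambda: to_number(row["price"]) > 10   # raises if evaluated on this row

def run(first, second):
    """Run a conjunction and report a runtime exception as the string "error"."""
    try:
        return conj(first, second)
    except ValueError:
        return "error"

result_p_and_q = run(p, q)   # guard runs first, q is never evaluated
result_q_and_p = run(q, p)   # q runs first and raises
```

Under these assumptions the same logical condition yields a value in one operand order
and an error in the other, which is exactly what a reordering optimizer can trigger.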

What could be your answer?
  -- The programmers should have used cascading "if-then-else" expressions.
     NO! for two reasons:
     1) It could well be that they are not aware of the complications involved with using
        "and" and "or" and runtime exceptions.
        (Remember that all testing went fine.)
     2) Using cascading "if-then-else" prevents any query optimization.
        The query is not declarative any more.

Now, what should be done to correct XQuery?
Many things:
1) introduce NULL-values and three-valued logic
   (Remember that OQL had the same design flaw as XQuery---although not that bad---and
    that they introduced NULL-values in later versions after having tried to correct
    things by introducing "andthen" and "orelse" (similar to "if-then-else").)
2) Don't let empty sequences partially play the role of NULL-values.
   (Remember: is_null(empty-sequence) is not true
              is_empty(NULL) is not true)
   These things are too different to be identified.
3) Do not identify single items with singleton-sequences that contain that single item.
   Even in the most flexible type systems of programming/query languages in actual use,
   they are distinguished.
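
To illustrate point 1, here is a minimal sketch of three-valued conjunction in Python.
It assumes that None plays the role of NULL and that a failed cast yields NULL instead
of raising; and3, to_number_or_null and greater_than are hypothetical names chosen for
this example, not functions of XQuery or OQL.

```python
# Hypothetical sketch: None plays the role of NULL.

def and3(a, b):
    """Three-valued (Kleene) conjunction over True, False and None (unknown)."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

def to_number_or_null(value):
    """Cast to a number, yielding NULL (None) instead of raising on bad input."""
    try:
        return float(value)
    except ValueError:
        return None

def greater_than(x, bound):
    """Comparison that propagates NULL."""
    return None if x is None else x > bound

price = "n/a"
p = price != "n/a"                               # False
q = greater_than(to_number_or_null(price), 10)   # None, i.e. unknown

# The result is the same for every operand order.
values = [True, False, None]
commutative = all(and3(a, b) == and3(b, a) for a in values for b in values)
```

With this definition and3(p, q) and and3(q, p) agree, so an optimizer is free to
reorder the operands without changing the outcome.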

Other points I don't like are:
1) too much implicit casting
2) no explicit grouping
   (Grouping has to be expressed by nested queries. These are difficult to unnest.
    Unnesting is not always possible and is an error-prone process due to its complexity.)
   This is also a mistake that was made by the OQL designers.
   (Not exactly the same, since they have an explicit grouping, but
    a nested query had to be used to work on the "partition" attribute.
    They subsequently corrected things half-way by introducing some syntactic sugar for
    common cases. But you wouldn't call that a perfect solution nor would you call it
    elegant.)
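
The difference between the two formulations can be sketched in Python. This is only an
analogy under assumed data: the orders list and the names totals_nested and
totals_grouped are made up for illustration and do not come from XQuery or OQL.

```python
from collections import defaultdict

# Made-up sample data for the illustration.
orders = [
    {"customer": "a", "amount": 10},
    {"customer": "b", "amount": 5},
    {"customer": "a", "amount": 7},
]

def totals_nested(rows):
    """Grouping via a nested query: for each distinct key, rescan all rows."""
    keys = []
    for r in rows:
        if r["customer"] not in keys:
            keys.append(r["customer"])
    # The inner generator is the nested query an optimizer would have to unnest.
    return {k: sum(r["amount"] for r in rows if r["customer"] == k) for k in keys}

def totals_grouped(rows):
    """Explicit grouping: a single pass that partitions rows by key."""
    groups = defaultdict(int)
    for r in rows:
        groups[r["customer"]] += r["amount"]
    return dict(groups)
```

Both functions compute the same totals; the nested form hides the grouping intent in a
correlated inner query, while the explicit form states it directly.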

best
 guido

ps: although XQuery improved, some of the points of my first email are still valid.

Received on Tuesday, 14 October 2003 06:53:51 UTC