Dear All,

XQuery has serious flaws in several areas.
I mention some of them here; others exist. The basis for
the following remarks is the documents on XQuery 1.0, the semantics and
the typing paper.

1) Syntax
   Out of the many problems I'd just like to mention
   the ugly {$i} construct that is used to avoid
   ambiguities in the RETURN clause of a FLWR expression.
   (Actually it doesn't remove the ambiguity but only reduces the probability
    of its occurrence. Better would be {{@@{{$i}}@@}} )
   Where does this problem come from?
   Well, there is a special syntax for XQuery and there is XML-syntax.
   Further, XQueries can be embedded in XML and XML can be embedded (RETURN) into
   XQuery. Essentially, this boils down to a change of parser modes.
   Now, why not use special XML elements with an appropriate namespace to
   indicate when the parser should change to XQuery mode?
   Further, the RETURN could trigger entering the XML mode.
   (example:
     <doc> ...
       <xql:expr> for .... </xql:expr> ...
     </doc>
   )
   /* this also makes it possible to identify queries and give them parameters for better query reuse */
   Even uglier is the syntax for attributes:  <e a = {$1} />.  It will kill
   every XML parser, since attribute values must be quoted. If queries are to be embedded into documents
   this is a very bad idea.
   The problem can easily be solved by using explicit constructors for
   elements and attributes:
    <xql:element name="..."><xql:attribute name="...">...
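
   A minimal sketch of the parser argument, using Python's standard XML
   parser. The namespace URI and the helper names are assumptions made for
   illustration; the text above names neither. The sketch only shows that a
   query wrapped in a namespaced element survives a stock XML parser, while
   the unquoted brace attribute does not.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI for the xql: prefix (not given in the text).
XQL_NS = "http://example.org/xql"

embedded = (
    f'<doc xmlns:xql="{XQL_NS}">'
    "<title>Report</title>"
    "<xql:expr>for $i in //story return $i</xql:expr>"
    "</doc>"
)

def extract_queries(xml_text):
    """Return the text of every xql:expr element found by a stock XML parser."""
    root = ET.fromstring(xml_text)
    return [e.text for e in root.iter(f"{{{XQL_NS}}}expr")]

def is_well_formed(xml_text):
    """True if a stock XML parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

queries = extract_queries(embedded)
brace_attribute_ok = is_well_formed("<e a = {$1} />")
```

   Here the embedded query can be located by element name alone, which is
   also what would make it identifiable for reuse.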

2) Semantics
   The filter operator should be a binary operator with a tree and a predicate.
   All nodes fulfilling the predicate remain in the tree; all others are deleted.

   (other semantics problem: see 4.)

3) Typing
   The type of Node is not clear. To me it reads as if it is a union type of the special
   node types, but then the definition of the functions is incomplete and awkward.
   E.g. a function defined on NODE is NOT defined on the special node types.
   Substitutability also becomes an issue: a special node is then not substitutable for
   NODE.
   How about using a type hierarchy at this point?
   (An alternative could be NODE being an interface implemented by the special node types.)
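
   What a type hierarchy buys can be sketched as follows. The class and
   field names are illustrative assumptions, not taken from the typing
   paper; the point is only that a function declared on the supertype
   accepts every special node type.

```python
from dataclasses import dataclass

@dataclass
class Node:
    """Common supertype; the single field is an illustrative assumption."""
    name: str

@dataclass
class ElementNode(Node):
    """A special node type that adds nothing of its own in this sketch."""

@dataclass
class AttributeNode(Node):
    """A special node type with one extra field."""
    value: str = ""

def node_name(n: Node) -> str:
    # Declared once on Node; applies to every subtype by substitutability.
    return n.name

names = [node_name(ElementNode("story")), node_name(AttributeNode("id", "42"))]
```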

   I don't really see why the attributes function returns a sequence of attributes since
   the order of attributes is irrelevant. (it becomes relevant with the = on sequences).
   In fact, in the text (typing document) you say it returns a set.

   Another problem with the type system is that it does not support references as used
   in the semantics paper. Further, in most circumstances dereferencing is implicit.
   Only in the semantics paper do we find explicit dereferencing for view results.
   A good type system would include a reference type.

   The handling of the where e1 return e2 as an if with an else () results in severe
   typing problems. (see semantics document)

4) Optimizability
   XQuery is inherently unoptimizable.
   Query optimization relies heavily on certain laws like
   commutativity, associativity, distributivity and other reorderability laws.
   Imagine a sequence s with 1002 stories. Assume converting story 1002 to a string raises
   an error. Then first(string(s)) != string(first(s)), although rewriting the
   left-hand side into the right-hand side would give a nice optimization.
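
   The 1002-stories example can be replayed in a small sketch. The function
   names and the use of None for the unconvertible story are assumptions
   for illustration, and string() is modelled as being applied to every
   item before first() sees the result.

```python
def to_string(story):
    """Stand-in for string(); fails on the story marked as unconvertible."""
    if story is None:
        raise ValueError("cannot convert story to string")
    return f"story-{story}"

def first(seq):
    return seq[0]

# 1002 stories; the last one (None) cannot be converted.
stories = list(range(1, 1002)) + [None]

def convert_all_then_first(seq):
    # Mirrors first(string(s)): every item is converted before first() runs.
    return first([to_string(x) for x in seq])

def first_then_convert(seq):
    # Mirrors string(first(s)): only the first item is ever converted.
    return to_string(first(seq))

cheap = first_then_convert(stories)
try:
    expensive = convert_all_then_first(stories)
except ValueError as exc:
    expensive = f"error: {exc}"
```

   The two forms disagree only because of the error, which is exactly what
   blocks the rewrite if errors must be preserved.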

   I'm used to the following: if a = b and b = c then a = c. This kind of inference
   is very useful in query optimization. Unfortunately, it does not hold in XQuery.

   I'm used to simplifying not(not(e)) to e and not(a != b) to a = b.
   These nice optimizations are not possible in XQuery.
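
   One way both the transitivity failure and the not(a != b) failure can
   arise is if = and != on sequences are existentially quantified, as
   comparisons on node-sets are in XPath. That reading is an assumption
   here; the text above does not say which rule it has in mind. A minimal
   sketch under that assumption:

```python
def general_eq(a, b):
    """Existential '=': true if some item of a equals some item of b."""
    return any(x == y for x in a for y in b)

def general_ne(a, b):
    """Existential '!=': true if some item of a differs from some item of b."""
    return any(x != y for x in a for y in b)

# Transitivity: a = b and b = c hold, yet a = c does not.
a, b, c = (1, 2), (2, 3), (3, 4)
a_eq_b = general_eq(a, b)
b_eq_c = general_eq(b, c)
a_eq_c = general_eq(a, c)

# Negation: not(p != q) and p = q disagree.
p, q = (1, 2), (1,)
original = not general_ne(p, q)   # not(p != q)
rewritten = general_eq(p, q)      # p = q
```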

   I also simplify (a + 0) to a and (a * 1) to a.
   These nice optimizations are not possible in XQuery.

   Usually, (true OR x) can be simplified to true and (false AND x) can be simplified to
   false. This is not possible for XQuery.
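
   A sketch of why (true OR x) cannot be folded to true, assuming a strict
   semantics in which both operands are evaluated and an error in either
   one must propagate. That assumption and all names below are
   illustrative; operands are passed as zero-argument functions so that
   their evaluation can fail.

```python
def strict_or(left, right):
    """'or' that evaluates both operands, so an error in either propagates."""
    l = left()
    r = right()
    return l or r

def failing_operand():
    raise ValueError("error while evaluating x")

def run(thunk):
    """Evaluate and report either the value or the error."""
    try:
        return thunk()
    except ValueError as exc:
        return f"error: {exc}"

unsimplified = run(lambda: strict_or(lambda: True, failing_operand))
simplified = run(lambda: True)  # the result of folding (true OR x) to true
```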
   
   All these little techniques are at the core of a query optimizer.
   If the last two simplifications are not possible, essentially no optimization can be applied.

   THE most important query optimization technique is reordering joins.
   As noted in the semantics document, reordering joins is a priori not possible for
   FLWR expressions. However, the authors claim that it becomes possible
   if the FLWR expression is embedded in an unorder function call.
   This is not true. It is easy to come up with a query whose result crucially depends
   on the order of evaluation of the FOR clauses of the FLWR expression.
   Unordering the result sequence does not help (imagine a where clause with position calls inside).
   In fact, whether an unorder call around a FLWR expression allows
   reordering the FOR clauses is undecidable.
   Hence, it is undecidable whether a query can be optimized or not.
   This essentially means that the query language is unoptimizable.
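
   A small sketch of such a query. The position call in the where clause is
   modelled as a running counter over the binding tuples, which is an
   assumption made for illustration, and unorder() is modelled by
   discarding sequence order.

```python
def for_a_then_b(a_items, b_items, wanted_position):
    """FOR a, FOR b; WHERE keeps only the binding tuple at wanted_position."""
    kept, position = [], 0
    for a in a_items:
        for b in b_items:
            position += 1
            if position == wanted_position:
                kept.append((a, b))
    return kept

def for_b_then_a(a_items, b_items, wanted_position):
    """Same query with the two FOR clauses swapped."""
    kept, position = [], 0
    for b in b_items:
        for a in a_items:
            position += 1
            if position == wanted_position:
                kept.append((a, b))
    return kept

def unorder(items):
    """Model unorder() by forgetting the order of the result sequence."""
    return frozenset(items)

A, B = [1, 2], ["x", "y"]
original = unorder(for_a_then_b(A, B, 2))
reordered = unorder(for_b_then_a(A, B, 2))
```

   The two results contain different tuples, not merely the same tuples in
   a different order, so wrapping the query in unorder does not license the
   swap.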

   Further, it is a good design principle to make queries that are cheaper to evaluate
   also cheaper to write (fewer characters). Hence, instead of having an unorder function
   that has to be called explicitly, the default should be unordered and only some
   preserve-order indicator should make the query processor preserve the order.
   For that to work for the different FOR entries of a FLWR expression,
   it is necessary to bundle them so that their reorderability can be indicated.
   Here, it helps if we skip the FOR and LET keywords and replace them by a single one
   (which is easily possible due to the IN and := notation).
   Let's call the resulting clause that contains all former FOR and LET expressions FROM.
   Then we could write
    FROM   a in expr
           b in expr
           c := expr
    ...
    And no order would be preserved.
    or we could write
    ORDERED FROM a in expr
                 b in expr
                 c := expr
    ...
    This query wouldn't be optimizable, but it is easy to say in the manual that
    using ORDERED results in unoptimizability and hence expensive execution.

    A counter argument might be that in documents order is crucial.
    But: a query language's primary focus is not restructuring documents to get other
    documents (use XSLT instead) but extracting certain parts and possibly computing new information
    about the extracted parts.
    
    
    The grouping/aggregate computation in SQL is much better for optimization than the
    one in OQL where subqueries/expressions must be used in the aggregate functions.
    (Reason: efficient evaluation by a GAgg operator (see Dayal) and accompanying
     optimization techniques can be applied to the SQL kind of grouping/aggregation.
     They cannot be applied directly to OQL/XQuery since queries must be unnested/rewritten in
     order to apply these techniques. In general this is not possible.)
    As a consequence, XQuery's FLWR expressions should become FLWGHR expressions
    with a grouping and having clause.
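
    The difference between the two styles can be sketched on a toy table.
    The data and function names are made up for illustration, and the
    single-pass function is only a stand-in for a grouping operator, not
    the GAgg operator itself.

```python
from collections import defaultdict

# Made-up (customer, amount) rows.
orders = [("alice", 10), ("bob", 5), ("alice", 7), ("bob", 1), ("carol", 3)]

def totals_nested(rows):
    """Nested-query style: one inner scan of all rows per distinct customer."""
    customers = []
    for name, _ in rows:
        if name not in customers:
            customers.append(name)
    return {
        c: sum(amount for name, amount in rows if name == c)
        for c in customers
    }

def totals_grouped(rows):
    """GROUP BY style: a single pass that accumulates one total per group."""
    acc = defaultdict(int)
    for name, amount in rows:
        acc[name] += amount
    return dict(acc)

nested = totals_nested(orders)
grouped = totals_grouped(orders)
```

    Both give the same totals; with an explicit grouping clause the
    processor is handed the single-pass form directly instead of having to
    recover it by unnesting.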


    Grouping of elements in the RETURN clause is done by nested queries.
    This approach is very bad since nested queries tend to be expensive to
    evaluate. There certainly exist unnesting techniques, but not all
    nested queries can be unnested. Hence, a query language should not rely
    on unnesting techniques only. A much better approach here would be
    to do explicit grouping of the variables bound in FL.
    This could easily be achieved by introducing a <xql:foreach vars="...">...
    That way, nested queries for grouping are not necessary anymore.
    
Other problem areas are:

1) raising exceptions/returning errors:
 The golden rule of query language design here is:
  A query language should not raise runtime errors.
  All possible errors must be detectable at compile time.

With some good will one can accept a few exceptions to this golden rule
but only if there are no problems with the four areas mentioned above.
This is not the case with errors/exceptions in XQuery.

2) implicitness:
 The golden rule here is:
  Everything in the query language should be explicit.
  XQuery violates this rule heavily.
  There is implicit flattening of sequences, singleton sequences
  may be treated the same way as the element they contain
  (and vice versa (awkward since sequences are not allowed to contain
   other sequences (which I think is a good idea))), a lot of casting is implicit ...
  The latter point is so bad that even the XQuery designers
  could not come up with a collation hierarchy.

3) XQuery is not web-aware.
   What do I mean by this?
   Remember URLs. They are embedded in documents. They are rarely
   typed in by the user but much more frequently reused by clicking on
   some text in some document shown to the user by the browser.
   Remember SQL. Ad-hoc queries are rare. Most queries are
   embedded SQL queries within applications (comparable to documents)
   and hence are reused many times.
   So I strongly believe that XQueries will be embedded into documents
   more often than XQuery is used for ad-hoc queries.
   These embedded queries should be reusable as any URL embedded in a document
   is reusable. In order to do so, queries must be identifiable and
   referenceable (and this web-wide).
   Things become even better if we give parameters to queries. Even better: allow optional
   parameters and use a 3-valued logic for evaluating query references with partial
   parameter binding.
   Only this way does it become possible to build webs of queries and
   to come closer to a semantic web.
   None of the above is the case for XQuery queries.
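
   One possible reading of "3-valued logic with partial parameter binding"
   is sketched below. Everything in it is an assumption for illustration:
   the parameter names, the data, the use of None for "unknown", and the
   choice to keep every row whose verdict is not false.

```python
def k_and(a, b):
    """Kleene three-valued AND over True, False and None (unknown)."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

def verdict(row, params):
    """AND of one test per parameter; an unbound parameter yields unknown."""
    result = True
    for key in ("author", "year"):
        value = params.get(key)
        test = None if value is None else (row[key] == value)
        result = k_and(result, test)
    return result

def run_query(rows, params):
    # Keep every row that is not definitely excluded by the bound parameters.
    return [row for row in rows if verdict(row, params) is not False]

rows = [
    {"author": "smith", "year": 2000},
    {"author": "jones", "year": 2001},
    {"author": "smith", "year": 2001},
]

partial = run_query(rows, {"author": "smith"})              # year left unbound
full = run_query(rows, {"author": "smith", "year": 2001})   # fully bound
```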
   

Summarizing, I've the impression that the designers of XQuery tried to come up 
``in one shot'' with a 
very powerful query language that does almost everything.
The appropriate approach would have been to start with a lean and clean kernel language
and then extend it carefully.

best regards
 guido moerkotte