Dear All,
XQuery has serious problems/flaws in several areas.
I mention some of them here; others exist. The following is based on
the documents on XQuery 1.0, the semantics and the typing paper.
1) Syntax
Of the many problems, I'd just like to mention
the ugly {$i} construct that is used to avoid
ambiguities in the RETURN clause of a FLWR expression.
(Actually, it doesn't remove the ambiguity but only reduces the probability
of its occurrence. Better would be {{@@{{$i}}@@}} )
Where does this problem come from?
Well, there is a special syntax for XQuery and there is XML-syntax.
Further, XQueries can be embedded in XML and XML can be embedded (RETURN) into
XQuery. Essentially, this boils down to a change of parser modes.
Now, why not use special XML elements with an appropriate namespace to
indicate when the parser should change to XQuery mode?
Further, the RETURN could trigger entering the XML mode.
(example:
...
for .... ...
)
/* this also allows queries to be identified and given parameters for better query reuse */
Even uglier is the syntax for attributes: . Every
XML parser will choke on it. If queries are to be embedded into documents,
this is a very bad idea.
The problem can easily be solved by using explicit constructors for
elements and attributes:
...
2) Semantics
The filter operator should be a binary operator with a tree and a predicate.
All nodes fulfilling the predicate remain in the tree; all others are deleted.
(Other semantics problem: see 4.)
3) Typing
The type of Node is not clear. To me it reads as if it is a union type of the special
node types. But then the definition of the functions is incomplete and awkward:
e.g. a function defined on NODE is NOT defined on the special node types.
Substitutability also becomes an issue: a special node is then not substitutable for
NODE.
How about using a type hierarchy at this point?
(An alternative could be NODE being an interface implemented by the special node types.)
I don't really see why the attributes function returns a sequence of attributes, since
the order of attributes is irrelevant. (It becomes relevant with the = on sequences.)
In fact, in the text (typing document) you say it returns a set.
Another problem with the type system is that it does not support references as used
in the semantics paper. Further, in most circumstances dereferencing is implicit.
Only in the semantics paper do we find explicit dereferencing for view results.
A good type system would include a reference type.
The handling of the where e1 return e2 as an if with an else () results in severe
typing problems. (see semantics document)
4) Optimizability
XQuery is inherently unoptimizable.
Query optimization relies heavily on certain laws like
commutativity, associativity, distributivity and other reorderability laws.
Imagine a sequence s with 1002 stories. Assume converting story 1002 to a string raises
an error. Then first(string(s)) != string(first(s)), although rewriting the left-hand side
into the right-hand side would be a nice optimization.
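The story example can be sketched in a few lines of Python. This is only an illustration under assumptions: string() is modelled as an eager item-by-item conversion, and the names BAD_STORY, to_string and string_all are hypothetical, not part of XQuery.

```python
class ConversionError(Exception):
    """Raised when an item cannot be converted to a string."""

# Hypothetical sentinel standing in for story 1002, whose conversion fails.
BAD_STORY = object()

def to_string(item):
    if item is BAD_STORY:
        raise ConversionError("story cannot be converted to a string")
    return str(item)

def first(seq):
    return seq[0]

def string_all(seq):
    # Eager, item-by-item conversion of the whole sequence.
    return [to_string(item) for item in seq]

# 1001 harmless stories followed by the failing one: 1002 items in total.
stories = ["story %d" % i for i in range(1, 1002)] + [BAD_STORY]

def convert_then_first():
    # Models first(string(s)): converts all 1002 stories, so it hits the error.
    return first(string_all(stories))

def first_then_convert():
    # Models string(first(s)): converts only the first story, so no error.
    return to_string(first(stories))
```

Under these assumptions first_then_convert() returns a value while convert_then_first() raises, so an optimizer that must preserve error behaviour cannot push first() below the conversion.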
I'm used to the following: if a = b and b = c then a = c. This kind of inference
is very useful in query optimization. Unfortunately, it does not hold in XQuery.
I'm used to simplifying not(not(e)) to e and not(a != b) to a = b.
These nice optimizations are not possible in XQuery.
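One way to see why these laws fail is a small Python model, assuming that = and != on sequences are read existentially (true if some pair of items satisfies the comparison). The function names general_eq and general_ne are hypothetical and only illustrate that reading.

```python
def general_eq(left, right):
    # Existential reading of "=": some pair of items is equal.
    return any(x == y for x in left for y in right)

def general_ne(left, right):
    # Existential reading of "!=": some pair of items differs.
    return any(x != y for x in left for y in right)

a, b, c = (1, 2), (2, 3), (3, 4)

# Transitivity: a = b and b = c hold, yet a = c does not.
a_eq_b = general_eq(a, b)
b_eq_c = general_eq(b, c)
a_eq_c = general_eq(a, c)

# not(a != a) versus a = a: the two sides disagree for a two-item sequence.
not_ne = not general_ne(a, a)
eq = general_eq(a, a)
```

With these definitions a_eq_b and b_eq_c are true while a_eq_c is false, and not_ne is false while eq is true, so neither the transitivity inference nor the rewrite of not(a != b) to a = b is safe on sequences.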
I also simplify (a + 0) to a and (a * 1) to a.
These nice optimizations are not possible in XQuery.
Usually, (true OR x) can be simplified to true and (false AND x) can be simplified to
false. This is not possible for XQuery.
All these little techniques are at the core of a query optimizer.
If the last two simplifications are not possible, essentially no optimization can be applied.
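The text does not spell out why the Boolean simplifications fail. One possible reading, sketched below in Python, is that both operands are evaluated and an error in either one makes the whole expression an error. The strict evaluation order and all names here are assumptions made for illustration.

```python
def strict_or(left, right):
    # Both operands are zero-argument callables and both are always
    # evaluated, so an error in either one propagates (assumed semantics).
    left_value = left()
    right_value = right()
    return left_value or right_value

def failing_operand():
    # Stands in for an operand x whose evaluation raises a runtime error.
    raise ValueError("runtime error in x")

def original():
    # Models (true OR x) under the strict reading.
    return strict_or(lambda: True, failing_operand)

def simplified():
    # Models the rewritten expression: just true.
    return True
```

Under this reading original() raises while simplified() returns True, so the rewrite changes the observable result whenever x can fail.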
THE most important query optimization technique is reordering joins.
As noted in the semantics document, reordering joins is a priori not possible for
FLWR expressions. However, the authors claim that it becomes possible
if the FLWR expression is embedded in an unorder function call.
This is not true. It is easy to come up with a query whose result crucially depends
on the order of evaluation of the FOR clauses of the FLWR expression.
Unordering the result sequence does not help. (Imagine a where clause with position calls inside.)
In fact, deciding whether an unorder call around a FLWR expression allows
the FOR clauses to be reordered is undecidable.
Hence, it is undecidable whether a query can be optimized or not.
This essentially means that the query language is unoptimizable.
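The order dependence claimed above can be sketched in Python. The sketch assumes a hypothetical where clause that tests the position of each binding tuple in the tuple stream; it is not taken from the XQuery documents.

```python
def for_x_then_y(xs, ys):
    # FOR x ... FOR y ...: x is the outer loop.
    tuples = [(x, y) for x in xs for y in ys]
    # Hypothetical where clause: keep only the tuples at positions 1 and 2.
    return [t for pos, t in enumerate(tuples, start=1) if pos <= 2]

def for_y_then_x(xs, ys):
    # Same query with the two FOR clauses swapped: y is the outer loop.
    tuples = [(x, y) for y in ys for x in xs]
    return [t for pos, t in enumerate(tuples, start=1) if pos <= 2]

xs = [1, 2]
ys = ["a", "b"]

# Wrapping each result in a set models an unorder call on the result.
unordered_original = set(for_x_then_y(xs, ys))
unordered_swapped = set(for_y_then_x(xs, ys))
```

Here unordered_original is {(1, "a"), (1, "b")} and unordered_swapped is {(1, "a"), (2, "a")}. The two results differ as sets, so discarding the order of the result does not make the FOR clauses reorderable.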
Further, it is a good design principle to make queries that are cheaper to evaluate
also cheaper to write (fewer characters). Hence, instead of having an unorder function
that has to be called explicitly, the default should be unordered, and only an explicit
preserve-order indicator should make the query processor preserve the order.
For that to work for the different FOR entries of a FLWR expression,
it is necessary to bundle them so that their reorderability can be indicated.
Here, it helps if we drop the FOR and LET keywords and replace them by a single one
(which is easily possible due to the IN and := notation).
Let's call the resulting clause, which contains all former FOR and LET expressions, FROM.
Then we could write
FROM a in expr
     b in expr
     c := expr
     ...
And no order would be preserved.
or we could write
ORDERED FROM a in expr
             b in expr
             c := expr
             ...
This query wouldn't be optimizable, but it is easy to say in the manual that
using ORDERED results in unoptimizability and hence expensive execution.
A counterargument might be that in documents order is crucial.
But: a query language's primary focus is not on restructuring documents to get other
documents (use XSL-T instead) but on extracting certain parts and possibly computing new information
about the extracted parts.
The grouping/aggregate computation in SQL is much better for optimization than the
one in OQL, where subqueries/expressions must be used in the aggregate functions.
(Reason: efficient evaluation by a GAgg operator (see Dayal) and the accompanying
optimization techniques can be applied to the SQL kind of grouping/aggregation.
They cannot be applied directly to OQL/XQuery, since queries must first be unnested/rewritten in
order to apply these techniques. In general this is not possible.)
As a consequence, XQuery's FLWR expressions should become FLWGHR expressions
with a grouping and having clause.
Grouping of elements in the RETURN clause is done by nested queries.
This approach is very bad since nested queries tend to be expensive to
evaluate. There certainly exist unnesting techniques, but not all
nested queries can be unnested. Hence, a query language should not rely
on unnesting techniques only. A much better approach here would be
to do explicit grouping of the variables bound in FL.
This could easily be achieved by introducing a ...
That way, nested queries for grouping are not necessary anymore.
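The cost difference between the two styles can be sketched in Python. The data and the function names are made up for illustration: totals_nested mimics grouping by a nested query that rescans the input once per group, while totals_grouped mimics an explicit grouping step evaluated in a single pass.

```python
from collections import defaultdict

# Hypothetical input: (customer, amount) pairs.
orders = [("alice", 10), ("bob", 5), ("alice", 7), ("carol", 1), ("bob", 2)]

def totals_nested(rows):
    # Nested-query style: find the distinct keys, then rescan all rows
    # for every key. Cost grows with (number of groups) x (number of rows).
    keys = []
    for key, _ in rows:
        if key not in keys:
            keys.append(key)
    return {key: sum(v for k, v in rows if k == key) for key in keys}

def totals_grouped(rows):
    # Explicit grouping: one pass over the rows with a hash table.
    totals = defaultdict(int)
    for key, value in rows:
        totals[key] += value
    return dict(totals)
```

Both functions return the same totals; the difference is that the second form states the grouping directly, so no unnesting step is needed before a single-pass evaluation strategy can be chosen.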
Other problem areas are:
1) raising exceptions/returning errors:
The golden rule of query language design here is:
A query language should not raise runtime errors.
All possible errors must be detectable at compile time.
With some goodwill one can accept a few exceptions to this golden rule,
but only if there are no problems in the four areas mentioned above.
This is not the case with errors/exceptions in XQuery.
2) implicitness:
The golden rule here is:
Everything in the query language should be explicit.
XQuery violates this rule heavily.
There is implicit flattening of sequences, singleton sequences
may be treated the same way as the element they contain
(and vice versa (awkward since sequences are not allowed to contain
other sequences (which I think is a good idea))), a lot of casting is implicit ...
The latter point is so bad that even the XQuery designers
could not come up with a collation hierarchy.
3) XQuery is not web-aware.
What do I mean by this?
Remember URLs. They are embedded in documents. They are rarely
typed in by the user but much more frequently reused by clicking on
some text in some document shown to the user by the browser.
Remember SQL. Ad-hoc queries are rare. Most queries are
embedded SQL queries within applications (comparable to documents)
and hence are reused plenty of times.
So I strongly believe that XQueries will be embedded into documents
more often than XQuery is used for ad-hoc queries.
These embedded queries should be reusable just as any URL embedded in a document
is reusable. In order to do so, queries must be identifiable and
referenceable (and this web-wide).
Things become even better if we give parameters to queries. Even better: optional
parameters and use a 3-valued logic for evaluating query references with partial
parameter binding.
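What a 3-valued evaluation with partial parameter binding might look like can be sketched in Python. Everything below is an assumption made for illustration: None stands for "unknown", an unbound parameter makes its comparison unknown, and a row is kept unless the predicate is definitely false. The names and3, eq3 and select are hypothetical.

```python
def and3(a, b):
    # Kleene conjunction over True, False and None (unknown).
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

def eq3(value, param):
    # Comparing against an unbound (None) parameter yields unknown.
    return None if param is None else value == param

def select(rows, author=None, year=None):
    # Keep every row whose predicate is not definitely false.
    kept = []
    for row in rows:
        verdict = and3(eq3(row["author"], author), eq3(row["year"], year))
        if verdict is not False:
            kept.append(row)
    return kept

docs = [
    {"author": "A", "year": 2000},
    {"author": "B", "year": 2000},
    {"author": "A", "year": 2001},
]

only_author = select(docs, author="A")            # year left unbound
both_bound = select(docs, author="A", year=2001)  # fully bound
nothing_bound = select(docs)                      # no parameter bound
```

With this reading the same stored query serves all three references: only_author keeps the two rows by "A", both_bound keeps the single 2001 row, and nothing_bound keeps every row.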
Only in this way does it become possible to build webs of queries and
to come closer to a semantic web.
None of the above is the case for XQuery queries.
In summary, I have the impression that the designers of XQuery tried to come up
``in one shot'' with a very powerful query language that does almost everything.
The appropriate approach would have been to start with a lean and clean kernel language
and then extend it carefully.
best regards
guido moerkotte