Re: [www-ql] <none> from Michael Kifer on 2001-03-01 (www-ql@w3.org from January to March 2001)

From: Michael Kifer <kifer@cs.sunysb.edu>
Date: Wed, 28 Feb 2001 19:41:31 -0500
To: www-ql@w3.org
Message-Id: <200103010041.TAA24971@sbcs.cs.sunysb.edu>
I think there is a serious misunderstanding here, as several msgs from
different people indicate. Since it is getting hard to make proper
attributions, I'll use yet another message format ;)

JR: Jonathan Robie
JC: Jeff Chapman
IM: Ingo Macherius
MK: me (past statements)

JR>I'm not sure what you mean by "SQL-like functionality". Are 
JR>you concerned about whether we can do queries on structured documents?

JC> I can see the value in a SQL-like syntax since there are plenty of
JC> SQL-literate folks in the audience and because this syntax could be easily
JC> digested by most XML-literate readers.  I was mainly concerned that this
JC> syntax was implemented at the expense of the usefulness that a valid XML
JC> syntax brings to the table.

Although I am not involved in the design of XQuery, I think I understand
the goals. The idea is to come up with a declarative query language for
querying documents. You can call it SQL-style, OQL-style, or whatever, but
this term is really misleading.

The last 30 years have demonstrated the utility and power of a certain
subset of predicate calculus as a data query language. SQL was one of the
first syntactic sugars (augmented with additional pragmatic operators) on
top of that language. OQL, Lorel, XML-QL, Quilt, and now XQuery use the
same style.  Although the logic behind them often isn't stated
explicitly, their semantics is always defined the same way (take all the
instantiations of vars that satisfy the conditions and construct the
result). It is this semantics that makes them logic-based and related to
each other.
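
To make that shared semantics concrete, here is a minimal sketch in Python.
It is only an illustration, not how any of these languages is actually
specified: the helper name evaluate, the books data, and the encoding of a
query as two lambdas are all assumptions made up for this example.

```python
from itertools import product

def evaluate(domains, condition, construct):
    """Enumerate every instantiation of the variables, keep the ones that
    satisfy the condition, and construct one result item per survivor."""
    names = list(domains)
    results = []
    for values in product(*(domains[name] for name in names)):
        binding = dict(zip(names, values))   # one instantiation of the vars
        if condition(binding):               # the "where" part
            results.append(construct(binding))  # the "return" part
    return results

# Hypothetical data: titles of books published in 2000 or later.
books = [{"title": "A", "year": 1999}, {"title": "B", "year": 2001}]
recent = evaluate(
    {"b": books},
    lambda env: env["b"]["year"] >= 2000,
    lambda env: env["b"]["title"],
)
```

Nothing in evaluate says how the satisfying instantiations are found; the
exhaustive loop is just the simplest way to state what the answer is.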

Anyway, this is all not that important. The important thing is that some
magic makes these languages well-suited for writing complex queries in a
few lines of code. Somehow other languages that aren't based on this
paradigm don't quite cut it as query languages: it's a curse :-)

Let's take XSLT as an example. A powerful language by all accounts (even if
its exact semantics can cause one's brains to overheat). But, it fails
miserably when it comes to simple things like joins. Not that you can't
express joins in it --- you wouldn't want to. XSLT can do many
transformations easily, but not the kind that arises in "data intensive
applications" (e.g., find all employees that earn 5% more than their
managers ;-)
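
For comparison, the employee/manager query is a short self-join when written
in the for/where/return style. The sketch below uses a Python comprehension
as a stand-in for such a language. The records and field names are invented,
and "5% more" is read here as "at least 5% above the manager's salary", which
is an assumption about the intended meaning.

```python
# Hypothetical employee records; "manager" holds the manager's name.
employees = [
    {"name": "Ann", "salary": 100000, "manager": None},
    {"name": "Bob", "salary": 110000, "manager": "Ann"},
    {"name": "Cy", "salary": 102000, "manager": "Ann"},
    {"name": "Di", "salary": 120000, "manager": "Bob"},
]

# for e, m in employees  where m manages e and e earns >= 105% of m  return e
# Integer arithmetic avoids floating-point rounding in the 5% comparison.
overpaid = [
    e["name"]
    for e in employees
    for m in employees
    if e["manager"] == m["name"]
    and e["salary"] * 100 >= m["salary"] * 105
]
```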

Putting together an "SQL-style" language is not a simple matter, although
the final result might seem simple (so some might ask, "what took you so
long?!"). A number of semantic and syntactic features have to come together
and the language must provide for natural representation of common queries
and have the desired expressive power.

The point about expressive power is obvious to database people, but I often
find that it is lost on people from other areas: database languages are
designed so that certain queries would be expressible *and most queries would
not be*. This restriction is important so that query processors would be able
to perform non-trivial optimizations. (Let's not discuss the expressive power
of XQuery, because it is beside the point.)

When all the components come together, you get *a* concrete syntax (note the
"a"). Converting it into a different style syntax (XML in this case) is
trivial. However, doing it too soon is a bad idea, because most people
don't have XML parsing chips embedded in their heads.

Let me now address a couple of points raised by Ingo, because they are related:

IM> In fact FLWR-XQuery has to be mapped to Algebra at some point anyway

MK> The semantics of XQuery is transparent enough to be able to do this.

IM> Hm, isn't the tail waiving with the dog here ? XQuery has to prove
IM> compliance with Algebra by giving a mapping, not the other way round ...

Here I find the same type of confusion. A declarative language doesn't need
to prove compliance with any algebra. You design such a language in order
to have a means of expressing complex queries easily (with an eye
on being able to "optimize" them). Such a language comes with a precise
semantics. 

An algebra is a procedural language that exists in order to provide an
intermediate, relatively high-level language to which you translate a declarative
language. It is the semantics of the declarative language that determines
how it is translated into algebra --- not the other way around.

Once XQuery and the algebra are finalized, a translation can be specified.
It is not hard. Typically, a "dumb" translation is provided, which is
guaranteed to be semantically correct and is easy to understand, but which
doesn't yield efficient execution.  Then query optimizers take over and do
the rest (a very hard job, indeed).
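
A rough sketch of what such a "dumb" translation looks like, in Python. The
operator names cross, select, and project, the tuple layout, and the hash-join
rewrite are all assumptions chosen for illustration; they are not taken from
the XQuery algebra drafts.

```python
from itertools import product

def cross(r, s):
    """Cartesian product: every pairing of a row of r with a row of s."""
    return list(product(r, s))

def select(rows, pred):
    """Keep only the rows that satisfy the predicate."""
    return [row for row in rows if pred(row)]

def project(rows, fn):
    """Build one output item from each remaining row."""
    return [fn(row) for row in rows]

# Hypothetical relations: (employee name, dept id) and (dept id, dept name).
emp = [("Ann", 10), ("Bob", 20), ("Cy", 10)]
dept = [(10, "Sales"), (20, "R&D")]

# "Dumb" plan: product, then filter, then construct. Obviously correct,
# but it touches len(emp) * len(dept) pairs.
naive = project(
    select(cross(emp, dept), lambda pair: pair[0][1] == pair[1][0]),
    lambda pair: (pair[0][0], pair[1][1]),
)

# One plan an optimizer might choose instead: a hash join on dept id.
index = {}
for d in dept:
    index.setdefault(d[0], []).append(d)
hashed = [(e[0], d[1]) for e in emp for d in index.get(e[1], [])]
```

Both plans compute the same answer. Which one runs is a matter for the
optimizer and does not change what the query means.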

So, Ingo, I believe that it is the above statement of yours that "wags the
dog" and not mine ;-)

MK> We all seem to agree that for humans FLWR is easier to
MK> understand than XML. Since at this stage people need to understand the
MK> semantics and the expressive power of the proposed language, FLWR seems to
MK> be a message format that is superior to XML :-)

IM> Tail waving with dog again. A concrete
IM> syntax such as FLWR obviously was needed to finally start a discussion.
IM> However, semantic changes can only be in the Algebra, and syntax only comes
IM> second.

As I explained above, this statement flies in the face of the last 30 years
of developments in databases. Semantics is not in the algebra, but in the
query language. A query language is mapped into the algebra in such a way
that its (the language's) semantics is faithfully implemented by the algebra.

Of course, algebra also has semantics, but it is procedural semantics and I
won't go into it. The important point is that the declarative semantics of
the query language is ***independent*** of the procedural semantics of the
algebra (couldn't emphasize it more).



	--michael
Received on Wednesday, 28 February 2001 19:42:04 UTC