RE: XQuery feedback from Jonathan.Robie@SoftwareAG-USA.com on 2001-02-22 (www-ql@w3.org from January to March 2001)

From: <Jonathan.Robie@SoftwareAG-USA.com>
Date: Thu, 22 Feb 2001 11:20:51 -0500
To: elenz@xyzfind.com, www-ql@w3.org
Message-ID: <80B2BC83D9C0D411AE7D0050BAB106DD18CCB1@sunshine.softwareag-usa.com>
Evan Lenz wrote:

> Reinventing the Wheel

I think I should start by responding to the title of this message. The
phrase "reinventing the wheel" usually refers to reinventing something that
already exists because you don't know about it. The editors of XQuery
include a former member of the XSL Working Group who has written a fair
number of stylesheets. They also include one of the inventors of SQL, one of
the inventors of XML-QL, and one of the inventors of XQL, a precursor of
XPath. We considered quite a few syntax approaches, including building on
XSLT, before arriving at the approach we used.

Also, you imply that we are off on a completely different track than XSLT.
In fact, we are working closely together with the XSL Working Group to
define XPath 2.0. This includes not only adding features, but deriving a new
model for XPath that is able to account for XML Schema types.

> After reviewing the XQuery spec, I'm concluding that the 
> overlap between XQuery and XSLT is far too great for the 
> W3C to reasonably recommend them both as separate languages. 

XQuery and XSLT will share a common expression language, including path
expressions. XSLT is really two languages, an XML-based language used to
write the templates, and XPath, an expression language used for patterns.
Both XQuery and XSLT will use XPath 2.0, and the two Working Groups are
working closely together on this. So the two languages will share a great
deal.

Why have a new language? Three reasons: (1) ease of use for our use cases,
(2) optimizability, (3) strong data typing.

1. Ease of use

XQuery is significantly more straightforward for a lot of common database
queries. To some extent, what is straightforward is a matter of taste, a
realm where logic does not reach, but I think that some of the reasons are
worth stating.

First, simple queries are simpler in XQuery. For instance, an XPath 2.0
expression that uses the abbreviated syntax is also a valid query by itself.
This is not true of XSLT. Your document
http://www.xmlportfolio.com/xquery.html incorrectly labels XPath expressions
as XSLT, but an XSLT processor will not process your examples unless you
place them in a template. Consider a simple query that looks for all
employees in a set of documents:

   //emp

This is much easier to read and write than the equivalent XSLT stylesheet:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

  <!-- Match the root and copy every emp element, as //emp does. -->
  <xsl:template match="/">
    <xsl:copy-of select="//emp"/>
  </xsl:template>

</xsl:stylesheet>

This difference is also present for some moderately complex queries. When
you consider the following XQuery expression:

   /emp[rating = "Poor"]/@mgr->emp/@mgr->emp/name

you compare it to the following XSLT fragment:

   <xsl:variable name="poorEmpManagers"
       select="id(/emp[rating = 'Poor']/@mgr)[self::emp]"/>

I think it would be a fairer comparison if you typed in the entire
stylesheet that you would have to write in XSLT. I didn't test this, but I
think the following is approximately what you would have to write:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

  <xsl:template match="/">
    <xsl:variable name="poorEmpManagers"
        select="id(/emp[rating = 'Poor']/@mgr)[self::emp]"/>
    <xsl:copy-of select="id($poorEmpManagers/@mgr)[self::emp]/name"/>
  </xsl:template>

</xsl:stylesheet>

The fact that *any* expression in XQuery is a valid query makes it easier to
write simple queries, without the overhead associated with a stylesheet. For
what it's worth, here's the shortest XQuery expression that can be executed
as a stand-alone query:

       1

Also, the keyword-oriented approach of XQuery is more familiar and
comfortable to many programmers. I would rather write:

FOR $b IN document("bib.xml")//book
WHERE $b/publisher = "Morgan Kaufmann"
AND $b/year = "1998"
RETURN $b/title

than

<xsl:transform version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <xsl:for-each select="document('bib.xml')//book">
      <xsl:if test="publisher='Morgan Kaufmann' and year='1998'">
        <xsl:copy-of select="title"/>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>
</xsl:transform>

Note that there's been no great rush to create an XML syntax for Java,
JavaScript, Visual Basic, or other high level programming languages. Several
people have attempted to make XML syntaxes for SQL, but I have not been
impressed by the results.

2. Conventional Database Functionality

XQuery is better suited to many of the kinds of queries that SQL programmers
are used to. Joins and the distinct() function account for a lot of this -
no surprise, since XQuery's FLWR expressions are quite similar to SQL's
SELECT/FROM/WHERE. It may make sense, incidentally, to add these to XSLT as
well, which is another reason for XQuery and XSLT to continue to work
together on XPath 2.0.


To a database person, it is somewhat surprising that your paper does not
explicitly mention joins, which are one of the biggest reasons for FLWR
expressions in XQuery. Joins are central to database functionality, and it
is important to express them in a way that allows optimization based on
patterns detected in the expressions. I also notice that the examples in
your paper do not include any examples from Section 3 of the XQuery paper,
which shows how conventional SQL-like queries are done. 

In your paper, you point out that FLWR expressions do have some syntactic
similarity to XSLT's <xsl:for-each>. This is true, but it misses the
purpose of FLWR expressions, which is to provide general SQL-like
functionality for joins and declarative restructuring.  A naive mapping of
FLWR expressions to <xsl:for-each> is not likely to give you an efficient
implementation of joins.
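
To make the join point concrete, here is a minimal sketch in Python rather
than in XQuery or XSLT. The book and publisher records are hypothetical, and
the two function names are invented for illustration. The first function
evaluates the join the way nested for-each loops would; the second is an
equivalent rewrite that an optimizer could choose once it recognizes the
join pattern.

```python
# Hypothetical data: (title, publisher) pairs and (publisher, city) pairs.
books = [
    ("TCP/IP Illustrated", "Addison-Wesley"),
    ("Data on the Web", "Morgan Kaufmann"),
    ("Advanced Unix", "Addison-Wesley"),
]
publishers = [
    ("Addison-Wesley", "Boston"),
    ("Morgan Kaufmann", "San Francisco"),
]


def nested_loop_join(books, publishers):
    """Naive evaluation: rescan all publishers for every book."""
    result = []
    for title, pub in books:
        for name, city in publishers:
            if pub == name:
                result.append((title, city))
    return result


def hash_join(books, publishers):
    """Equivalent rewrite: index publishers by name once, then probe."""
    index = {}
    for name, city in publishers:
        index.setdefault(name, []).append(city)
    result = []
    for title, pub in books:
        for city in index.get(pub, []):
            result.append((title, city))
    return result
```

Both functions return the same pairs in the same order. The first does work
proportional to the product of the two input sizes, the second roughly to
their sum.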

You do give an example that combines a join with distinct(). The XQuery
looks like this:

FOR $p IN distinct(document("bib.xml")//publisher)
LET $a := avg(document("bib.xml")
   /book[publisher = $p]/price)
RETURN 
   <publisher>
      <name> $p/text() </name> ,
      <avgprice> $a </avgprice>
   </publisher>


The equivalent XSLT looks like this:

<xsl:transform version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <xsl:for-each
        select="document('bib.xml')//publisher[not(.=preceding::publisher)]">
      <xsl:variable name="prices"
          select="document('bib.xml')/book[publisher=current()]/price"/>
      <xsl:variable name="avgPrice"
          select="sum($prices) div count($prices)"/>
      <publisher>
        <name><xsl:value-of select="."/></name>
        <avgprice><xsl:value-of select="$avgPrice"/></avgprice>
      </publisher>
    </xsl:for-each>
  </xsl:template>
</xsl:transform>

Again, I find the XQuery solution much easier to read and write. This is the
kind of thing XQuery was designed for. More important, in XQuery, we have
been thinking of database optimization, and I think we will be able to
figure out how to optimize the XQuery equivalent better.
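
As a rough illustration of the kind of rewrite this makes possible, the
Python sketch below computes the same per-publisher averages in a single
pass over the books, instead of rescanning the document once per distinct
publisher. The sample data and the function name are hypothetical, and the
single-pass strategy is only an assumption about what an optimizer might
do; the XQuery draft does not prescribe it.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical stand-in for bib.xml; element names follow the query above.
BIB = """
<bib>
  <book><publisher>Addison-Wesley</publisher><price>60.0</price></book>
  <book><publisher>Morgan Kaufmann</publisher><price>39.5</price></book>
  <book><publisher>Addison-Wesley</publisher><price>40.0</price></book>
</bib>
"""


def average_price_by_publisher(xml_text):
    """Group prices by publisher in one pass, then average each group."""
    root = ET.fromstring(xml_text)
    prices = defaultdict(list)
    for book in root.iter("book"):
        publisher = book.findtext("publisher")
        prices[publisher].append(float(book.findtext("price")))
    return {pub: sum(vals) / len(vals) for pub, vals in prices.items()}


averages = average_price_by_publisher(BIB)
```

Here averages maps each publisher name to its mean price, which is the
information the publisher elements in the query result carry.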


2. Optimizability

A query language needs to be optimizable for queries. To make this possible,
we need to be able to discover equivalences so that queries can be rewritten
flexibly based on the performance parameters of various kinds of access.
Both the XQuery language and the XML Query Algebra are designed to make this
possible.

3. Strong Typing

XQuery will be a strongly typed language. This typing will extend to content
models - a function whose return type is "paragraph element" will return a
valid paragraph element. This level of strong typing is very helpful in
industrial strength programming environments, and difficult to achieve with
the current XSLT. Much of the effort behind the Query Algebra, and much of
the justification for it, lies in achieving strong typing.

In fact, XSLT may benefit from this work. It would be helpful to have
stronger typing in XSLT as well. For instance, I would like to be able to
check whether a given stylesheet will always produce valid HTML 4.0 for a
given DTD. Several people are investigating this - it is much too early to
say whether it can be achieved.

At any rate, I hope this helps explain why I think XQuery is worth
developing as a language, in addition to XSLT.

Jonathan
Received on Thursday, 22 February 2001 11:21:06 UTC