XQuery's FLWR expression vs. XSLT down-reference pull

It seems that by continuing to argue about syntax, we will just keep
banging our heads on the wall. I want to focus on semantics.

XQuery's FOR clause generates a "list of binding-tuples," whereas XSLT's
for-each instruction generates a "current node list". In both cases, each
"binding-tuple" (XQuery) or "current node" (XSLT) is taken from an
"ordered forest" (XQuery) or a "node-set" (XPath) in document order. For
each tuple (XQuery), or current node (XSLT), in the FOR clause (XQuery),
or for-each instruction (XSLT), the subsequent statements are executed
once, including, in XQuery, any additional "binding-tuples" which produce
a "cross-product" of these bindings, or in XSLT, the nesting of for-each
instructions. XQuery's LET clause binds variables in exactly the same way
as XSLT's variable instruction. XQuery's WHERE clause filters nodes from
being constructed, given a particular condition, in exactly the same way
as XSLT's if instruction. And, finally, XQuery's RETURN clause returns
arbitrarily constructed elements and attributes in the same way as XSLT's
literal result elements, deep copies of nodes in the same way as XSLT's
copy-of instruction, shallow copies of nodes in the same way as XSLT's
copy instruction, processing instructions in the same way as XSLT's
processing-instruction instruction, comments in the same way as XSLT's
comment instruction, etc.
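
As a rough sketch of that clause-by-clause correspondence, here is a small
XSLT 1.0 stylesheet annotated with the FLWR clause each instruction stands
in for. It assumes a bib.xml whose book elements have title and year
children (the same shape as the bib.xml example used later in this
message); the listing and entry element names are invented purely for
illustration.

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <!-- "listing" and "entry" are hypothetical wrapper elements -->
    <listing>
      <!-- FOR: visit each book in document order; each one becomes
           the current node in turn -->
      <xsl:for-each select="document('bib.xml')//book">
        <!-- LET: bind a variable, here relative to the current node -->
        <xsl:variable name="t" select="title"/>
        <!-- WHERE: nothing is constructed unless the test holds -->
        <xsl:if test="year = '1998'">
          <!-- RETURN: a literal result element, a comment,
               and a deep copy of the bound node -->
          <entry>
            <xsl:comment>matched on year</xsl:comment>
            <xsl:copy-of select="$t"/>
          </entry>
        </xsl:if>
      </xsl:for-each>
    </listing>
  </xsl:template>
</xsl:stylesheet>

Because the source data is pulled in through document(), the stylesheet
can be run against any input document.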

Apart from terminology, the most significant difference between these
models is that XSLT has at its disposal a current node, whereas XQuery
does not have this implicit notion of context. Thus, the path expressions
in XQuery are always absolute, beginning with something like $foo. While
it is certainly possible in XSLT to always bind $foo to the current node,
it is not usually necessary (except in complex joins). Thus, instead of
$foo/bar, you would just type bar, assuming $foo is the current node. The
current node is used as XPath's "context node", which is part of the
context that is always supposed to be defined for XPath expressions. (The
XQuery spec fails to even address the context of path expression
evaluation in the terms defined by the XPath specification. Note that
XPointer, the other technology that uses XPath, explicitly addresses and
conforms to XPath's defined evaluation context --
http://www.w3.org/TR/xptr#context, as does, of course, XSLT.)

There is absolutely nothing about the notion of a current node that should
affect implementability or optimizability in comparison to only using
binding-tuples. These two things are essentially the same thing; the
difference is in what we call them, and in how we can access them. Again,
in XQuery, you always have to specify $foo/bar; in XSLT you can just say
bar. That's why a number of my XSLT examples (especially the ones that are
not awkwardly specified due to a currently missing feature of the XSLT
language) do not use as many variables as the XQuery examples. It perhaps
would have been easier to see the mapping if I had always used variables,
but part of the point was to demonstrate the use of XSLT's current node,
which allows you to specify relative XPath expressions. Remember, these
are not patterns; these expressions are just as determinate as XQuery
expressions. The only difference is in how they may be specified.

XQuery's need for an XPath evaluation context seems best met by XSLT's
concept of a current node.

XQuery could perhaps support a current node without changing the syntax of
the existing FLWR expression examples.

For example, you could still have:

FOR $b IN document("bib.xml")//book
WHERE $b/publisher = "Morgan Kaufmann"
AND $b/year = "1998"
RETURN $b/title

But you should also be able to specify the same query like so:

FOR . IN document("bib.xml")//book
WHERE publisher = "Morgan Kaufmann"
AND year = "1998"
RETURN title


The use of a current node would bring XQuery closer to XPath conformance
by providing it with a context node, and it would remove the primary
semantic difference between XSLT's down-reference pull and FLWR
expressions, while having zero effect on optimizability. The difference
between the above two queries should vanish once the most basic query
processor is finished with them.

With regard to XSLT's semantics, the mapping to the second query shown
above is obvious. The first query would just be shorthand for
<xsl:for-each> plus <xsl:variable name="b" select="."/>. Or, expanded out
in XQuery syntax:

FOR . IN document("bib.xml")//book
LET $b := .
WHERE $b/publisher = "Morgan Kaufmann"
AND $b/year = "1998"
RETURN $b/title
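
In XSLT 1.0 terms, the expanded query above might look like the following
sketch. The results wrapper element is my own addition, there only so that
the output has a single root; the rest follows the query clause by clause.

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <!-- "results" is a hypothetical wrapper element -->
    <results>
      <!-- FOR . IN document("bib.xml")//book -->
      <xsl:for-each select="document('bib.xml')//book">
        <!-- LET $b := . -->
        <xsl:variable name="b" select="."/>
        <!-- WHERE $b/publisher = "Morgan Kaufmann" AND $b/year = "1998" -->
        <xsl:if test="$b/publisher = 'Morgan Kaufmann'
                      and $b/year = '1998'">
          <!-- RETURN $b/title -->
          <xsl:copy-of select="$b/title"/>
        </xsl:if>
      </xsl:for-each>
    </results>
  </xsl:template>
</xsl:stylesheet>

Deleting the xsl:variable line and every "$b/" prefix leaves a stylesheet
that selects the same nodes, which is the relative form of the query.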


For joins, you would need the use of variables, because once you leave one
FOR clause, you've lost that current node. To retain it, you need to
bind it to a variable. (Again, XSLT provides all that's needed for joins
too, by using <xsl:for-each> and <xsl:variable>.) Here's an example of a
join from Section 3 of the XQuery spec:

FOR $sp IN document("sp.xml")//sp_tuple,
    $p IN document("p.xml")//p_tuple[pno = $sp/pno],
    $s IN document("s.xml")//s_tuple[sno = $sp/sno]
RETURN
   <sp_pair>
      $s/sname ,
      $p/descrip
   </sp_pair> SORTBY (sname, descrip)


But even in this example, not all variable references are needed:

FOR $sp IN document("sp.xml")//sp_tuple,
    $p IN document("p.xml")//p_tuple[pno = $sp/pno],
    . IN document("s.xml")//s_tuple[sno = $sp/sno]
RETURN
   <sp_pair>
      sname ,
      $p/descrip
   </sp_pair> SORTBY (sname, descrip)
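
For comparison, the same join can be sketched in XSLT 1.0 with nested
xsl:for-each instructions, binding a variable only where an outer current
node must survive into an inner loop. The sp_pairs wrapper element is an
assumption of mine, and the SORTBY step is left out of this sketch.

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <!-- "sp_pairs" is a hypothetical wrapper element -->
    <sp_pairs>
      <xsl:for-each select="document('sp.xml')//sp_tuple">
        <!-- keep the outer current node for use in the inner loops -->
        <xsl:variable name="sp" select="."/>
        <xsl:for-each select="document('p.xml')//p_tuple[pno = $sp/pno]">
          <xsl:variable name="p" select="."/>
          <xsl:for-each
              select="document('s.xml')//s_tuple[sno = $sp/sno]">
            <sp_pair>
              <!-- s_tuple is the current node, so no variable is needed -->
              <xsl:copy-of select="sname"/>
              <xsl:copy-of select="$p/descrip"/>
            </sp_pair>
          </xsl:for-each>
        </xsl:for-each>
      </xsl:for-each>
    </sp_pairs>
  </xsl:template>
</xsl:stylesheet>

The innermost loop plays the role of the "." binding in the query above:
only the two outer tuples need names.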


I recently began writing an XSLT-to-XQuery converter (working only, of
course, on XSLT that exclusively uses down-reference pull), and the
biggest barrier to completion was the need to generate extraneous variable
references for the current node. The effect of this is that, for me to
perform the conversion, I have to parse every XPath expression,
identifying all relative expressions and subexpressions and prefixing them
with $foo/, where $foo is just a variable reference for the current node.

It appears that the XQuery spec may already have a mechanism for this,
with the use of the "." syntax. But my reading of the algebra mapping
makes it appear as if they've got it backwards with respect to XPath's
rules. "." should always stand for the *context* node, and only
incidentally the current node when you're not already inside an XPath
expression. The following note confuses me:

"Local XPath predicates correspond to iteration over element in a
collection, and requires the binding of the dot variable, as the predicate
might use the current node." (Appendix E.2.1)

First of all, I assume the "current node" is meant to be something like
XSLT's current node, though it's never defined in the spec. The "dot
variable" elsewhere implies a "." syntax. If "." is used to access the
current node (contrary to the XPath standard), how then do I access the
context node from inside a predicate? XSLT provides access to the current
node within a predicate via a current() function. Here is another instance
where I think XQuery should learn from XSLT.
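
To make the distinction concrete, here is a minimal XSLT 1.0 sketch that
reuses the sp.xml and s.xml documents from the join example. Inside the
predicate, the relative path sno is evaluated against the context node
(the s_tuple being tested), while current() reaches back to the current
node of the enclosing for-each (the sp_tuple). The names wrapper element
is hypothetical.

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <!-- "names" is a hypothetical wrapper element -->
    <names>
      <xsl:for-each select="document('sp.xml')//sp_tuple">
        <!-- sno           : child of the context node (an s_tuple)
             current()/sno : child of the current node (the sp_tuple) -->
        <xsl:for-each
            select="document('s.xml')//s_tuple[sno = current()/sno]">
          <xsl:copy-of select="sname"/>
        </xsl:for-each>
      </xsl:for-each>
    </names>
  </xsl:template>
</xsl:stylesheet>

Here "." keeps its XPath meaning throughout, and no variable is needed
for a single level of nesting.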

Each of the issues I've addressed here should have no effect on
optimizability. I don't pretend to know how the W3C XML Query Working
Group came up with their algebra, but so much of it is like XPath/XSLT
without being quite the same thing. This "not quite" part is what I'm
most worried about. So far, though, the "not quite" parts that I've
identified here should have no impact on optimizability whatsoever.

Evan Lenz
XYZFind Corp.

Received on Friday, 23 February 2001 09:40:24 UTC