- From: Evan Lenz <elenz@xyzfind.com>
- Date: Fri, 23 Feb 2001 06:40:20 -0800 (PST)
- To: Jonathan.Robie@SoftwareAG-USA.com, xml-dev@lists.xml.org
- Cc: www-ql@w3.org, www-xml-query-comments@w3.org
It seems that by continuing to argue about syntax, we will just keep banging our heads on the wall. I want to focus on semantics.

XQuery's FOR clause generates a "list of binding-tuples," whereas XSLT's for-each instruction generates a "current node list". In both cases, each "binding-tuple" (XQuery) or "current node" (XSLT) is taken from an "ordered forest" (XQuery) or a "node-set" (XPath) in document order. For each tuple (XQuery) or current node (XSLT) in the FOR clause (XQuery) or for-each instruction (XSLT), the subsequent statements are executed once. In XQuery, this includes any additional "binding-tuples", which produce a "cross-product" of these bindings; in XSLT, it includes any nested for-each instructions.

XQuery's LET clause binds variables in exactly the same way as XSLT's variable instruction. XQuery's WHERE clause filters nodes from being constructed, given a particular condition, in exactly the same way as XSLT's if instruction. And, finally, XQuery's RETURN clause returns arbitrarily constructed elements and attributes in the same way as XSLT's literal result elements, deep copies of nodes in the same way as XSLT's copy-of instruction, shallow copies of nodes in the same way as XSLT's copy instruction, processing instructions in the same way as XSLT's processing-instruction instruction, comments in the same way as XSLT's comment instruction, etc.

Apart from terminology, the most significant difference between these models is that XSLT has at its disposal a current node, whereas XQuery does not have this implicit notion of context. Thus, the path expressions in XQuery are always absolute, beginning with something like $foo. While it is certainly possible in XSLT to always bind $foo to the current node, it is not usually necessary (except in complex joins). Thus, instead of $foo/bar, you would just type bar, assuming $foo is the current node.
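To make the mapping concrete, here is a minimal XSLT 1.0 sketch of the correspondence described above. It assumes a bib.xml document whose book elements have publisher, year, and title children; the results wrapper element is purely illustrative. The first for-each binds the current node to $b and uses it everywhere, the way an XQuery FLWR expression would; the second relies on the current node and uses relative paths.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <!-- "results" is a hypothetical wrapper so the output is well-formed -->
    <results>

      <!-- FOR: iterate over the node-set in document order -->
      <xsl:for-each select="document('bib.xml')//book">
        <!-- LET: bind the current node to $b -->
        <xsl:variable name="b" select="."/>
        <!-- WHERE: filter on a condition -->
        <xsl:if test="$b/publisher = 'Morgan Kaufmann' and $b/year = '1998'">
          <!-- RETURN: deep-copy the selected nodes -->
          <xsl:copy-of select="$b/title"/>
        </xsl:if>
      </xsl:for-each>

      <!-- Same query, relying on the current node instead of $b -->
      <xsl:for-each select="document('bib.xml')//book">
        <xsl:if test="publisher = 'Morgan Kaufmann' and year = '1998'">
          <xsl:copy-of select="title"/>
        </xsl:if>
      </xsl:for-each>

    </results>
  </xsl:template>

</xsl:stylesheet>
```

Both for-each instructions should copy the same title elements to the output; the only difference is whether the paths are written as $b/title or just title.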
The current node is used as XPath's "context node", which is part of the context that is always supposed to be defined for XPath expressions. (The XQuery spec fails to even address the context of path expression evaluation in the terms defined by the XPath specification. Note that XPointer, the other technology that uses XPath, explicitly addresses and conforms to XPath's defined evaluation context -- http://www.w3.org/TR/xptr#context, as does, of course, XSLT.)

There is absolutely nothing about the notion of a current node that should affect implementability or optimizability in comparison to only using binding-tuples. These two things are essentially the same thing; the difference is in what we call them, and in how we can access them. Again, in XQuery, you always have to specify $foo/bar; in XSLT you can just say bar. That's why a number of my XSLT examples (especially the ones that are not awkwardly specified due to a currently missing feature of the XSLT language) do not use as many variables as the XQuery examples. It perhaps would have been easier to see the mapping if I had always used variables, but part of the point was to demonstrate the use of XSLT's current node, which allows you to specify relative XPath expressions. Remember, these are not patterns; these expressions are just as determinate as XQuery expressions. The only difference is in how they may be specified.

XQuery's need for an XPath evaluation context seems best met by XSLT's concept of a current node. In XQuery, you could perhaps allow this without changing the syntax of the current FLWR expression examples. For example, you could still have:

    FOR $b IN document("bib.xml")//book
    WHERE $b/publisher = "Morgan Kaufmann"
      AND $b/year = "1998"
    RETURN $b/title

But you should also be able to specify the same query like so:

    FOR . IN document("bib.xml")//book
    WHERE publisher = "Morgan Kaufmann"
      AND year = "1998"
    RETURN title

The use of a current node would bring XQuery closer to XPath conformance by providing it with a context node, and it would remove the primary semantic difference between XSLT's down-reference pull and FLWR expressions, while having zero effect on optimizability. The difference between the above two queries should vanish once the most basic query processor is finished with them.

With regard to XSLT's semantics, the mapping to the second query shown above is obvious. The first query would just be shorthand for <xsl:for-each> plus <xsl:variable name="b" select="."/>. Or, expanded out in XQuery syntax:

    FOR . IN document("bib.xml")//book
    LET $b := .
    WHERE $b/publisher = "Morgan Kaufmann"
      AND $b/year = "1998"
    RETURN $b/title

For joins, you would need to use variables, because once you leave one FOR statement, you've lost that current node. To retain it, you need to bind it to a variable. (Again, XSLT provides all that's needed for joins too, by using <xsl:for-each> and <xsl:variable>.) Here's an example of a join from Section 3 of the XQuery spec:

    FOR $sp IN document("sp.xml")//sp_tuple,
        $p IN document("p.xml")//p_tuple[pno = $sp/pno],
        $s IN document("s.xml")//s_tuple[sno = $sp/sno]
    RETURN
        <sp_pair>
            $s/sname ,
            $p/descrip
        </sp_pair> SORTBY (sname, descrip)

But even in this example, not all variable references are needed:

    FOR $sp IN document("sp.xml")//sp_tuple,
        $p IN document("p.xml")//p_tuple[pno = $sp/pno],
        . IN document("s.xml")//s_tuple[sno = $sp/sno]
    RETURN
        <sp_pair>
            sname ,
            $p/descrip
        </sp_pair> SORTBY (sname, descrip)

I recently began writing an XSLT-to-XQuery converter (working only, of course, on XSLT that exclusively uses down-reference pull), and the biggest barrier to completion was the need to generate extraneous variable references for the current node.
The effect of this is that, for me to perform the conversion, I have to parse every XPath expression, identifying all relative expressions and subexpressions and prefixing them with $foo/, where $foo is just a variable reference for the current node.

It appears that the XQuery spec may already have a mechanism for this, with the use of the "." syntax. But my reading of the algebra mapping makes it appear as if they've got it backwards with respect to XPath's rules. "." should always stand for the *context* node, and only incidentally the current node when you're not already inside an XPath expression. The following note confuses me:

    "Local XPath predicates correspond to iteration over element in a collection, and requires the binding of the dot variable, as the predicate might use the current node." (Appendix E.2.1)

First of all, I assume the "current node" is meant to be something like XSLT's current node, though it's never defined in the spec. The "dot variable" elsewhere implies a "." syntax. If "." is used to access the current node (contrary to the XPath standard), how then do I access the context node from inside a predicate? XSLT provides access to the current node within a predicate via a current() function. Here is another instance where I think XQuery should learn from XSLT.

Each of the issues I've addressed here should have no effect on optimizability. I don't pretend to know how the W3C XML Query Working Group came up with their algebra, but so much of it is like XPath/XSLT, but just not quite the same thing. And this "not quite" part is what I'm most worried about. But, so far, the "not quite" parts that I've identified here should have no impact on optimizability whatsoever.

Evan Lenz
XYZFind Corp.
Received on Friday, 23 February 2001 09:40:24 UTC