RE: Question regarding order in XPath

From: Michael Kay <mhk@mhk.me.uk>
Date: Fri, 21 May 2004 23:50:49 +0100
To: "'Murali Mani'" <mani@CS.UCLA.EDU>
Cc: <www-ql@w3.org>
Message-Id: <20040521225127.141E5A087E@frink.w3.org>

> 
> My concern is regarding the statement that "even reverse axes will return
> results in document order". Let us consider something like:
> 
> a [2 TO 5]/b

This query probably doesn't mean what you think it means. Firstly, the
keyword has to be lower case. The value of the expression (2 to 5) is the
sequence (2, 3, 4, 5), and the effective boolean value of a sequence of four
integers is always true. I suspect you mean

a[position() = 2 to 5]/b

where the predicate is true for any a whose position is equal to one of the
values in the sequence (2,3,4,5).
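
As a rough illustration of how that predicate behaves, here is a small
Python model. It is only a sketch, not an XPath implementation: the names
general_eq and select_by_position are made up for this example, and the
"a" elements are stood in for by plain strings. The point it shows is that
"=" between a single value and a sequence is true if any pair of items
matches.

```python
def general_eq(left, right):
    """Model of the XPath general comparison "=": true if ANY pair of items is equal."""
    return any(l == r for l in left for r in right)


def select_by_position(items, positions):
    """Model of a[position() = 2 to 5]: keep items whose 1-based position matches."""
    return [
        item
        for pos, item in enumerate(items, start=1)
        if general_eq([pos], positions)
    ]


# Hypothetical stand-ins for six <a> elements in document order.
a_elements = ["a1", "a2", "a3", "a4", "a5", "a6"]

# (2 to 5) is the inclusive sequence (2, 3, 4, 5), hence range(2, 6).
selected = select_by_position(a_elements, range(2, 6))
print(selected)  # ['a2', 'a3', 'a4', 'a5']
```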
> 
> Now suppose "a" uses a reverse axis. Now because of the range predicate,
> we will get only results that are produced by a, but have positions 2 TO 5
> in reverse document order, right??

Yes.
> 
> Now we again start getting results in document order when we start
> navigating etc.. right??

Yes.
> 
> My question is: Why do we define order based on document order, rather
> than the order of previous step??

There are two separate questions here.

Firstly, any path expression (that is, any expression using the "/"
operator) returns results in document order, regardless of the order of the
sequences in its operands.

Secondly, an axis step (such as a[position() = 2 to 5]) always produces
results in document order, regardless of the axis.
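
The second rule can be modelled in a few lines of Python. This is a sketch
under assumptions: the function name preceding_sibling_step is invented
here, and sibling elements are represented by strings held in document
order. It imitates a step such as preceding-sibling::*[position() = 2 to 5],
where positions inside the predicate are counted along the reverse axis but
the step still delivers its result in document order.

```python
def preceding_sibling_step(siblings, context_index, positions):
    """Model of preceding-sibling::*[position() = positions].

    `siblings` is a list in document order; `context_index` is the 0-based
    index of the context node within it.
    """
    # On a reverse axis, position() counts backwards from the context node,
    # so the nearest preceding sibling has position 1.
    nearest_first = list(reversed(siblings[:context_index]))
    kept = [n for pos, n in enumerate(nearest_first, start=1) if pos in positions]
    # The result of the step as a whole is put back into document order.
    return sorted(kept, key=siblings.index)


siblings = ["a1", "a2", "a3", "a4", "a5", "a6", "a7", "a8"]

# Context node is a8; positions 2 to 5 on the reverse axis are a6, a5, a4, a3.
result = preceding_sibling_step(siblings, context_index=7, positions=range(2, 6))
print(result)  # ['a3', 'a4', 'a5', 'a6']
```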

The rationale is a long story. It is partly done this way in XPath 2.0 to
provide compatibility with 1.0. But it is also done this way because if you
do it differently you get some very strange anomalies, especially for
expressions that use the recursive axes such as descendant. There is some
discussion of these points in Don Chamberlin's chapter of the "XQuery from
the Experts" book.

A lot of people from the database side of the fence wanted to make "/" into
a sequence mapping operator, but this fails on documents with recursive
structures. Recursive queries have traditionally been really difficult with
languages based on predicate calculus, but they are routine with hierarchic
data and with narrative XML in particular.
> 
> The thing is document order and order based on previous step are identical
> for forward axes, reverse axis with a position predicate (rather than a
> range predicate) etc..
> 
> Were these options for defining order discussed by the WG? and any
> insights into the reasoning process would be greatly appreciated.

They were discussed at immense length.
> 
> best, murali.
> 
> Note: Also, I cannot think of any use case where after a step using a
> reverse axis, we want to revert back to document order (if document order
> and the "previous-step-order" yield different results)..
>
It's too late on a Friday night to go into all the detail. As I say, part of
it is simply backwards compatibility. In XSLT 1.0, people quite reasonably
write:

<xsl:for-each select="ancestor::*">
  <xsl:value-of select="name()"/>
  <xsl:if test="position()!=last()">/</xsl:if>
</xsl:for-each>

and one can't really change the meaning of such an everyday construct in
version 2 of a language.
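
To make concrete what that loop relies on, here is a Python sketch using the
standard xml.dom.minidom module. The sample document and the helper name
ancestor_elements_nearest_first are assumptions made up for illustration.
It contrasts the order in which the ancestor axis is walked (nearest first)
with the document order in which xsl:for-each processes the selected nodes.

```python
from xml.dom import minidom

# Hypothetical sample document; the context node will be the <para> element.
doc = minidom.parseString(
    "<book><chapter><section><para/></section></chapter></book>"
)
para = doc.getElementsByTagName("para")[0]


def ancestor_elements_nearest_first(node):
    """Walk up the parent chain, collecting element ancestors (reverse axis order)."""
    found = []
    parent = node.parentNode
    while parent is not None and parent.nodeType == parent.ELEMENT_NODE:
        found.append(parent)
        parent = parent.parentNode
    return found


axis_order = [e.tagName for e in ancestor_elements_nearest_first(para)]
# Ancestors lie on a single chain, so reversing gives document order.
document_order = list(reversed(axis_order))

# Equivalent of the for-each above: names joined with "/" between them.
path = "/".join(document_order)
print(axis_order)  # ['section', 'chapter', 'book']
print(path)        # book/chapter/section
```

If the selected nodes were processed in axis order instead, the same loop
would produce section/chapter/book, which is why changing the rule would
break existing stylesheets.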

But you also get into all sorts of problems if you try to treat "/" as a
sequence mapping operator, because of the recursive axes. The classic
example was if you start with the element:

<p>Do <b>not</b> switch off the computer, it will <b>explode</b></p>

and then do

<p><xsl:copy-of select="p//text()" /></p>

then the result, if you don't sort into document order, is

<p>Do switch off the computer, it will not explode</p>

The WG also looked for ways of getting round this problem by making
p//text() mean something other than
child::p/descendant-or-self::node()/child::text(), but no alternatives
actually worked.
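
The anomaly can be reproduced with a short Python simulation built on the
standard xml.dom.minidom module. This is a sketch rather than an XPath
engine: the <doc> wrapper element and all function names are assumptions
introduced for the example. It expands p//text() as
child::p/descendant-or-self::node()/child::text() and compares a plain
sequence mapping with the same nodes sorted into document order.

```python
from xml.dom import minidom

# The element from the example, wrapped in a hypothetical <doc> parent.
SOURCE = (
    "<doc><p>Do <b>not</b> switch off the computer, "
    "it will <b>explode</b></p></doc>"
)


def descendant_or_self(node):
    """Yield node and all of its descendants in document order."""
    yield node
    for child in node.childNodes:
        yield from descendant_or_self(child)


def child_text(node):
    """Return the text-node children of node, in document order."""
    return [c for c in node.childNodes if c.nodeType == c.TEXT_NODE]


def mapped_without_sort(p):
    """Treat "/" as a sequence mapping: concatenate the per-node results as they come."""
    result = []
    for n in descendant_or_self(p):
        result.extend(child_text(n))
    return result


def in_document_order(p):
    """Same nodes, sorted into document order as a path expression requires."""
    order = {n: i for i, n in enumerate(descendant_or_self(p))}
    return sorted(mapped_without_sort(p), key=lambda n: order[n])


def text_of(nodes):
    """Concatenate the character data of a list of text nodes."""
    return "".join(n.data for n in nodes)


p = minidom.parseString(SOURCE).getElementsByTagName("p")[0]
print(repr(text_of(mapped_without_sort(p))))
print(repr(text_of(in_document_order(p))))
```

In the unsorted version the two text children of p come out first, followed
by the contents of the two b elements, which is how "not" ends up after
"it will".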

Michael Kay
Received on Friday, 21 May 2004 18:51:27 GMT