[XQuery] IBM-XQ-007: Last step in a path expression

(IBM-XQ-007) Section 3.2 (Path Expressions): The definition of a path 
expression should be revised to remove the restriction that the expression 
on the right side of "/" must return a sequence of nodes. The restriction 
should be retained for the expression on the left side of "/". In effect, 
this would permit the last step in a path to return one or more atomic 
values. This feature has recently been requested by Sarah Wilkin
(http://lists.w3.org/Archives/Public/public-qt-comments/2004Feb/0100.html),
who proposes the following rule: When evaluating E1/E2, if each evaluation 
of E2 returns a sequence of nodes, they are combined in document order, 
removing duplicates; if each evaluation of E2 returns a sequence of atomic 
values, the sequences are concatenated in the order generated; otherwise a 
type error is raised. Like all type errors, this error can be raised 
either statically or dynamically, depending on the implementation. This 
rule provides well-defined static and dynamic semantics for path 
expressions.
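
The rule above can be modeled outside XQuery. The following is a minimal Python sketch, not an implementation of any XQuery processor: the Node class, the eval_path function, and the "order" field standing in for document order are all hypothetical names introduced for illustration. It assumes that an empty result from E2 is compatible with either case.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for an XML node; `order` is its document position."""
    order: int
    name: str


Item = Union[Node, int, float, str]


def eval_path(left: List[Item], step: Callable[[Node], List[Item]]) -> List[Item]:
    """Model E1/E2: `left` is the value of E1, `step` evaluates E2 for one node."""
    # The restriction on the left side of "/" is retained: E1 must yield nodes.
    if not all(isinstance(item, Node) for item in left):
        raise TypeError("left side of '/' must be a sequence of nodes")

    # Evaluate E2 once per context node and concatenate in the order generated.
    items = [item for node in left for item in step(node)]

    if all(isinstance(item, Node) for item in items):
        # Nodes only: remove duplicates and return in document order.
        unique = {node.order: node for node in items}
        return [unique[key] for key in sorted(unique)]
    if not any(isinstance(item, Node) for item in items):
        # Atomic values only: keep the concatenation as generated.
        return items
    # Anything else mixes nodes and atomic values, which the rule rejects.
    raise TypeError("right side of '/' mixes nodes and atomic values")
```

With two nodes a = Node(1, "a") and b = Node(2, "b"), the call eval_path([a, b], lambda n: [n.order * 10]) returns [10, 20], while a step that yields a node for one context item and a number for another raises TypeError.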

To illustrate the usability advantages of this proposal, consider a 
document containing "employee" elements, each of which has child elements 
"dept", "salary", and "bonus". To find the largest total pay (salary + 
bonus) of all the employees in the Toy department, here is what I think 
many users will write:

max( //employee[dept = "Toy"]/(salary + bonus) )

Unfortunately, in our current language this is an error because the final 
step in the path does not return a sequence of nodes. The user is forced 
to write the following:

max( for $e in //employee[dept = "Toy"] return ($e/salary + $e/bonus) )

This expression is complex and error-prone (users will forget the 
parentheses or will forget to use the bound variables inside the return 
clause). There is no reason why this query cannot be expressed in a more 
straightforward way. Users will try to write it as a path expression and 
will not understand why it fails.
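
Under the proposed rule, the two forms of the query should give the same answer. The sketch below checks this on a toy model in Python rather than XQuery: the employee records, their values, and the path_map helper are hypothetical, invented only for illustration. The path_map helper mimics a path whose last step returns atomic values, concatenated in the order generated.

```python
from typing import Callable, Dict, List

# Hypothetical sample data standing in for "employee" elements.
employees: List[Dict] = [
    {"dept": "Toy", "salary": 50000, "bonus": 4000},
    {"dept": "Toy", "salary": 52000, "bonus": 1000},
    {"dept": "Shoe", "salary": 90000, "bonus": 9000},
]


def path_map(context: List[Dict], last_step: Callable[[Dict], List[int]]) -> List[int]:
    """Evaluate the last step once per context item and concatenate the results."""
    out: List[int] = []
    for item in context:
        out.extend(last_step(item))
    return out


# Models the predicate step: //employee[dept = "Toy"]
toy = [e for e in employees if e["dept"] == "Toy"]

# Models max( //employee[dept = "Toy"]/(salary + bonus) ) under the proposal.
via_path = max(path_map(toy, lambda e: [e["salary"] + e["bonus"]]))

# Models max( for $e in //employee[dept = "Toy"] return ($e/salary + $e/bonus) ).
via_for = max(e["salary"] + e["bonus"] for e in toy)

print(via_path, via_for)
```

Both forms produce 54000 for this sample data. The path form simply does not require the user to name the context item.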

Another very common example is the use of data() to extract the typed 
value from the last step in a path, as in this case: 
//book[isbn="1234567"]/price/data().  This very reasonable expression is 
also an error and the user is forced to write 
data(//book[isbn="1234567"]/price).

Note that I am NOT asking for a general-purpose mapping operator, which I 
think is generally unnecessary because we already have the for-expression. 
Instead, I think we should simply relax the unnatural and unnecessary 
restriction that is currently placed on path expressions. This will remove 
a frequent source of errors and will improve the usefulness of path 
expressions, without precluding us from introducing a general-purpose 
mapping operator later if a consensus emerges to do so.

--Don Chamberlin

Received on Wednesday, 11 February 2004 18:50:54 UTC