Re: Comments on XPath 2.0 specification

At 05:10 PM 1/21/2002 -0500, Mike Schilling wrote:
>The following message is a courtesy copy of an article
>that has been posted to edgility.eng.xml as well.
>
>I am writing as someone whose software uses XPath 1.0 as a syntax for 
>recognizing and extracting information in XML documents.
>
>I.  Incompatibilities:
>Incompatibilities introduced in XPath 2.0 will cause difficulties for us, 
>in all of the obvious areas:
>
>   Existing XPath strings no longer working correctly.
>
>   Need to retrain customers in XPath 2.0 concepts where they differ from 
> XPath 1.0 concepts.
>
>   Fragility of XPath implementations where they attempt to retain 1.0 
> compatibility in problematical ways.

Obviously, compatibility matters. Unfortunately, XPath 1.0 was never 
designed to be a subset of a type-safe query language, and we felt it was 
very important that XPath 2.0 and XQuery be deeply integrated, and that 
XQuery be integrated with the type system of XML Schema. In general, I 
think we have managed to keep the number of incompatibilities small despite 
the rather large change in functionality and type safety.

Also note that XPath 2.0 provides fallback conversions for the very purpose 
of maintaining compatibility where possible.

>While appendix D listing incompatibilities is extensive, it leaves out two 
>of the most important:
>
>1. Requiring path elements which match keywords to be escaped.
>This is unacceptable, since it makes an unbounded set of existing XPath 
>expression invalid.  It is also unacceptable since it leaves open the 
>possibility that, as future XPath versions will introduce new keywords, 
>yet more XPath expressions become invalid.  In fact, it amounts to a 
>requirement that, for safety, every name in every path expression be 
>escaped.  This is a fundamental change to XPath syntax.

In the current Working Drafts, XPath 2.0 does not reserve its keywords, but 
XQuery does. So the syntax of XPath actually has not been changed in this 
regard. Moreover, since XPath must be unambiguous without reserved 
keywords, and both XPath and XQuery are generated from the same grammar, 
the productions they have in common do not require reserved keywords to be 
parsed unambiguously.

That said, I am quite certain we are not done debating reserved keywords. 
There are strong feelings on both sides of the issue.

>If keywords are required, a much better choice is to make them 
>distinguishable from path elements, i.e. forbid them from matching the XML 
>Name production.  Requiring them to begin with a colon (like the proposed, 
>unacceptable syntax for path element escaping) is one possibility.

This is one of many designs that have been discussed. There are at least 
some people who feel this would make XQuery more awkward to use. Also note 
that this would introduce an incompatibility with XPath 1.0, which has no 
such requirement.

>2. The introduction of the for statement
>This changes XPath from an expression-matching language to a 
>pseudo-procedural one.  It's quite unclear why "for" and "return" are 
>included, but not "if" and "while".

Actually, XPath does include if:

<snip href="http://www.w3.org/TR/xpath20/#id-conditionals">
2.9 Conditional Expressions

XPath supports a conditional expression based on the keywords if, then, and 
else.

[7] IfExpr ::=  "if" "(" Expr ")" "then" Expr "else"  Expr
</snip>

Neither XPath nor XQuery has a while construct.

>  It's also unclear how XPath benefits from implementing half of the 
> XQuery FLWR statement.

Personally, I agree with you. I don't think we're done discussing this 
particular question, but there was no consensus to put the whole FLWR 
expression in the current Working Drafts, and there was a strong feeling 
from the XSL Working Group that at least "for" and "return" were needed.

>The examples given for "for" in the spec are quite unconvincing, since 
>they describe the sort of transformation which is the province of XSLT and 
>XQuery, not XPath.
>
>This will make user training in XPath far more difficult, since it breaks 
>the existing user view of XPath as a pure pattern-matching language.

We have spent a lot of time agonizing over the proper dividing line between 
XPath and XQuery. There were a number of people who would have preferred 
that XPath 2.0 retain roughly the same functionality as XPath 1.0. However, 
we have seen good examples of stylesheets that could be written in a 
simpler and more straightforward manner by moving more functionality into 
XPath.

>Note that every incompatibility introduced increases the likelihood either 
>that XPath will split into dialects or that XPath 2.0 will simply be 
>rejected.  The history of the SQL 3.0 standard is a lesson about the limited 
>ability of a standards effort to make fundamental changes to an existing 
>language.

I see a number of reasonable positions.

At one end of the continuum, we could have said that XPath 2.0 would simply 
be XPath 1.0, with no changes whatsoever. That would mean no support for 
XML Schema types, no additional functionality, and incompatibility with 
XQuery for expressions that look the same in both languages. XQuery would 
then support schema types, provide full query functionality and type 
safety, etc. Advantages: XPath remains quite simple and completely 
compatible with older versions, XQuery can provide full functionality. 
Disadvantages: incompatibilities between XQuery and XPath would be hard for 
users to master.

However, there were requirements for XPath to add significant functionality 
in XPath 2.0 (see http://www.w3.org/TR/xpath20req), and once this 
functionality was added, the differences between XPath and XQuery started 
to fade significantly. It did not seem to make sense to release two new 
languages this similar with very minor differences that lead to 
incompatibilities.

>II. Missing functionality.
>
>Member-wise operations on sequences are both natural and extremely 
>useful.  Take the requirement
>
>         1. Given an XML document containing a purchase order and its
>         line <item> elements, calculate the total amount of the
>         purchase order by summing the price times the quantity of each
>         item. The nodeset is identified by item, and the expression to
>         sum would be price * quantity.

This can be done as follows:

         sum( for $i in items return $i/quantity * $i/USPrice )

So I do not believe this functionality is actually missing. Incidentally, I 
find the plural "items" confusing here. I assume that "items" would contain 
a sequence of "item" elements, in which case the query would be:

         sum( for $i in items/item return $i/quantity * $i/USPrice )
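
As a rough illustration of what this expression computes, here is a small 
Python sketch. It is not part of either specification; the item data and the 
names `items` and `order_total` are assumptions made up for the example.

```python
# Hypothetical purchase-order data: each dict stands in for one <item> element.
items = [
    {"quantity": 2, "USPrice": 10.0},
    {"quantity": 1, "USPrice": 4.5},
    {"quantity": 3, "USPrice": 1.0},
]


def order_total(items):
    """Mirror of: sum( for $i in items/item return $i/quantity * $i/USPrice )

    The generator plays the role of "for ... return": it yields one product
    per item, and sum() adds up the resulting sequence.
    """
    return sum(i["quantity"] * i["USPrice"] for i in items)


if __name__ == "__main__":
    print(order_total(items))
```

Each item contributes exactly one product, so there is no question about 
how the quantities and prices are paired up.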

>A natural way to express this, which does not require the for statement, is
>
>         sum(items/quantity * items/USPrice)
>
>In fact, when my co-workers and I were first learning XPath, we had to 
>read the spec carefully to convince ourselves that this wasn't correct.

Consider the following:

         sum(items/item/quantity * items/item/USPrice)

If there can be many "item" elements, we have two sequences of elements 
with a multiply operator between them. How should this be defined? At one 
point, we considered multiplying each item on the left of the operator by 
each item on the right, but we finally decided it should simply be an error.

You suggest the following semantics:

>The semantics are simple: an n-ary operation on n sequences is allowed if 
>all sequences have the same length m, and is interpreted as being done 
>memberwise and resulting in a sequence of m members.  Type exceptions are 
>generated as if the operations are done from first member to m'th member.

The for expression seems to be a more general solution to this problem. 
There are several reasonable ways that multiplication of two sequences 
might be defined, and it is not obvious to me that any one of them is "the" 
right definition. In general, this is a pretty good indication that we 
should let the user define what is actually intended. And that's what "for" 
and "return" do very well.
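
To show how far the candidate definitions diverge, the following Python 
sketch applies three of the interpretations mentioned in this thread to the 
same two sequences. The data and all function names are hypothetical, and the 
XPath 1.0 case is simplified: it assumes non-empty node-sets whose first 
nodes hold numeric values.

```python
from itertools import product

# Hypothetical values for items/item/quantity and items/item/USPrice.
quantities = [2, 1, 3]
prices = [10.0, 4.5, 1.0]


def first_node(left, right):
    # XPath 1.0 style: each node-set is reduced to its first node
    # before the multiplication takes place.
    return left[0] * right[0]


def memberwise(left, right):
    # The proposal quoted above: pair members by position and
    # require both sequences to have the same length.
    if len(left) != len(right):
        raise ValueError("sequences must have the same length")
    return [l * r for l, r in zip(left, right)]


def each_by_each(left, right):
    # The alternative considered at one point: multiply every member
    # on the left by every member on the right.
    return [l * r for l, r in product(left, right)]


if __name__ == "__main__":
    print(first_node(quantities, prices))
    print(sum(memberwise(quantities, prices)))
    print(sum(each_by_each(quantities, prices)))
```

All three are defensible readings of the same operator, and they give 
different totals for the same input. With "for" and "return", the user 
states explicitly which pairing is meant.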

Personally, I would like to be able to use "let" and "where" in XPath 2.0 
as well.

>  (The restriction is understandable in XPath 1.0, where nodesets cannot 
> be freely constructed.  It is not understandable in 2.0, where sequence 
> construction is explicitly supported.)   It's clear that the current 
> XPath definition of
>
>         items/quantity * items/USPrice
>
>the product of the first node of each nodeset, is useless.  A good XPath 
>expression construction tool will warn the user that it almost certainly 
>is not what is desired.  But why redefine it as an error (as XPath 2.0 
>does) when it has an obvious and useful meaning?

The XPath 1.0 definition was not considered acceptable for a strongly typed 
language like XQuery, and we wanted XQuery and XPath to handle this in the 
same way. Any redefinition results in an incompatibility with XPath 1.0. 
Silently changing the interpretation and reporting no error is more 
dangerous than reporting an error.

Jonathan

These are my opinions right now. They may be quite different from the 
opinions of Software AG, the W3C XML Query Working Group, or the opinions 
that I will have after reading and considering your response.

Received on Tuesday, 22 January 2002 08:41:26 UTC