Comments on XPath 2.0 specification

The following message is a courtesy copy of an article
that has been posted to edgility.eng.xml as well.

I am writing as someone whose software uses XPath 1.0 as a syntax for 
recognizing and extracting information in XML documents.

I.  Incompatibilities:
Incompatibilities introduced in XPath 2.0 will cause difficulties for 
us, in all of the obvious areas:

   Existing XPath strings no longer working correctly.

   Need to retrain customers in XPath 2.0 concepts where they differ 
from XPath 1.0 concepts.

   Fragility of XPath implementations where they attempt to retain 1.0 
compatibility in problematical ways.

While appendix D listing incompatibilities is extensive, it leaves out 
two of the most important:

1. Requiring path elements which match keywords to be escaped.
This is unacceptable, since it makes an unbounded set of existing XPath 
expressions invalid.  It is also unacceptable since it leaves open the 
possibility that, as future XPath versions introduce new keywords, 
yet more XPath expressions become invalid.  In fact, it amounts to a 
requirement that, for safety, every name in every path expression be 
escaped.  This is a fundamental change to XPath syntax.

If keywords are required, a much better choice is to make them 
distinguishable from path elements, i.e. forbid them from matching the 
XML Name production.  Requiring them to begin with a colon (like the 
proposed, unacceptable syntax for path element escaping) is one possibility.
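
To illustrate how existing paths can collide with keywords, here is a minimal 
Python sketch.  The keyword set is an assumption: it holds only the two keywords 
named in this message ("for" and "return"), not the draft's full list, and the 
helper colliding_steps is hypothetical.  It handles only simple slash-separated 
paths without predicates or axes.

```python
from typing import List

# Assumed, deliberately partial keyword set: only the two keywords
# mentioned in this message, not the full list from the draft.
ASSUMED_KEYWORDS = {"for", "return"}


def colliding_steps(path: str) -> List[str]:
    """Return the steps of a simple slash-separated path that equal an assumed keyword."""
    steps = [step for step in path.split("/") if step]
    return [step for step in steps if step in ASSUMED_KEYWORDS]


print(colliding_steps("order/for/return"))     # ['for', 'return']
print(colliding_steps("items/item/quantity"))  # []
```

Any element name can end up in the first category as soon as a later version 
adds it to the keyword set, which is the open-ended risk described above.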

2. The introduction of the for statement
This changes XPath from an expression-matching language to a 
pseudo-procedural one.  It's quite unclear why "for" and "return" are 
included, but not "if" and "while".  It's also unclear how XPath 
benefits from implementing half of the XQuery FLWR statement.  The 
examples given for "for" in the spec are quite unconvincing, since they 
describe the sort of transformation which is the province of XSLT and 
XQuery, not XPath.

This will make user training in XPath far more difficult, since it 
breaks the existing user view of XPath as a pure pattern-matching language.

Note that every incompatibility introduced increases the likelihood 
either that XPath will split into dialects or that XPath 2.0 will simply 
be rejected.  The history of the SQL 3.0 standard is a lesson about the 
limited ability of a standards effort to make fundamental changes to an 
existing language.

II. Missing functionality.

Member-wise operations on sequences are both natural and extremely 
useful.  Take the requirement

	1. Given an XML document containing a purchase order and its
	line <item> elements, calculate the total amount of the
	purchase order by summing the price times the quantity of each
	item. The nodeset is identified by item, and the expression to
	sum would be price * quantity.


taken from

	section 2.5:  Should Support Aggregation Functions Over
	Collection-Valued Expressions

in

	http://www.w3.org/TR/xpath20req#section-Requirements

A sample document fragment might be

<items>
   <item partNum="872-AA">
     <productName>Lawnmower</productName>
     <quantity>1</quantity>
     <USPrice>148.95</USPrice>
     <comment>Confirm this is electric</comment>
   </item>
   <item partNum="926-AA">
     <productName>Baby Monitor</productName>
     <quantity>1</quantity>
     <USPrice>39.98</USPrice>
     <shipDate>1999-05-21</shipDate>
   </item>
</items>
	
A natural way to express this, which does not require the for statement, is

	sum(items/item/quantity * items/item/USPrice)

In fact, when my co-workers and I were first learning XPath, we had to 
read the spec carefully to convince ourselves that this wasn't correct. 
(The restriction is understandable in XPath 1.0, where nodesets 
cannot be freely constructed.  It is not understandable in 2.0, where 
sequence construction is explicitly supported.)   It's clear that the 
current XPath definition of

	items/item/quantity * items/item/USPrice

the product of the first node of each nodeset, is useless.  A good XPath 
expression construction tool will warn the user that it almost certainly 
is not what is desired.  But why redefine it as an error (as XPath 2.0 
does) when it has an obvious and useful meaning?
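
The difference between the two readings can be checked against the sample 
fragment with a short Python sketch.  It uses the standard library's 
xml.etree.ElementTree rather than an XPath engine, so the first-node rule of 
XPath 1.0 is imitated by indexing, not evaluated; the variable names are 
illustrative only.  Decimal is used to keep the prices exact.

```python
from decimal import Decimal
import xml.etree.ElementTree as ET

SAMPLE = """
<items>
   <item partNum="872-AA">
     <productName>Lawnmower</productName>
     <quantity>1</quantity>
     <USPrice>148.95</USPrice>
     <comment>Confirm this is electric</comment>
   </item>
   <item partNum="926-AA">
     <productName>Baby Monitor</productName>
     <quantity>1</quantity>
     <USPrice>39.98</USPrice>
     <shipDate>1999-05-21</shipDate>
   </item>
</items>
"""

root = ET.fromstring(SAMPLE.strip())

# Collect the two "node-sets" in document order.
quantities = [Decimal(e.text) for e in root.findall("item/quantity")]
prices = [Decimal(e.text) for e in root.findall("item/USPrice")]

# Imitation of the XPath 1.0 reading: only the first node of each
# node-set contributes to the product.
first_node_product = quantities[0] * prices[0]

# Member-wise reading: multiply pairwise, then sum.
memberwise_total = sum((q * p for q, p in zip(quantities, prices)), Decimal(0))

print(first_node_product)  # 148.95
print(memberwise_total)    # 188.93
```

The first result silently ignores the second item; the second is the purchase 
order total that the requirement asks for.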

The semantics are simple: an n-ary operation on n sequences is allowed 
if all sequences have the same length m, and is interpreted as being 
done memberwise and resulting in a sequence of m members.  Type 
exceptions are generated as if the operations are done from first member 
to m'th member.
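
The proposed semantics can be sketched in a few lines of Python.  This is only 
an illustration of the rule stated above, not a proposal for spec wording: the 
function name memberwise is hypothetical, Python lists stand in for XPath 
sequences, and ValueError and TypeError stand in for whatever errors an XPath 
processor would raise.

```python
import operator
from typing import Any, Callable, List, Sequence


def memberwise(op: Callable[..., Any], *sequences: Sequence[Any]) -> List[Any]:
    """Apply an n-ary operation member by member across n equal-length sequences."""
    if not sequences:
        raise ValueError("at least one sequence is required")
    m = len(sequences[0])
    if any(len(s) != m for s in sequences):
        raise ValueError("all sequences must have the same length")
    # Evaluate strictly from the first member to the m-th, so a type
    # error surfaces at the earliest offending position.
    return [op(*members) for members in zip(*sequences)]


products = memberwise(operator.mul, [1, 2], [10, 20])
print(products)       # [10, 40]
print(sum(products))  # 50
```

With this rule, an aggregate such as sum() over a member-wise product needs no 
explicit iteration construct.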

Received on Monday, 21 January 2002 19:58:52 UTC