Re: XPath and XPointer

Paul Prescod wrote:
> It seems to me that XPointer should only
> extend XPath in ways that have been foreseen in XPath.

I agree. (In particular, I dislike the way that the "range" and "string"
axes have been bolted onto the XPath framework. XPath's section 2.1 clearly
lays out the semantics of an axis and the syntax of a basis, and neither
"range" nor "string" conforms to *either* of these guidelines.)

On the other hand, I think that the authors of XPath have a responsibility
to foresee reasonable extensions, and allow for them.  To this end, I'll
suggest a modification to the XPath grammar which should make extensions
easier.  At its heart, my suggestion concerns the production for Basis:

[5] Basis ::= AxisName '::' NodeTest
            | AbbreviatedBasis

To this, I would add a third alternative, "PrimaryExpr". Note that
production 19 *already* allows the first component of a PathExpr to be a
FilterExpr (a PrimaryExpr followed by zero or more Predicates). That is,
roughly speaking, in the first Step of a PathExpr, the Basis can be a
PrimaryExpr.  My suggestion amounts to allowing this in *any* Step of a
PathExpr.

Of course, in a true location path (as opposed to a LocationPath that just
boiled down to a PrimaryExpr), a PrimaryExpr-as-Basis would have to yield a
node-set, which rules out Literal and Number. And after the first Step of a
PathExpr, a VariableReference-as-Basis would be fairly pointless, since it
ignores its context node. However, a parenthesized Expr or a FunctionCall
could be quite useful.

ParenthesizedExpr-as-Basis:
---------------------------

For example, to select the chapter and appendix children of a book-document,
you currently need
        /child::book/child::*[self::chapter or self::appendix]
which you can abbreviate somewhat to
        /book/*[self::chapter or self::appendix]
If we allow a PrimaryExpr-as-Basis in the second step, we can rewrite the
full expression as
        /child::book/(child::chapter|child::appendix)
which would then abbreviate to
        /book/(chapter|appendix)
which I suspect would please the XQL folks.

Or suppose you wanted to select the titles of chapters and top-level
sections (for a table of contents, maybe). Currently, you'd write
(abbreviated):
        /book/chapter/title | /book/chapter/section/title
or else
        (/book/chapter | /book/chapter/section)/title
but with my suggestion, you could write
        /book/(chapter|chapter/section)/title
or even
        /book/chapter/(.|section)/title

With a ParenthesizedExpr-as-Basis, the main benefit appears to be increased
conciseness with respect to "|", the only node-set operator per se. For
notations (such as XQL) that introduce more node-set operators, the
conciseness would presumably extend to them too.
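
To make the intended semantics concrete, here is a rough Python sketch (not
XPath itself, and not any existing implementation) in which every Basis is
modelled as a function from one context node to a list of nodes. The helper
names (child, self_node, union, path, walk), the sample document and the
"#document" stand-in for the root node are all assumptions made for
illustration. The point is only that a parenthesized union, evaluated once
per context node, selects the same nodes as the longer forms above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; "#document" stands in for the XPath root node.
ROOT = ET.Element("#document")
ROOT.append(ET.fromstring(
    "<book>"
    "<chapter><title>C1</title><section><title>S1</title></section></chapter>"
    "<appendix><title>A1</title></appendix>"
    "<chapter><title>C2</title></chapter>"
    "</book>"
))
ORDER = list(ROOT.iter())  # every element, in document order


def child(name):
    """Basis child::name ('*' matches any element)."""
    return lambda ctx: [c for c in ctx if name == "*" or c.tag == name]


def self_node(ctx):
    """Basis '.' (the context node itself)."""
    return [ctx]


def union(*bases):
    """Parenthesized 'a|b' used as a Basis: each alternative is
    evaluated in the same context node and the results are pooled."""
    return lambda ctx: [n for b in bases for n in b(ctx)]


def path(*steps):
    """A relative path such as chapter/section, usable inside union()."""
    return lambda ctx: walk([ctx], *steps)


def walk(nodes, *steps):
    """Apply each step to every context node; dedupe, keep document order."""
    for step in steps:
        found = {id(n): n for ctx in nodes for n in step(ctx)}
        nodes = sorted(found.values(), key=ORDER.index)
    return nodes


# /book/(chapter|appendix)
both = walk([ROOT], child("book"), union(child("chapter"), child("appendix")))
# /book/*[self::chapter or self::appendix]
filtered = walk([ROOT], child("book"),
                lambda ctx: [c for c in ctx if c.tag in ("chapter", "appendix")])
# /book/(chapter|chapter/section)/title
titles_a = walk([ROOT], child("book"),
                union(child("chapter"), path(child("chapter"), child("section"))),
                child("title"))
# /book/chapter/(.|section)/title
titles_b = walk([ROOT], child("book"), child("chapter"),
                union(self_node, child("section")), child("title"))

print([n.tag for n in both])        # ['chapter', 'appendix', 'chapter']
print([t.text for t in titles_b])   # ['C1', 'S1', 'C2']
```

In this model, allowing a PrimaryExpr as a Basis costs nothing extra: walk()
never cares whether a step came from an axis or from a parenthesized
expression, only that it maps a context node to a node-set.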

FunctionCall-as-Basis:
----------------------

In XPath, there's only one function that yields a node-set, namely id(), and
it mostly ignores its context node (except to get its containing document),
so it really only makes sense in the first Step of a PathExpr. However, if
XPath were to allow FunctionCalls as the basis of later steps, this would
provide a "hook" for easy extension.  Specifically, XPointer could introduce
"range" and "string" as functions, rather than (non-XPath-conforming) axes.

At first glance, you might rewrite
        range::L1,L2
as
        range(L1,L2)
However, the XPointer draft specifies that L2 should be evaluated in the
context(s) yielded by L1, whereas XPath specifies (implicitly) that the
arguments to a FunctionCall are all evaluated in the same context as the
FunctionCall. Therefore, to obtain the same semantics, you must rewrite
        range::L1,L2
as
        L1/range(.,L2)

For instance, the example from XPointer section 5.5.3:
        range:: descendant::REVST, following::REVEND[1]
would become
        descendant::REVST/range(.,following::REVEND[1])

The example from XPointer section 5.2.1:
        range:: id("a23")/child::*[1], following-sibling::*[2]
would become
        id("a23")/child::*[1]/range(.,following-sibling::*[2])
or equivalently (and more straightforwardly)
        id("a23")/range(child::*[1],child::*[3])

The definition of the "range" function would be something like this (using
capitals in lieu of bold):

        Function: range RANGE(node-set, node-set)
        The RANGE function returns a range starting at the beginning
        of the data selected by its first argument and continuing
        through to the end of the data selected by its second argument.
        The value of each argument must be a singleton set.

Notes:
(1) Constraining the arguments to be singleton sets is equivalent to        
    (but more simply expressed than) the prohibition set out in the last
    paragraph of XPointer section 5.5.3:
        Multiple locations from a single member of the first argument,
        are prohibited for the second argument of the range axis,
        on grounds of simplicity.

    If some future extender wished to allow non-singleton arguments
    (returning a range for each pair in the cross-product, say) this
    would be much easier to express and provide as a function than as
    an axis, assuming FunctionCall-as-Basis were allowed.

(2) Having to provide a signature for the function forces the function
    authors to specify argument types and a return type, which are lacking
    in the current spec. Hopefully, this would then indicate that we need
    a more precise definition of a range, and discussion of how it relates
    to the four basic data types of XPath.

    If a future extender wished (say) to locate sub-resources within ranges
    (e.g., "every occurrence of a DEF element within a REVST/REVEND span"),
    this would be much easier given a solid semantic framework for ranges
    (and range-sets, probably), and also given the flexible syntactic
    framework of FunctionCall-as-Basis.

I could make a similar argument about the "string" axis, but you get the
idea. 

Collateral Grammar Changes:
---------------------------

Production 19 would become simply

        [19] PathExpr  ::=  LocationPath

(since LocationPath would now subsume the other alternatives), whereupon you
could just collapse the two symbols and eliminate the production.

Production 20 would vanish, since production 4 would cover it.

-Michael Dyck
 jmdyck@netcom.ca

Received on Friday, 13 August 1999 03:13:55 UTC