problem: PathExpr includes PrimaryExpr

XQuery 1.0: An XML Query Language
W3C Working Draft 20 December 2001

Consider the following statements:

Section 2.3 para 1:
"A path expression locates nodes within a tree, and returns a sequence of
distinct nodes in document order."

Section 2.3 para 2:
"Each step [in a relative path expression] selects a sequence of nodes."

Section 2.3.2 para 3:
"The expression in the general step must evaluate to a sequence of nodes.
The result of the general step is this sequence of nodes, modified by
any step qualifiers and sorted if necessary into document order. If the
expression in the general step returns any values that are not nodes,
an error is raised."


The problem with these statements is that the syntactic classes of
PathExpr, RelativePathExpr, StepExpr, and GeneralStep all include 
primaries such as

    3.14
    "foo"
    (2+2)
    xf:date("2002-01-12")
    $a_sequence_of_nodes_not_in_doc_order

The quoted statements require that each of these yields a sequence of
distinct nodes in document order, which is certainly not what you want.
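
To make the derivation concrete, here is a minimal Python sketch (my own illustration, not part of the draft) that follows only the unit productions of the quoted grammar, i.e. the alternatives that remain when the optional further steps and step qualifiers are absent. The table UNIT_PRODUCTIONS and the helper chain_to are hypothetical names introduced for this example.

```python
# Simplified view of the December 2001 grammar: for each class, the
# classes a bare expression can fall through to when the optional parts
# (further steps, step qualifiers) are absent. Illustrative only.
UNIT_PRODUCTIONS = {
    "Expr": ["PathExpr"],
    "PathExpr": ["AbsolutePathExpr", "RelativePathExpr"],
    "RelativePathExpr": ["StepExpr"],   # zero further (Slash | SlashSlash) StepExpr
    "StepExpr": ["AxisStep", "GeneralStep"],
    "GeneralStep": ["PrimaryExpr"],     # empty StepQualifiers
}


def chain_to(target, start="Expr"):
    """Return one chain of classes leading from start to target, or None."""
    if start == target:
        return [start]
    for child in UNIT_PRODUCTIONS.get(start, []):
        rest = chain_to(target, child)
        if rest is not None:
            return [start] + rest
    return None


chain = chain_to("PrimaryExpr")
# prints: Expr -> PathExpr -> RelativePathExpr -> StepExpr -> GeneralStep -> PrimaryExpr
print(" -> ".join(chain))
```

Every class on that chain therefore contains a bare literal such as 3.14, so any statement made about "a path expression" or "a general step" applies to it as well.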


One way to solve this problem would be to change the wording of the
statements. For example, in the first one, you could replace
    "A path expression"
with
    "A path expression whose top-level operator is a Slash or SlashSlash"
or
    "A path expression that is more than just a primary".
(Note that XPath 1.0 <http://www.w3.org/TR/xpath.html#booleans> uses
similar phrasing.) However, this seems kludgey and error-prone to me.


Instead, I think you should change the grammar to provide more useful
syntactic terms. In fact, there is an anomaly in the Expr hierarchy --
from SortExpr down to PathExpr, the EBNF is ambiguous, with the precedence
given by the table in A.2, but from PathExpr down to PrimaryExpr, the
precedence is made explicit in the EBNF. If you rewrite the latter in the
same style as the former, you get syntactic classes which are more useful
for making statements such as the ones quoted above.

-------

Specifically, replace this:

[ 4] Expr             ::= ... | PathExpr
[25] PathExpr         ::= AbsolutePathExpr | RelativePathExpr
[31] AbsolutePathExpr ::= (Slash RelativePathExpr?) | (SlashSlash RelativePathExpr)
[32] RelativePathExpr ::= StepExpr ((Slash | SlashSlash) StepExpr)*
[33] StepExpr         ::= AxisStep | GeneralStep
[34] AxisStep         ::= (Axis NodeTest StepQualifiers) | AbbreviatedStep
[44] GeneralStep      ::= PrimaryExpr StepQualifiers
[45] AbbreviatedStep  ::= Dot | DotDot | (At NameTest StepQualifiers)
[46] StepQualifiers   ::= ((Lbrack Expr Rbrack) | (Arrow NameTest))*

with this:

    Expr ::= ...
             | BareSlashExpr
             | UnarySlashingExpr
             | BinarySlashingExpr
             | Dot
             | DotDot
             | PredicatedExpr
             | DereferenceExpr
             | BasicExpr
    BareSlashExpr      ::= Slash
    UnarySlashingExpr  ::=      (Slash|SlashSlash) Expr
    BinarySlashingExpr ::= Expr (Slash|SlashSlash) Expr
    PredicatedExpr     ::= Expr Lbrack Expr Rbrack
    DereferenceExpr    ::= Expr Arrow NameTest
    BasicExpr          ::= Axis NodeTest | At NameTest | PrimaryExpr

    Precedence:
        ...
        BareSlashExpr
        UnarySlashingExpr, BinarySlashingExpr
        Dot, DotDot
        PredicatedExpr, DereferenceExpr
        BasicExpr

or, if you want to fix the Dot and DotDot "bug", replace it with this:

    Expr ::= ...
             | BareSlashExpr
             | UnarySlashingExpr
             | BinarySlashingExpr
             | PredicatedExpr
             | DereferenceExpr
             | BasicExpr
    BareSlashExpr      ::= Slash
    UnarySlashingExpr  ::=      (Slash|SlashSlash) Expr
    BinarySlashingExpr ::= Expr (Slash|SlashSlash) Expr
    PredicatedExpr     ::= Expr Lbrack Expr Rbrack
    DereferenceExpr    ::= Expr Arrow NameTest
    BasicExpr          ::= Axis NodeTest | At NameTest | Dot | DotDot
                           | PrimaryExpr

    Precedence:
        ...
        BareSlashExpr
        UnarySlashingExpr, BinarySlashingExpr
        PredicatedExpr, DereferenceExpr
        BasicExpr

(Personally, rather than define BasicExpr as such, I would just expand
PrimaryExpr to include the other alternatives, but I wanted to keep the
connection to the existing grammar clear.)
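
As a rough illustration of how the second grammar assigns a class to an expression, here is a small Python sketch of a recursive-descent parser for a fragment of it. Several things in it are my assumptions rather than anything the draft says: it omits DereferenceExpr, explicit axes, and parenthesized or arithmetic primaries; it treats a bare name as a BasicExpr; it makes binary slashes left-associative; and it lets a leading slash apply to the whole relative path that follows (as the existing AbsolutePathExpr does). The names TOKEN, tokenize, Parser, and top_class are hypothetical.

```python
import re

# Token classes, named after the grammar's terminals where one exists.
TOKEN = re.compile(r"""\s*(?:
    (?P<SlashSlash>//) | (?P<Slash>/) |
    (?P<Lbrack>\[) | (?P<Rbrack>\]) |
    (?P<DotDot>\.\.) |
    (?P<Number>\d+(?:\.\d+)?) |
    (?P<Dot>\.) |
    (?P<String>"[^"]*") |
    (?P<Var>\$\w+) |
    (?P<At>@) |
    (?P<Name>\w+)
)""", re.VERBOSE)


def tokenize(text):
    text = text.strip()
    pos, out = 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at offset {pos}")
        out.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return out


class Parser:
    def __init__(self, tokens):
        self.toks, self.i = tokens, 0

    def peek(self):
        return self.toks[self.i][0] if self.i < len(self.toks) else None

    def next(self):
        tok = self.toks[self.i]
        self.i += 1
        return tok

    def parse_expr(self):
        # Lowest precedence: a bare slash, then the slashing operators.
        if self.peek() in ("Slash", "SlashSlash"):
            op, _ = self.next()
            if op == "Slash" and self.peek() in (None, "Rbrack"):
                return ("BareSlashExpr",)
            return ("UnarySlashingExpr", op, self.parse_relative())
        return self.parse_relative()

    def parse_relative(self):
        left = self.parse_postfix()
        while self.peek() in ("Slash", "SlashSlash"):
            op, _ = self.next()
            left = ("BinarySlashingExpr", left, op, self.parse_postfix())
        return left

    def parse_postfix(self):
        # Predicates bind tighter than slashes.
        node = self.parse_basic()
        while self.peek() == "Lbrack":
            self.next()
            pred = self.parse_expr()
            if self.peek() != "Rbrack":
                raise SyntaxError("expected ']'")
            self.next()
            node = ("PredicatedExpr", node, pred)
        return node

    def parse_basic(self):
        if self.peek() is None:
            raise SyntaxError("unexpected end of input")
        kind, text = self.next()
        if kind == "At":
            if self.peek() != "Name":
                raise SyntaxError("expected a name after '@'")
            return ("BasicExpr", "@" + self.next()[1])
        if kind in ("Dot", "DotDot", "Number", "String", "Var", "Name"):
            return ("BasicExpr", text)
        raise SyntaxError(f"unexpected token {text!r}")


def top_class(text):
    """Return the syntactic class of the expression's top-level node."""
    parser = Parser(tokenize(text))
    tree = parser.parse_expr()
    if parser.i != len(parser.toks):
        raise SyntaxError("trailing input")
    return tree[0]


for src in ["3.14", '"foo"', "/", "/a/b", "a//b[1]", "$x[1]", ".."]:
    print(f"{src:10} {top_class(src)}")
```

The point is only that a literal or a variable reference comes out as a BasicExpr (or a PredicatedExpr), never as one of the slashing classes, so a statement restricted to the slashing classes does not reach it.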

-------

With such a grammar, you can say, for instance:
    Every BareSlashExpr, UnarySlashingExpr, and BinarySlashingExpr
    locates nodes within a tree, and returns a sequence of distinct
    nodes in document order.
without fear of tarring other classes of expression with the same brush.

Of course, if you frequently found yourself using those three classes
together, you could introduce a nonterminal to cover them:
    SlashingExpr ::= BareSlashExpr
                     | UnarySlashingExpr
                     | BinarySlashingExpr
Thus:
    Every SlashingExpr locates nodes within a tree, and returns
    a sequence of distinct nodes in document order.

-Michael Dyck

Received on Friday, 1 February 2002 01:27:03 UTC