XQuery Grammar is not LL(1) from Michael Dyck on 2001-06-22 (www-xml-query-comments@w3.org from June 2001)

From: Michael Dyck <MichaelDyck@home.com>
Date: Thu, 21 Jun 2001 17:40:58 -0700
To: www-xml-query-comments@w3.org
Message-ID: <3B32941A.637D6418@home.com>
XQuery 1.0: An XML Query Language
W3C Working Draft 07 June 2001

Appendix B ("XQuery Grammar")
http://www.w3.org/TR/xquery/#section-XQuery-Grammar
claims that the grammar is LL(1). I disagree.

For one thing, the grammar as defined by the EBNF is ambiguous (re Expr),
and no ambiguous grammar can be LL(1). Of course, the precedence chart is
intended to disambiguate the EBNF, so the claim must be that the grammar as
defined by the EBNF and the precedence chart is LL(1). The problem then is
that LL(1)-ness isn't defined for such a grammar.

If the intent is that the precedence rules transform the ambiguous EBNF into
unambiguous EBNF (or BNF), for which a claim of LL(1)-ness would be
meaningful, then you'd have to say what those transformations are. If, for
instance,

    AdditiveExpr ::= Expr ("+" | "-") Expr

were to transform to

    AdditiveExpr ::= AdditiveExpr ("+" | "-" ) MultiplicativeExpr
                   | MultiplicativeExpr

(which is roughly how the previous draft expressed it), this would be
left-recursive, and definitely not LL(1).
__________

Even if you took care of the Expr ambiguity, there's another one.
Because keywords are not reserved, PrimaryExpr generates these forms:
    processing-instruction( StringLiteral )
    processing-instruction()
    comment()
    text()
    node()
via both NodeTest (KindTest) and FunctionCall. Mind you, this is easy to
fix: just eliminate KindTest from the NodeTest production, and delete
productions 42 through 46. (In fact, I'd recommend this even if keywords
*were* reserved, and the ambiguity didn't technically exist. It seems
pointless to single out these forms and say they're *not* function calls.)
__________

Even if you eliminate (or ignore) the ambiguities, there are a number of
specific productions for which an LL(1) parser would experience a conflict:

Production 33:
   StepExpr ::= AxisStepExpr | OtherStepExpr
has a conflict when the next symbol is a name (e.g., "child"): it might
begin an AxisStepExpr (e.g., "child::section"), or it might begin an
OtherStepExpr (e.g., "child[1]"). (Unless the lexer can magically
distinguish between an NCName and a colon-less QName.)

Production 38:
   PrimaryExpr ::= ... | NodeTest | ... | FunctionCall | ...
also has a conflict when the next symbol is a name (e.g., NodeTest "foo" vs
FunctionCall "foo(1)")

Production 7:
    Expr ::= .....
(no matter how you resolve the problem of ambiguity) has a conflict between
PathExpr and any of the Exprs that begin with a keyword (FLWRExpr, IfExpr,
SomeExpr, EveryExpr, TypeSwitchExpr), when the next symbol is a word that
looks like one of those keywords. (E.g. the next symbol "let" might begin a
FLWRExpr "let $x := ..." or a PathExpr "let/lessor".)

I think most of these conflicts can be resolved with two symbols of
lookahead, but I've found one nasty case: if an Expr begins with
    if ( Expr )
then there is a conflict between IfExpr and PathExpr (FunctionCall) that can
only be resolved by the symbol after the closing paren. Since the embedded
Expr can be arbitrarily long, no fixed amount of lookahead will resolve the
conflict. That is, the grammar isn't LL(k) for *any* k.
__________

My advice: unless you design the language to be LL(1), don't try to shoehorn
it into an LL(1) grammar.

-Michael Dyck
Received on Thursday, 21 June 2001 20:48:20 UTC