- From: Michael Dyck <MichaelDyck@home.com>
- Date: Thu, 21 Jun 2001 17:40:58 -0700
- To: www-xml-query-comments@w3.org
XQuery 1.0: An XML Query Language W3C Working Draft 07 June 2001 Appendix B ("XQuery Grammar") http://www.w3.org/TR/xquery/#section-XQuery-Grammar claims that the grammar is LL(1). I disagree. For one thing, the grammar as defined by the EBNF is ambiguous (re Expr), and no ambiguous grammar can be LL(1). Of course, the precedence chart is intended to disambiguate the EBNF, so the claim must be that the grammar as defined by the EBNF and the precedence chart is LL(1). The problem then is that LL(1)-ness isn't defined for such a grammar. If the intent is that the precedence rules transform the ambiguous EBNF into unambiguous EBNF (or BNF), for which a claim of LL(1)-ness would be meaningful, then you'd have to say what those transformations are. If, for instance, AdditiveExpr ::= Expr ("+" | "-") Expr were to transform to AdditiveExpr ::= AdditiveExpr ("+" | "-" ) MultiplicativeExpr | MultiplicativeExpr (which is roughly how the previous draft expressed it), this would be left-recursive, and definitely not LL(1). __________ Even if you took care of the Expr ambiguity, there's another one. Because keywords are not reserved, PrimaryExpr generates these forms: processing-instruction( StringLiteral ) processing-instruction() comment() text() node() via both NodeTest (KindTest) and FunctionCall. Mind you, this is easy to fix: just eliminate KindTest from the NodeTest production, and delete productions 42 through 46. (In fact, I'd recommend this even if keywords *were* reserved, and the ambiguity didn't technically exist. It seems pointless to single out these forms and say they're *not* function calls.) __________ Even if you eliminate (or ignore) the ambiguities, there are a number of specific productions for which an LL(1) parser would experience a conflict: Production 33: StepExpr ::= AxisStepExpr | OtherStepExpr has a conflict when the next symbol is a name (e.g., "child"): it might begin an AxisStepExpr (e.g., "child::section"), or it might begin an OtherStepExpr (e.g., "child[1]"). (Unless the lexer can magically distinguish between an NCName and a colon-less QName.) Production 38: PrimaryExpr ::= ... | NodeTest | ... | FunctionCall | ... also has a conflict when the next symbol is a name (e.g., NodeTest "foo" vs FunctionCall "foo(1)") Production 7: Expr ::= ..... (no matter how you resolve the problem of ambiguity) has a conflict between PathExpr and any of the Exprs that begin with a keyword (FLWRExpr, IfExpr, SomeExpr, EveryExpr, TypeSwitchExpr), when the next symbol is a word that looks like one of those keywords. (E.g. the next symbol "let" might begin a FLWRExpr "let $x := ..." or a PathExpr "let/lessor".) I think most of these conflicts can be resolved with two symbols of lookahead, but I've found one nasty case: if an Expr begins with if ( Expr ) then there is a conflict between IfExpr and PathExpr (FunctionCall) that can only be resolved by the symbol after the closing paren. Since the embedded Expr can be arbitrarily long, no fixed amount of lookahead will resolve the conflict. That is, the grammar isn't LL(k) for *any* k. __________ My advice: unless you design the language to be LL(1), don't try to shoehorn it into an LL(1) grammar. -Michael Dyck
Received on Thursday, 21 June 2001 20:48:20 UTC