XQuery grammar issues

Dear WG,

Some selected comments on the grammar in the Dec 2001 WD:

- The examples in 2.8.3 and the recognition of XmlCommentStart and
ProcessingInstructionStart in the default lexer state suggest that you
want to allow comment and PI constructors in the default (XQuery)
state.  This is a useful feature (particularly in computed element
constructors).  But the grammar does not allow it!  I suggest adding
XmlComment, XmlProcessingInstruction and also CdataSection to the
Constructor production.

- A precedence table is not sufficient for expressions more complicated
than simple binary operators.  It is much better to write out the
productions explicitly, for example:

OrExpr ::= AndExpr ("or" AndExpr)*

IfExpr ::= "if" "(" Expr ")" "then" Expr "else" ControlExpr

ControlExpr ::= FLWRExpr | QuantifiedExpr | TypeswitchExpr | IfExpr

It is currently quite unclear what is allowed as, e.g., the condition in
an IfExpr or a WhereClause.  Taking the precedence table literally, I
cannot use an OrExpr in a WhereClause, or even as the condition of an
IfExpr, without adding an extra pair of parentheses to make the OrExpr a
PrimaryExpr.  So this is currently not allowed:

if (foo or bar) then expr1 else expr2

All this would become much clearer if the precedence were written
explicitly into the grammar.  By the way, the parentheses around the
condition of an IfExpr are unnecessary if the keywords are reserved.

- The ElementContent production allows computed element/attribute
constructors.  Surely this is a mistake: it would introduce keywords
into element content.

- You no longer explicitly mention whitespace in the grammar for XML
constructs, particularly in ElementConstructor.  This makes whitespace
handling quite unclear.  Is whitespace freely allowed in start and end
tags?  If not, <foo bar="value"> will no longer parse.  If it is, the
end tag </ foo> would be allowed, which XML does not permit.  Is this
intentional?

Also, I presume whitespace is allowed between tokens, such as between
"cast" and "as" (obviously) and between "child" and "::" (because XPath
1.0 allows it), but not around the ':' in a QName, or between the "&"
and the "amp;".  In other words, whitespace between tokens is treated
completely inconsistently.

- The third bullet of A.3 mentions whitespace after '/' and '//'.  It is
completely unclear how to apply this remark to the given lexical
structure and grammar.  Unlike with '<', it does not create another
token.  Also, "// div foo" with "div" as an operator is meaningless and
does not parse.

- The Ref and Colon tokens seem not to be used.  The SemiColon token
does not even have a production.

- In element content <?foo foo?> is lexed as ProcessingInstructionStart
PITarget Char PITarget ProcessingInstructionEnd.  This is not allowed by
the grammar.

- A TagQName also allows an initial ':'.  I see no reason to allow this.
Why not restrict it to
NCName (":" NCName)?

- I suggest specifying that end-of-line translation is done as in XML
(this is currently unclear).  I also suggest applying the same
translation in string literals.

- The Char production should not specify [#x0020-#xFFFD], but
[#x0020-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] as in XML.  (The
specification should be in Unicode code points, not UTF-16.)

- It is impossible for the lexer to distinguish Multiply and Star.  They
should be one token.

- A.3.1 states "An operator that immediately follows a "/" or "//" when
used as a root symbol, should not parse".  First, "//" cannot be used as
a root symbol.  Second, this restriction is unnecessary and overly
restrictive; it is only needed for the * (multiply) operator.  Why
disallow, e.g., "/ == ."?

- The transition table mentions an XQUERY_COMMENT state that is never
entered.

- Section 2.3.5 item 4 says that "." is short for "self::node()".  This
contradicts 2.1.1.2, which says that "." is the context item.  (If the
context item is not a node, they are not the same.)

- The last sentence of Section 2.8.1 says "Two adjacent curly braces in
an XQuery character string are interpreted as a single curly brace
character."  This suggests that it also holds in a string literal, but I
presume this is not the case.

- There really should be a way to put special characters in a string
literal, for instance using the same convention as XML.
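To make the comment on explicit precedence productions concrete, here is
a minimal recursive-descent sketch in Python.  It covers only a
hypothetical toy subset (names, "and", "or", parentheses and
if/then/else), not the WD grammar.  Because each precedence level is its
own production and the IfExpr condition is an arbitrary Expr, the
example "if (foo or bar) then expr1 else expr2" parses without an extra
pair of parentheses.

```python
import re

# Hypothetical toy grammar (a tiny subset, written with explicit precedence):
#   Expr    ::= IfExpr | OrExpr
#   IfExpr  ::= "if" "(" Expr ")" "then" Expr "else" Expr
#   OrExpr  ::= AndExpr ("or" AndExpr)*
#   AndExpr ::= Primary ("and" Primary)*
#   Primary ::= Name | "(" Expr ")"
TOKEN_RE = re.compile(r"\s*([()]|[A-Za-z_][A-Za-z0-9_]*)")
KEYWORDS = {"if", "then", "else", "or", "and"}


def tokenize(text):
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, text):
        self.tokens, self.i = tokenize(text), 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {self.peek()!r}")
        self.i += 1

    def parse(self):
        tree = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected trailing token {self.peek()!r}")
        return tree

    def expr(self):
        return self.if_expr() if self.peek() == "if" else self.or_expr()

    def if_expr(self):
        self.expect("if")
        self.expect("(")
        cond = self.expr()  # any Expr, so an OrExpr needs no extra parentheses
        self.expect(")")
        self.expect("then")
        then_branch = self.expr()
        self.expect("else")
        return ("if", cond, then_branch, self.expr())

    def or_expr(self):
        left = self.and_expr()
        while self.peek() == "or":
            self.i += 1
            left = ("or", left, self.and_expr())
        return left

    def and_expr(self):
        left = self.primary()
        while self.peek() == "and":
            self.i += 1
            left = ("and", left, self.primary())
        return left

    def primary(self):
        tok = self.peek()
        if tok == "(":
            self.i += 1
            inner = self.expr()
            self.expect(")")
            return inner
        if tok is None or tok == ")" or tok in KEYWORDS:
            raise SyntaxError(f"unexpected token {tok!r}")
        self.i += 1
        return ("name", tok)


print(Parser("if (foo or bar) then expr1 else expr2").parse())
# ('if', ('or', ('name', 'foo'), ('name', 'bar')), ('name', 'expr1'), ('name', 'expr2'))
```

Since "and" is parsed one level below "or", "a or b and c" groups as
"a or (b and c)" directly from the productions, with no separate table.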
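For the TagQName comment, a sketch of the proposed restriction
NCName (":" NCName)? as a Python regular expression.  The NCName pattern
here is a simplified, ASCII-only assumption; the real NCName production
admits many more Unicode letters and digits.

```python
import re

# Simplified, ASCII-only stand-in for NCName (assumption for illustration).
NCNAME = r"[A-Za-z_][A-Za-z0-9._-]*"

# Proposed restriction: TagQName ::= NCName (":" NCName)?
TAG_QNAME_RE = re.compile(rf"{NCNAME}(?::{NCNAME})?")


def is_tag_qname(s: str) -> bool:
    # fullmatch: the whole string must be a TagQName, so a leading ':'
    # or a second ':' is rejected.
    return TAG_QNAME_RE.fullmatch(s) is not None


print(is_tag_qname("x:foo"), is_tag_qname(":foo"))
```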
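For the end-of-line comment, a minimal Python sketch of the translation
XML 1.0 performs (section 2.11, End-of-Line Handling): every CR LF pair
and every CR not followed by LF becomes a single LF.  The function name
is hypothetical.

```python
def normalize_newlines(text: str) -> str:
    # Replace CR LF first, so the pair yields one LF rather than two;
    # any CR left over is a lone CR and also becomes LF.
    return text.replace("\r\n", "\n").replace("\r", "\n")


print(repr(normalize_newlines("a\r\nb\rc\nd")))
```

Applying the same function to the contents of string literals would give
the consistent treatment suggested for them.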
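For the Char comment, a Python sketch that checks characters against
the XML 1.0 Char production in Unicode code points.  Note that the full
XML production also includes tab, LF and CR in addition to the ranges
quoted above.  The helper names are hypothetical.

```python
# XML 1.0: Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
#                 | [#x10000-#x10FFFF]
CHAR_RANGES = (
    (0x9, 0x9),
    (0xA, 0xA),
    (0xD, 0xD),
    (0x20, 0xD7FF),         # excludes the surrogate block D800-DFFF
    (0xE000, 0xFFFD),
    (0x10000, 0x10FFFF),    # supplementary planes, beyond UTF-16 units
)


def is_xml_char(cp: int) -> bool:
    return any(lo <= cp <= hi for lo, hi in CHAR_RANGES)


def check_text(text: str) -> None:
    # A Python str is indexed by code point, so U+1D11E is one item here,
    # not a surrogate pair; a lone surrogate is reported as an error.
    for i, ch in enumerate(text):
        if not is_xml_char(ord(ch)):
            raise ValueError(f"U+{ord(ch):04X} at index {i} is not an XML Char")


check_text("clef: \U0001D11E")
```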
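For the last comment, one possible XML-like convention for special
characters in string literals, sketched in Python: expand the five
predefined entities and decimal or hexadecimal character references.
The WD defines no such mechanism, so both the convention and the
function name are hypothetical.

```python
import re

# The five entities XML predefines.
PREDEFINED = {"lt": "<", "gt": ">", "amp": "&", "quot": '"', "apos": "'"}

# &#xHEX;  |  &#DEC;  |  &name;
REF_RE = re.compile(r"&(?:#x([0-9A-Fa-f]+)|#([0-9]+)|([A-Za-z]+));")


def expand_refs(literal: str) -> str:
    def replace(m):
        hex_digits, dec_digits, name = m.groups()
        if name is not None:
            if name not in PREDEFINED:
                raise ValueError(f"unknown entity &{name};")
            return PREDEFINED[name]
        cp = int(hex_digits, 16) if hex_digits is not None else int(dec_digits)
        return chr(cp)  # raises ValueError above U+10FFFF

    # A stray '&' that does not form a reference is left unchanged here;
    # a specification would have to decide whether that is an error.
    return REF_RE.sub(replace, literal)


print(expand_refs("a &lt; b &amp;&amp; c"))
```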

Regards,
Bas de Bakker
X-Hive Corporation

Received on Friday, 21 December 2001 06:15:02 UTC