- From: XQuery <xquery@attbi.com>
- Date: Sun, 29 Dec 2002 16:52:35 -0800
- To: <public-qt-comments@w3.org>
1. ExprComment disallows } for no good reason. (By analogy, CdataSection and XmlComment do not prohibit >.) Recommended change: [6] ExprComment ::= "{--" Char* "--}" Note that ExprComment, CdataSection, XmlComment, and XmlProcessingInstruction could be more properly defined using: [6] ExprComment ::= "{--" ([^-]* | "-" [^-]* | "--" [^}]*) "--}" [99] CdataSection ::= "<![CDATA[" ([^]]* | "]" [^]]* | "]]" [^>]*) "]]>" [100] XmlPI ::= "<?" PITarget ([^?]* | "?" [^>]*) "?>" [101] XmlComment ::= "<!--" ([^-]* | "-" [^-]* | "--" [^}]*) "-->" This change would remove the need to specify the exception cases in prose outside the grammar ("Char*, except for the terminator"), and also would make the lookahead explicit. 2. VarName should include "$". Every occurrence of "$" precedes VarName; conversely every occurrence of VarName follows "$". Therefore, "$" should be rolled into VarName. Recommended change: [13] VarName ::= "$" QName (and update all referring productions) 3. ElementContent is ambiguous. In ElementContent, whitespace is significant. How then should the following expression parse? <x>{--x--}</x> I see three possible ways to parse this XQuery: (i) An element containing the seven characters {, -, -, x, -, -, and } (ii) An empty element (parsing the content as an XQuery comment which is ignored) (iii) An element whose content is the value of the enclosed expression --x-- (which is semantically equivalent to -(-(x--)), that is, unary minus applied twice to the path x--) Which is the correct one? 4. Another ambiguity in ElementContent. The Char must disallow {, }, <, and &. In other words, the production should be changed to: [102] ElementContent ::= [^{}<&] | "{{" | "}}" | ElementConstructor | EnclosedExpr | CharRef | PredefinedEntityRef | XmlComment | XmlProcessingInstruction (see also next feedback item) 5. AttributeValueContent is ambiguous. Clearly the Char must disallow the terminator, so at a minimum these productions must be changed to: [104] AttributeValue ::= ('"' (EscapeQuot | [^"] | AttributeValueContent)* '"') | ("'" (EscapeApos | [^'] | AttributeValueContent)* "'") [105] AttributeValueContent ::= CharRef | "{{" | "}}" | EnclosedExpr | PredefinedEntityRef However, there are still ambiguities with & and { (see next item). 6. Escapes, strings, and content could be simplified. According to the explanation of the *_ATTRIBUTE_CONTENT lexical states, the EnclosedExpr nests lexical states and therefore may contain the terminator. For example, the XQuery expression <x a="{""}"/> is syntactically valid and semantically equivalent to <x a=""/>. On the one hand, you could potentially simplify the lexer by eliminating this nesting condition (i.e., any occurrence of the terminator, even in an EnclosedExpr, ends the attribute value). On the other hand, as long as this nesting condition remains, you have no less than four different ways to escape the terminator: (a) use a predefinedEntityRef: <x a="""/> (b) use a numerical ref (hex or decimal): <x a="""/> (c) use an enclosed expr: <x a="{'"'}"/> (d) double it up: <x a=""""/> There is no good reason to introduce a fourth mechanism (doubling-up) when three others already exist. I recommend removing it, and then simplifying the grammar as follows: Delete: 11, 18, 105 (EscapeQuot, EscapeApos, AttributeValueContent) Add: [X1] ComputedString ::= ('"' (QuotStringContent | EnclosedExpr)* '"') | ("'" (AposStringContent | EnclosedExpr)* "'") [X2] QuotStringContent ::= [^"&{}] | EscapeSequence [X3] AposStringContent ::= [^'&{}] | EscapeSequence [X4] EscapeSequence ::= CharRef | PredfinedEntityRef | "{{" | "}}" Change: [4] StringLiteral ::= '"' QuotStringContent* '"' | "'" AposStringContent* "'" [5] URLLiteral ::= StringLiteral [104] AttributeValue ::= ComputedString [102] ElementContent ::= Char | EscapeSequence | ElementConstructor | CdataSection | XmlComment | XmlProcessingInstruction (or if you adopt item #4 above, this would instead be: [102] ElementContent ::= [^{}<&] | EscapeSequence | ElementConstructor | CdataSection | XmlComment | XmlProcessingInstruction ) Personally, I would then enhance the language by allowing ComputedString in PrimaryExpr (and possibly also ProcessingInstructionTest) : [62] PrimaryExpr ::= Literal | ComputedString | FunctionCall | ("$" VarName) | ParenthesizedExpr These changes have the added advantage that now the grammar allows charref and predefinedentityref in StringContent, and eliminates the Char ambiguity that occurred in AttributeValue. Note also that the "}}" escape is technically unnecessary, and both it and the "{{" escape are redundant with the existing CharRef escapes. Consider removing these redundant escapes. Cheers, Michael Brundage xquery@attbi.com Author, "XQuery: The XML Query Language", Addison-Wesley
Received on Sunday, 29 December 2002 19:53:26 UTC