- From: Michael Dyck <MichaelDyck@home.com>
- Date: Sun, 17 Jun 2001 12:55:38 -0700
- To: www-xml-query-comments@w3.org
XQuery 1.0: An XML Query Language
W3C Working Draft 07 June 2001

Lexical analysis of xqueries seems fraught with problems now. Basically, the "lexical grammar" is ambiguous.

(1) Keywords are a subset of NCName, which is a subset of QName. For example, consider these three QueryModules:

    (a) for $x in //x return $x
    (b) namespace for = "http://www.example.com/whatever"
    (c) //for

In each case, the three letters "for" constitute a token, but in (a) it's a keyword, in (b) it's an NCName, and in (c) it's a QName. So a would-be tokenizer doesn't know what type of token it's got.

(2) StringLiteral and AttributeValue generate (pretty much) the same set of strings. For instance, consider these occurrences of "foo":

    (a) / = "foo"
    (b) <e a="foo" />

In (a) it's a StringLiteral; in (b) it's an AttributeValue. But things are even worse, because StringLiteral is a terminal, whereas AttributeValue is a non-terminal. So in (a), the 5 characters "foo" (including the quotation marks) constitute a token, but in (b) they constitute an AttributeValue containing 3 AttributeValueContents, each of which is a Char.

For a worse example of this, consider:

    (c) / = "{ foo }"
    (d) <e a="{ foo }" />

In (c) it's a StringLiteral denoting a 7-character string. In (d) it's an AttributeValue containing a single AttributeValueContent, which is an EnclosedExpr, which contains (ultimately) the QName 'foo'. (Note that the two space characters are discardable whitespace in (d), but not in (c).)

What is a would-be tokenizer to do? It seems that lexical analysis of XQuery requires contextual feedback from the parser, which must be running in parallel. This is an unwelcome complication, and one that not all parsing software supports.

-Michael Dyck
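To make the "contextual feedback" point concrete, here is a minimal Python sketch (an illustration only, not anything defined by the Working Draft) of a tokenizer that cannot work without it: the parser is assumed to pass in a mode saying what it currently expects. The mode names (EXPR, NCNAME, STEP, ATTR_VALUE), the token kinds, the keyword set, and both functions are hypothetical. The sketch ignores escapes, nested braces, and any enclosed expression more complex than a single name.

```python
from dataclasses import dataclass

# Hypothetical modes that the parser is assumed to hand to the tokenizer.
EXPR = "expr"              # start of an expression: reserved words are keywords
NCNAME = "ncname"          # e.g. the prefix right after 'namespace'
STEP = "step"              # after '/' or '//': a name test (QName) is expected
ATTR_VALUE = "attr-value"  # inside an attribute value of an element constructor

# Illustrative subset of reserved words, not the draft's full list.
KEYWORDS = {"for", "in", "return", "namespace"}


@dataclass
class Token:
    kind: str
    text: str


def classify_name(name: str, mode: str) -> Token:
    """Classify a run of name characters; the same letters get a different kind per mode."""
    if mode == EXPR and name in KEYWORDS:
        return Token("KEYWORD", name)
    if mode == NCNAME:
        if ":" in name:
            raise ValueError(f"NCName expected, got {name!r}")
        return Token("NCNAME", name)
    return Token("QNAME", name)


def lex_quoted(src: str, mode: str) -> list[Token]:
    """Tokenize the double-quoted run at the start of src."""
    if not src.startswith('"'):
        raise ValueError("expected an opening quotation mark")
    end = src.index('"', 1)  # no escape handling in this sketch
    body = src[1:end]
    if mode != ATTR_VALUE:
        # Expression context: the whole quoted run is one terminal,
        # and every character inside it (spaces included) is significant.
        return [Token("STRING_LITERAL", body)]
    # Attribute-value context: the same characters form several tokens.
    tokens: list[Token] = []
    i = 0
    while i < len(body):
        if body[i] == "{":
            close = body.index("}", i)  # no nesting in this sketch
            # Whitespace inside the braces is discardable; a real tokenizer
            # would re-enter expression mode here instead of just stripping.
            tokens.append(Token("ENCLOSED_EXPR", body[i + 1:close].strip()))
            i = close + 1
        else:
            tokens.append(Token("CHAR", body[i]))
            i += 1
    return tokens
```

With the mode supplied, classify_name("for", EXPR) yields a KEYWORD while classify_name("for", STEP) yields a QNAME, and lex_quoted('"{ foo }"', EXPR) yields one STRING_LITERAL whose value has 7 characters, whereas in ATTR_VALUE mode the same characters yield a single ENCLOSED_EXPR with the spaces discarded. Take the mode argument away and neither function has enough information to decide.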
Received on Sunday, 17 June 2001 16:02:28 UTC