RE: Xquery grammar question from scott_boag@us.ibm.com on 2003-09-06 (public-qt-comments@w3.org from September 2003)

From: <scott_boag@us.ibm.com>
Date: Sat, 6 Sep 2003 12:25:02 -0400
To: "Kevin Jones" <kjones@actuate.com>
Cc: public-qt-comments@w3.org, public-qt-comments-request@w3.org
Message-ID: <OF3F7FD4CD.6731D4DD-ON85256D99.0052E051-85256D99.005A2938@lotus.com>
Hi Kevin.  Thank you very much for the comments.  Replies inline below.

public-qt-comments-request@w3.org wrote on 09/05/2003 05:38:02 PM:

> 
> Hi Scott,
> 
> Here is another round of issues that I found with regard to the 22 
> August 2003 draft.
> 
> 1) Seperator should be spelled Separator

The only place I found this is with the (now fixed) QuerySeparator, which 
actually shouldn't be published (read below).  Common (embarrassing) 
spelling mistake of mine.

> 
> 2) The QuerySeperator token (DEFAULT and OPERATOR states) is not 
> defined in the spec

Bug.  We only use the QuerySeparator internally for testing purposes... it 
is not part of the language, so it should not occur in these lists.

> 
> 3) QName "(" - Why is this token group needed? Isn't the 
> disambiguation of QName "(" from QName "(:" handled by the longest 
> match rule when the '(' is encountered?

Not using QName "(" as a single long token in the test parser causes a 
choice conflict that would need to be solved by LL(2).  Consider "foo" and 
"foo ()".  If both 'foo' words are QNames, the parser cannot decide which 
branch to take.  But note that what the grammar itself is saying is that 
there is a choice issue here, and the implementation needs to solve it... 
it doesn't say how it needs to solve it.  I only picked one solution for 
the test parser.

I'm not sure what you are saying about QName "(:".  The grammar itself 
should treat these as two tokens.  As I said, there is a bug in the 
current implementation... it needs to sniff ahead one character (at lex 
time) and reject QName "(" in this case.  This is the only place this 
occurs in the grammar, so it's a drag, but I don't want to revisit the 
comment syntax again.  In any case, I'm not sure how the longest token 
rule would help, except maybe to make a token for QName "(:" to catch this 
case (but then I'm not sure what you do with it from there...).
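
As a rough illustration of the one-character lookahead described above, 
here is a minimal sketch in Python.  It is not the test parser's code: the 
token kind names (QNAME, QNAME_LPAR), the function name, and the 
simplified ASCII-only QName pattern are all assumptions made for the 
example.

```python
import re

# Simplified QName pattern (assumption): optional prefix, ASCII-only NCNames.
NCNAME = r"[A-Za-z_][A-Za-z0-9_.\-]*"
QNAME_RE = re.compile(rf"(?:{NCNAME}:)?{NCNAME}")
WS_RE = re.compile(r"\s*")


def next_token(text, pos=0):
    """Return (kind, lexeme, new_pos) for a QName-led token at pos, or None.

    Hypothetical token kinds: "QNAME_LPAR" is the single long token
    QName "(", and "QNAME" is a bare QName.
    """
    m = QNAME_RE.match(text, pos)
    if m is None:
        return None
    end = m.end()
    # Skip optional whitespace between the QName and a possible "(".
    after_ws = WS_RE.match(text, end).end()
    if text.startswith("(", after_ws):
        # Sniff one character past "(": "(:" opens a comment, so in that
        # case QName "(" is rejected and the QName stands alone.
        if not text.startswith("(:", after_ws):
            return ("QNAME_LPAR", text[pos:after_ws + 1], after_ws + 1)
    return ("QNAME", m.group(0), end)
```

With this sketch, "foo ()" yields the long QNAME_LPAR token, while "foo" 
and "foo(: c :)" both yield a plain QNAME, leaving the comment to be lexed 
separately.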

> 
> 4) State changes are mixed in with token groups. How is this 
> reconciled? Aren't token groups expected to be processed in the same 
> state?

I don't think I understand your comment here.  A group of tokens in the 
state transition table is treated as a single unit.  (Though, again, this 
is only a way of documenting unambiguous behavior... an implementation can 
do what it wants.)
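
One way to picture a token group being treated as a single unit is a 
table whose rows key on a whole set of tokens rather than on individual 
ones.  The following Python sketch assumes this reading; the DEFAULT and 
OPERATOR state names come from the discussion above, but the token kinds 
and the specific transitions are illustrative only and are not copied 
from the spec's tables.

```python
# Each row: (current state, group of token kinds, next state).
# Rows are illustrative assumptions, not the spec's actual transitions.
TRANSITIONS = [
    ("DEFAULT", frozenset({"QNAME", "NUMBER", "STRING"}), "OPERATOR"),
    ("DEFAULT", frozenset({"QNAME_LPAR", "LPAR"}), "DEFAULT"),
    ("OPERATOR", frozenset({"PLUS", "MINUS", "COMMA"}), "DEFAULT"),
    ("OPERATOR", frozenset({"RPAR"}), "OPERATOR"),
]


def next_state(state, token_kind):
    """Look up the next lexical state for a token recognized in `state`.

    Every token in a group shares the same transition, so the group
    behaves as one unit in the table.
    """
    for row_state, group, target in TRANSITIONS:
        if row_state == state and token_kind in group:
            return target
    raise ValueError(f"token {token_kind!r} is not recognized in state {state!r}")
```

A token that appears in no group for the current state is simply not 
recognized there, which is what keeps the documented behavior 
unambiguous.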

> 
> 5) The following is more thinking out loud than an issue.
> 
> Q: Why are the following designated "named terminals" when they 
> might be more easily represented as grammar productions?
> SchemaMode ::= "lax" | "strict" | "skip"
> SchemaGlobalTypeName ::= "type" "(" QName ")"
> SchemaGlobalContext ::= QName | SchemaGlobalTypeName 
> SchemaContextStep ::= QName
> PITarget ::= NCName
> VarName ::= QName
> A: Because they initiate state changes.
> 
> Possible solution: Leave as is or make them productions and add 
> state change in the grammar
> 
> Can you think of any other way that this could be handled to make 
> the distinction between lexical analysis and parsing more clear.

I'm assuming you've read http://www.w3.org/TR/xquery/#parse-note-validate. 
This whole area of the validate expression is thorny -- and I don't really 
have any bright ideas beyond what I've done... I think it's good enough, 
and other solutions turn into nightmares in their own right. The 
definition of SchemaMode as a "named terminal", for instance, is because 
of <"validate" SchemaMode> in the appendix version of the BNF, and 
everything inside a grouping has to be a lexical-only construct.

> 
> 6) Can the explicit whitespace designation be removed from the 
> grammar and placed at the lexical level and/or handled by lexical 
> states?

Well, I think the notation at the grammar production level is a good 
thing.  And it is specified more formally at the lexical level where the 
states are listed for non-explicit whitespace in 
http://www.w3.org/TR/xquery/#whitespace-rules.  So I don't think there's 
anything big that needs to be done there... though when I look at where 
"S" falls in the lex tables, I think it is incorrect, and I'll have to 
clean that up.

> 
> regards,
> 
> Kevin Jones
> 

Thanks again for the comments!

-scott
Received on Saturday, 6 September 2003 12:26:10 UTC