XQuery grammar: issues opened during implementation of a parser

Gentlemen,


Looking at the list of issues -- 102 items long -- I feel I may ask you to make the list 6%
longer...



1. Is it intentional that there's no NOT operator in the grammar?



2. We now have some Bash- or DIESEL-like "embedded {expressions} in strings".
Like any syntactic sugar, it is hard to implement, but it is very convenient, so we should have it.
The question: why is this feature allowed only in attribute constructors?
Why are these strings not allowed anywhere a StringLiteral may appear?



3. Many languages have special quotes for names.
Examples are |pipes| in some Lisps and "double-quoted"."names"."of"."objects" in SQL.
The question: why should we rely on context to decide whether a name is a keyword or just a name?
I would prefer to have single quotes for strings and double quotes for names, or vice versa.



4. It may be useful to replace

[37] Axis ::= (NCName "::") | "@"

with

[37] Axis ::= AxisName | "@" ,

where AxisName is, at the implementer's choice, either

[31.1] AxisName ::= (NCName S_1 "::")

or

[31.1] AxisName ::= "ancestor"         S_1 "::"
                  | "ancestor-or-self" S_1 "::"
                  | "attribute"        S_1 "::"
                  | "child"            S_1 "::"
                  . . .
                  | "self"             S_1 "::" ,

where

[31.2] S_1 ::= (#x20 | #x9)+ ,

i.e. it is like S from [XML 1.0] but with no newlines inside, so both the axis name and the :: will
be on the same line.

This will not affect 99.99% of existing code, but it will reduce the number of keywords that must
be distinguished from plain names.
Perhaps S_1 is not needed at all, because I have never seen any space between the axis name and
the colons in real expressions.
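
To illustrate the idea, here is a minimal Python sketch (not the Bison/Flex code itself) of a
scanner rule that treats an axis name, optional same-line whitespace, and "::" as one token.
The names AXIS_NAMES, AXIS_TOKEN and scan_axis are hypothetical, the axis list is abbreviated to
the names quoted above, and S_1 is made optional as an assumption:

```python
import re

# Hypothetical, abbreviated axis list: only the names quoted in the grammar above.
AXIS_NAMES = ("ancestor-or-self", "ancestor", "attribute", "child", "self")

# Axis name, then optional spaces or tabs (S_1, assumed optional here), then "::".
# Newlines are deliberately not accepted between the name and the colons.
AXIS_TOKEN = re.compile(r"(?P<axis>" + "|".join(AXIS_NAMES) + r")[\x20\x09]*::")

def scan_axis(text, pos=0):
    """Return (axis_name, end_position) if an axis token starts at pos, else None."""
    m = AXIS_TOKEN.match(text, pos)
    if m is None:
        return None
    return m.group("axis"), m.end()
```

With such a rule, "child" followed by "::" becomes a single axis token, while a bare "child" falls
through to the ordinary name rule and never needs to be treated as a keyword.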



5. We have

[43] PITest ::= "processing-instruction" "(" StringLiteral? ")" .

It may be better to have

[43] PITest ::= "processing-instruction" "(" Expr? ")" .



6. What about DMS-s (data manipulation statements)? Will they ever appear in the language? Will
they be top-level statements, or will they be treated as plain expressions?



My parser for current syntax consists of
990 lines of Bison source and
400 lines of Flex source.

I found that it is hard to create a fast scanner, because the current grammar is character-based,
not lexeme-based: it is easy to write a slow parser as a single YACC file, but using Flex is tricky.

a) 5 Flex states are needed -- "plain text", "single-quoted string", "double-quoted string",
"outermost level of expression embedded in single-quoted string", and
"outermost level of expression embedded in double-quoted string".

b) It takes a trick known as a "custom stack of Flex states". In other words,
an augmented transition network, rather than a standard transition graph, must be used to switch
between Flex states. The trick looks fine when based on an undocumented feature of Flex's output,
but an implementation for "ancient Lex" is ugly.
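
The state-stack idea from (a) and (b) can be sketched in Python as follows. This is only an
illustration under simplifying assumptions, not the Flex scanner described above: the function
name trace_states and the state labels are hypothetical, escapes and comments are ignored, and a
nested "{" inside an embedded expression simply pushes the same expression state again.

```python
# State labels mirror the five Flex states listed in (a); the names are hypothetical.
PLAIN, SQ, DQ = "plain", "sq", "dq"
EXPR_IN_SQ, EXPR_IN_DQ = "expr-in-sq", "expr-in-dq"

def trace_states(text):
    """Return the scanner state after each input character, using an explicit stack."""
    stack = [PLAIN]
    trace = []
    for ch in text:
        state = stack[-1]
        if state in (PLAIN, EXPR_IN_SQ, EXPR_IN_DQ):
            if ch == "'":
                stack.append(SQ)          # enter a single-quoted string
            elif ch == '"':
                stack.append(DQ)          # enter a double-quoted string
            elif state != PLAIN and ch == "{":
                stack.append(state)       # nested brace inside an embedded expression
            elif state != PLAIN and ch == "}":
                stack.pop()               # leave this brace level
        elif state == SQ:
            if ch == "'":
                stack.pop()               # end of the single-quoted string
            elif ch == "{":
                stack.append(EXPR_IN_SQ)  # embedded expression begins
        elif state == DQ:
            if ch == '"':
                stack.pop()               # end of the double-quoted string
            elif ch == "{":
                stack.append(EXPR_IN_DQ)  # embedded expression begins
        trace.append(stack[-1])
    return trace
```

The stack is what lets a string inside an expression inside a string return to the right state
when each level closes; a fixed transition graph over the five states cannot remember that.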

Best Regards,
IvAn Mikhailov.



Received on Monday, 25 June 2001 02:04:43 UTC