[Bug 4176] [UPD] Syntax "do rename ... as ..." problematic with tokenization

http://www.w3.org/Bugs/Public/show_bug.cgi?id=4176


------- Comment #6 from martin@x-hive.com  2007-02-05 10:55 -------
The problem for me is that the grammar started out as LL(1), provided that a
certain (quite complex) set of tokenization rules is followed. With the addition
of "as" as a token that leads to a non-itemtype state, these rules break: it's
no longer possible to write a self-sufficient tokenizer that correctly
tokenizes any input without the help of a parser. The language is not
ambiguous; it's just no longer LL(1), even with a tricky lexer.

To clarify: using 'as' as the keyword in this place would mean that a following
'element()' can no longer be lexed as an element test/type. The parser would
then have to look ahead beyond the QName token 'element' and see what's coming
up to decide whether 'element' is a type name or the element test. That is not
LL(1).
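
As a rough illustration of that lookahead problem, here is a minimal Python
sketch. It is a toy, not taken from any real XQuery implementation: the
tokenizer regex and the names tokens() and classify_after_as() are made up for
this example. The point is only that the first token after 'as' is 'element'
in both cases, so the decision needs the second token as well.

```python
import re

# Toy tokenizer (an assumption for illustration): names, or any single
# non-space character such as '(' and ')'.
TOKEN = re.compile(r"\s*([A-Za-z_][\w.-]*|\S)")


def tokens(text):
    """Split text into name tokens and single-character symbol tokens."""
    return TOKEN.findall(text)


def classify_after_as(toks):
    """Classify what follows 'as', given the tokens after it.

    The first token alone cannot decide the case when it is 'element';
    the function has to inspect the second token too, i.e. it needs
    two tokens of lookahead.
    """
    first = toks[0]
    second = toks[1] if len(toks) > 1 else None
    if first == "element" and second == "(":
        return "element-test"
    return "name"


if __name__ == "__main__":
    print(classify_after_as(tokens("element()")))
    print(classify_after_as(tokens("element")))
```

Both inputs start with the same token, so a parser that may only inspect one
token at a time cannot tell them apart at that point.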

It's possible to solve this (my current way is to include a Horrible Hack (tm)
where the parser tells the lexer, in the rename state, to expect that "as"
token), but it's quite ugly to add a language extension that breaks a formerly
good strategy for parsing the language.
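
The kind of parser-to-lexer hint described above could look roughly like the
following Python sketch. Everything in it is hypothetical (the ToyLexer class,
the expect_rename_as flag, the token kinds, and the lex_all() driver that
stands in for the parser); it is not the actual implementation, only a small
model of a stateful lexer in which 'as' normally switches to an "itemtype"
state unless the parser has said that a rename is in progress.

```python
import re

SPACE = re.compile(r"\s*")
KINDTEST = re.compile(r"element\s*\(")
WORD_OR_SYMBOL = re.compile(r"[A-Za-z_][\w-]*|\S")


class ToyLexer:
    """Hypothetical two-state lexer: 'default' and 'itemtype'."""

    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.state = "default"
        # The hack: set from outside (by the parser) when the next 'as'
        # belongs to a rename and must NOT switch to the itemtype state.
        self.expect_rename_as = False

    def next_token(self):
        self.pos = SPACE.match(self.text, self.pos).end()
        if self.pos >= len(self.text):
            return ("EOF", "")
        if self.state == "itemtype":
            self.state = "default"
            m = KINDTEST.match(self.text, self.pos)
            if m:
                # In the itemtype state, 'element(' is one token.
                self.pos = m.end()
                return ("KINDTEST", "element(")
        m = WORD_OR_SYMBOL.match(self.text, self.pos)
        self.pos = m.end()
        value = m.group()
        if value == "as":
            if self.expect_rename_as:
                self.expect_rename_as = False  # stay in the default state
            else:
                self.state = "itemtype"
            return ("AS", value)
        kind = "WORD" if value[0].isalpha() or value[0] == "_" else "SYM"
        return (kind, value)


def lex_all(text, use_hack):
    """Drive the lexer; with use_hack, mimic the parser's hint after 'rename'."""
    lexer = ToyLexer(text)
    out = []
    while True:
        kind, value = lexer.next_token()
        if kind == "EOF":
            return out
        out.append((kind, value))
        if use_hack and value == "rename":
            lexer.expect_rename_as = True  # stand-in for the parser's hint
```

Without the hint, lex_all("do rename $x as element()", False) lexes
'element(' as a single kind-test token; with the hint it lexes 'element' as an
ordinary word followed by '('. The lexer is no longer self-sufficient, because
the result depends on a flag that only the parser knows how to set.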

I'd also say that adding more and more syntax to the language doesn't really
make it more usable; quite the contrary. 'as' used to be an indicator of a type
declaration (functions, FLWOR, typeswitch); now it's being used for two
different things. I think it would also be easier for GUIs/editors if we
tried to keep each keyword to a single meaning as much as possible.

Received on Monday, 5 February 2007 10:56:03 UTC