- From: John Lumley <john@saxonica.com>
- Date: Thu, 30 Nov 2023 14:45:10 +0000
- To: Norm Tovey-Walsh <norm@saxonica.com>, Gunther Rademacher <grd@gmx.net>
- Cc: Steven Pemberton <steven.pemberton@cwi.nl>, "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, public-ixml@w3.org
On 30/11/2023 13:59, Norm Tovey-Walsh wrote:
> Gunther Rademacher <grd@gmx.net> writes:
>> However a wider context cannot be LL(1), when a name can be immediately followed
>> by a period. At this point already this requires lookahead of at least 2 to
>> complete the name, plus more, if that period again could be followed by name
>> characters. But I was not in the original discussion, so I am not sure what
>> problem we are actually trying to solve by disallowing the final
>> period.
> Michael pressed on exactly this point. Before we do this, we do have to
> be sure that the change resolves the lookahead problem that was
> reported.
>
> (I’m a little hazy on exactly what the problem is because I use either
> an Earley or GLL parser to parse the input grammar, so don’t really care
> whether or not names end with a final “.”.)
>
> I imagine it was something like, “if you see a name that ends in ‘.’
> followed by a space, you can’t know if that’s the ‘.’ that ends the rule
> until you look ahead an unbounded distance.” I guess that forbidding ‘.’
> at the end of the name would resolve that, but would it introduce
> different unbounded lookahead problems?
>
That is exactly the issue I had with my hand-written parser - you need to keep looking forward, even across comments for example, to find out whether this period is the last in a rule.

By forbidding the trailing period, I can parse a name (including any internal periods) with a regular expression, and if the match ends in a period then drop that period from the name and back up just one character in the input stream for further parsing. Makes it a lot easier and potentially faster.

John Lumley
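
For illustration only, the regular-expression approach described above might be sketched in Python as follows. This is not the actual hand-written parser: the function name `scan_name` and the pattern `NAME_RE` are hypothetical, and the character classes are an ASCII-only simplification of the name characters that ixml allows.

```python
import re

# Assumption: ASCII-only approximation of ixml name characters.
# The pattern deliberately lets "." appear anywhere after the first character.
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def scan_name(text, pos):
    """Return (name, next_pos), or None if no name starts at pos."""
    m = NAME_RE.match(text, pos)
    if m is None:
        return None
    name, end = m.group(), m.end()
    if name.endswith("."):
        # A name may not end in a period, so the final "." belongs to
        # whatever follows (e.g. the end of the rule): drop it from the
        # name and back up exactly one character in the input.
        name, end = name[:-1], end - 1
    return name, end
```

With this sketch, `scan_name("a: b.c.", 3)` returns `("b.c", 6)`, leaving the rule-terminating period at position 6 for the rest of the parser. No lookahead past the matched name is needed.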
Received on Thursday, 30 November 2023 14:45:22 UTC