- From: Norm Tovey-Walsh <norm@saxonica.com>
- Date: Thu, 16 Dec 2021 11:07:44 +0000
- To: Dave Pawson <dave.pawson@gmail.com>
- Cc: public-ixml@w3.org
- Message-ID: <m2mtl07pzv.fsf@saxonica.com>
> I'm confused. To my thinking "a"*"#" would match any number of 'a'
> characters followed by one hash character.
>
> To me, a#a a#a#a appears wrong. Is zero or more 'left associative' if
> that's the right expression? How does the a# repetition match?
>
> Same applies to the zero or one example.

That took me a while to get my head around too. I think it’s just part
of the definition of repeat0 and repeat1. Note that these are different:

  a*"#"

and

  a*,"#"

The former can be distinguished as “factor * sep” and has the semantics
that the spec describes for repeat0. The latter is “a*” followed by “#”.

Later, “hints for implementors” observes that a*"#" can be rewritten.
For example:

  X: a*"#" .

can be rewritten as

  X: Y .
  -Y: a+"#" ; .

which can be further rewritten as

  X: Y .
  -Y: Z ; .
  -Z: a, ("#", a)* .

And on we go:

  X: Y .
  -Y: Z ; .
  -Z: a, A .
  -A: "#", a, A ; .

I find “the alternative that matches nothing” in the grammar quite hard
to read. I almost wish we had a special symbol for it, like ∅. Then we’d
have:

  X: Y .
  -Y: Z ; ∅ .
  -Z: a, A .
  -A: "#", a, A ; ∅ .

Unless I’ve got something wrong along the way, of course.

All of this has turned out to be an interesting challenge for me to
implement because I don’t think the PEP parser can be made to accept
“nothing” as an alternative. Instead, I think (I think!) I’ve modified
it so that it understands an optional terminal or nonterminal. So I’ll
end up rewriting it something like this:

  X: Y .
  -Y: Z? .
  -Z: a, A .
  -A: ("#", a, A)? .

Where the optionality is addressed in the parser when candidate edges
are selected.

But now we’re into about four levels of manual rewriting so I make no
claims of correctness!

                                        Be seeing you,
                                          norm

--
Norm Tovey-Walsh
Saxonica
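
A minimal sketch for sanity-checking the rewriting steps in the message above, not part of the original email. It assumes, purely for illustration, that the nonterminal a matches the single literal character "a"; the function names are hypothetical. Regular expressions stand in for the two forms a*"#" and a*,"#", and a small hand-written recognizer follows the final rewritten grammar (X: Y . -Y: Z? . -Z: a, A . -A: ("#", a, A)? .).

```python
import re

# Assumption: the nonterminal `a` matches exactly the literal character "a".


def matches_repeat0_sep(s: str) -> bool:
    """a*"#": zero or more a's, with '#' between adjacent a's."""
    return re.fullmatch(r"(a(#a)*)?", s) is not None


def matches_sequence(s: str) -> bool:
    """a*,"#": zero or more a's followed by exactly one '#'."""
    return re.fullmatch(r"a*#", s) is not None


def parse_A(s: str, i: int) -> int:
    """-A: ("#", a, A)? .  Return the index reached after matching A at i."""
    if s.startswith("#a", i):
        return parse_A(s, i + 2)
    return i  # the optional group matched nothing


def parse_X(s: str) -> bool:
    """X: Y .  -Y: Z? .  -Z: a, A ."""
    if s == "":
        return True  # Z? matched nothing
    if s.startswith("a"):
        return parse_A(s, 1) == len(s)
    return False


if __name__ == "__main__":
    for text in ["", "a", "a#a", "a#a#a", "a#", "aa#", "#"]:
        print(repr(text), matches_repeat0_sep(text),
              matches_sequence(text), parse_X(text))
```

On these sample strings the hand-rewritten grammar and the a*"#" form should agree with each other, while a*,"#" accepts a different set (for example "aa#" but not "a#a").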
Received on Thursday, 16 December 2021 11:23:34 UTC