Re: repetition

> I'm confused. To my thinking "a"*"#"  would match any number of 'a'
> characters followed by one hash character.
>
> To me, a#a a#a#a appears wrong. Is zero or more 'left associative' if
> that's the right expression? How does the a# repetition match?
>
> Same applies to the zero or one example.

That took me a while to get my head around too. I think it’s just part
of the definition of repeat0 and repeat1. Note that these are different:

  a*"#"

and

  a*,"#"

The former is the form “factor * sep” and has the semantics that the spec
describes for repeat0: zero or more a's, separated by “#” (so "", "a",
"a#a", "a#a#a", and so on). The latter is “a*” followed by a single “#”.
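One quick way to see the difference is to translate both forms into regular expressions. This is a minimal Python sketch, assuming for illustration that the nonterminal a matches just the letter "a"; the pattern names are made up.

```python
import re

# Assumption: the nonterminal a matches the single character "a".
# a*"#"  (factor * sep): zero or more a's, separated by "#"
REPEAT0_WITH_SEP = re.compile(r"(?:a(?:#a)*)?")
# a*,"#" (a* followed by "#"): zero or more a's, then exactly one "#"
REPEAT0_THEN_HASH = re.compile(r"a*#")

for s in ["", "a", "a#a", "a#a#a", "#", "aaa#"]:
    print(repr(s),
          bool(REPEAT0_WITH_SEP.fullmatch(s)),
          bool(REPEAT0_THEN_HASH.fullmatch(s)))
```

The first four strings are accepted only by the first pattern; "#" and "aaa#" only by the second.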

Later, “hints for implementors” observes that

  a*"#"

can be rewritten. For example:

X: a*"#" .

can be rewritten as

X: Y .
-Y: a+"#" ; .

which can be further rewritten as

X: Y .
-Y: Z ; .
-Z: a, ("#", a)* .

And on we go:

X: Y .
-Y: Z ; .
-Z: a, A .
-A: "#", a, A ; .
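One way to sanity-check the final rewrite is a tiny recognizer. This is a hypothetical sketch: the names GRAMMAR, parse and matches are made up for illustration, the nonterminal a is assumed to match the single character "a", the "-" marks are ignored, and the alternative that matches nothing is represented as an empty list. Each call returns the set of positions where a match can end, so the empty alternative simply returns the position it started at.

```python
from typing import Dict, List, Set

# X: Y .  -Y: Z ; .  -Z: a, A .  -A: "#", a, A ; .
# An empty list is the alternative that matches nothing.
GRAMMAR: Dict[str, List[List[str]]] = {
    "X": [["Y"]],
    "Y": [["Z"], []],
    "Z": [["a", "A"]],
    "A": [["#", "a", "A"], []],
}

def parse(sym: str, text: str, pos: int) -> Set[int]:
    """Return every position where a match of sym starting at pos can end."""
    if sym not in GRAMMAR:
        # Terminal (assumption: each matches exactly one character).
        return {pos + 1} if text[pos:pos + 1] == sym else set()
    ends: Set[int] = set()
    for alt in GRAMMAR[sym]:
        positions = {pos}  # an empty alternative leaves the position unchanged
        for s in alt:
            positions = {e for p in positions for e in parse(s, text, p)}
        ends |= positions
    return ends

def matches(text: str) -> bool:
    # The whole input must be consumed by X.
    return len(text) in parse("X", text, 0)

for s in ["", "a", "a#a", "a#a#a", "a#", "aa"]:
    print(repr(s), matches(s))
```

This naive approach only works because the grammar has no left recursion: A always consumes "#" before it recurses.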

I find “the alternative that matches nothing” in the grammar quite hard
to read. I almost wish we had a special symbol for it, like ∅. Then we’d
have:

X: Y .
-Y: Z ; ∅ .
-Z: a, A .
-A: "#", a, A ; ∅ .

Unless I’ve got something wrong along the way, of course.

All of this has turned out to be an interesting challenge for me to
implement because I don’t think the PEP parser can be made to accept
“nothing” as an alternative. Instead, I think (I think!) I’ve modified
it so that it understands an optional terminal or nonterminal. So I’ll
end up rewriting the grammar something like this:

X: Y .
-Y: Z? .
-Z: a, A .
-A: ("#", a, A)? .

Here the optionality is handled by the parser when candidate edges
are selected.
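The PEP parser's internals aren't shown here, so the following is only a toy illustration of the idea, not that parser: a small Earley recognizer in which a body item can be flagged as optional. Whenever the dot sits before an optional symbol, skipping it is added as one more candidate alongside the usual predict and scan steps, so no rule needs an empty alternative. The names recognize and GRAMMAR and the nonterminal B are hypothetical. B stands in for the group ("#", a, A) because the sketch only makes a single terminal or nonterminal optional, and a is assumed to match the character "a".

```python
from typing import Dict, List, Set, Tuple

# A body item is (symbol, optional). Symbols that are not grammar keys are
# single-character terminals (assumption: the nonterminal a matches "a").
Grammar = Dict[str, List[List[Tuple[str, bool]]]]
EarleyItem = Tuple[str, int, int, int]  # (lhs, alternative, dot, origin)

# X: Y .  Y: Z? .  Z: a, A .  A: B? .  B: "#", a, A .   (B is hypothetical)
GRAMMAR: Grammar = {
    "X": [[("Y", False)]],
    "Y": [[("Z", True)]],
    "Z": [[("a", False), ("A", False)]],
    "A": [[("B", True)]],
    "B": [[("#", False), ("a", False), ("A", False)]],
}

def recognize(grammar: Grammar, start: str, text: str) -> bool:
    chart: List[Set[EarleyItem]] = [set() for _ in range(len(text) + 1)]
    chart[0] = {(start, alt, 0, 0) for alt in range(len(grammar[start]))}
    for i in range(len(text) + 1):
        changed = True
        while changed:  # repeat until set i stops growing
            changed = False
            for lhs, alt, dot, origin in list(chart[i]):
                body = grammar[lhs][alt]
                new: Set[EarleyItem] = set()
                if dot == len(body):
                    # Complete: advance items in the origin set waiting on lhs.
                    for l2, a2, d2, o2 in list(chart[origin]):
                        b2 = grammar[l2][a2]
                        if d2 < len(b2) and b2[d2][0] == lhs:
                            new.add((l2, a2, d2 + 1, o2))
                else:
                    sym, optional = body[dot]
                    if optional:
                        # Skipping the optional symbol is one more candidate.
                        new.add((lhs, alt, dot + 1, origin))
                    if sym in grammar:
                        # Predict: start every alternative of sym at position i.
                        new.update((sym, a, 0, i) for a in range(len(grammar[sym])))
                    elif i < len(text) and text[i] == sym:
                        # Scan: the terminal matches, so the item moves to set i + 1.
                        chart[i + 1].add((lhs, alt, dot + 1, origin))
                if not new <= chart[i]:
                    chart[i] |= new
                    changed = True
    n = len(text)
    return any((start, alt, len(grammar[start][alt]), 0) in chart[n]
               for alt in range(len(grammar[start])))

for s in ["", "a", "a#a", "a#a#a", "a#", "#"]:
    print(repr(s), recognize(GRAMMAR, "X", s))
```

Rather than keeping a worklist, each chart set is simply re-scanned until it stops growing. That is slower than a tuned Earley parser, but it avoids the classic pitfall where an item that completes without consuming input is processed before something waiting on it has been added.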

But now we’re into about four levels of manual rewriting, so I make no
claims of correctness!

                                        Be seeing you,
                                          norm

--
Norm Tovey-Walsh
Saxonica

Received on Thursday, 16 December 2021 11:23:34 UTC