- From: Norm Tovey-Walsh <norm@saxonica.com>
- Date: Thu, 16 Dec 2021 11:07:44 +0000
- To: Dave Pawson <dave.pawson@gmail.com>
- Cc: public-ixml@w3.org
- Message-ID: <m2mtl07pzv.fsf@saxonica.com>
> I'm confused. To my thinking "a"*"#" would match any number of 'a'
> characters followed by one hash character.
>
> To me, a#a a#a#a appears wrong. Is zero or more 'left associative' if
> that's the right expression? How does the a# repetition match?
>
> Same applies to the zero or one example.
That took me a while to get my head around too. I think it’s just part
of the definition of repeat0 and repeat1. Note that these are different:
a*"#"
and
a*,"#"
The former has the form “factor * sep” and has the semantics that the
spec describes for repeat0: zero or more a’s separated by “#”. The
latter is “a*” followed by “#”: any number of a’s and then one hash.
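To make the difference concrete, here is a rough sketch that models each form as a regular expression. It assumes `a` stands for the literal character "a", as in the quoted question. The pattern names and the `matches` helper are made up for illustration and are not part of any ixml tool.

```python
import re

# Assumption: `a` is the literal character "a".
# a*"#"  : zero or more a's, separated by "#" (repeat0 with a separator)
REPEAT0_WITH_SEP = re.compile(r"(?:a(?:#a)*)?")
# a*,"#" : zero or more a's, followed by a single "#"
REPEAT0_THEN_HASH = re.compile(r"a*#")


def matches(pattern, text):
    """Return True if the whole of `text` matches `pattern`."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    for text in ["", "a", "a#a", "a#a#a", "aaa#", "#", "a#"]:
        print(repr(text),
              matches(REPEAT0_WITH_SEP, text),
              matches(REPEAT0_THEN_HASH, text))
```

Under that assumption, "a#a#a" is accepted only by the first form and "aaa#" only by the second.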
Later, “hints for implementors” observes that
a*"#"
can be rewritten. For example:
X: a*"#" .
can be rewritten as
X: Y .
-Y: a+"#" ; .
which can be further rewritten as
X: Y .
-Y: Z ; .
-Z: a, ("#", a)* .
And on we go:
X: Y .
-Y: Z ; .
-Z: a, A .
-A: "#", a, A ; .
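One way to sanity-check the fully rewritten grammar is a small hand-written recognizer with one function per nonterminal. This is only a sketch: it again assumes `a` is the literal character "a", the function names are invented here, and the greedy choice between alternatives is safe only because this particular grammar is unambiguous. A general ixml parser would not work this way.

```python
def parse_A(s, i):
    """-A: "#", a, A ; .   Returns the position after A."""
    if s.startswith("#a", i):
        return parse_A(s, i + 2)
    return i  # the alternative that matches nothing


def parse_Z(s, i):
    """-Z: a, A .   Returns the position after Z, or None if Z fails."""
    if s.startswith("a", i):
        return parse_A(s, i + 1)
    return None


def parse_Y(s, i):
    """-Y: Z ; .   Falls back to the empty alternative if Z fails."""
    j = parse_Z(s, i)
    return i if j is None else j


def recognizes_X(s):
    """X: Y .   True if the whole input is consumed."""
    return parse_Y(s, 0) == len(s)


if __name__ == "__main__":
    for text in ["", "a", "a#a", "a#a#a", "a#", "#", "aa"]:
        print(repr(text), recognizes_X(text))
```

If the rewrites are right, this should accept exactly the strings that `a*"#"` accepts: the empty string, "a", "a#a", "a#a#a", and so on.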
I find “the alternative that matches nothing” in the grammar quite hard
to read. I almost wish we had a special symbol for it, like ∅. Then we’d
have:
X: Y .
-Y: Z ; ∅ .
-Z: a, A .
-A: "#", a, A ; ∅ .
Unless I’ve got something wrong along the way, of course.
All of this has turned out to be an interesting challenge for me to
implement because I don’t think the PEP parser can be made to accept
“nothing” as an alternative. Instead, I think (I think!) I’ve modified
it so that it understands an optional terminal or nonterminal. So I’ll
end up rewriting it something like this:
X: Y .
-Y: Z? .
-Z: a, A .
-A: ("#", a, A)? .
Where the optionality is addressed in the parser when candidate edges
are selected.
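To illustrate what an optional symbol stands for, here is a sketch that expands a right-hand side with optional symbols back into plain alternatives. It is not how the parser handles optionality internally. The `(symbol, optional)` representation and the `expand_optionals` name are assumptions made up for this example.

```python
from itertools import product


def expand_optionals(rhs):
    """Expand a right-hand side into alternatives without optional symbols.

    `rhs` is a list of (symbol, optional) pairs. Each optional symbol
    is either present or absent, so n optional symbols give 2**n
    alternatives. The empty tuple is the alternative that matches nothing.
    """
    choices = [((sym,), ()) if optional else ((sym,),)
               for sym, optional in rhs]
    return [tuple(s for part in combo for s in part)
            for combo in product(*choices)]


if __name__ == "__main__":
    # -Y: Z? .   corresponds to   -Y: Z ; .
    print(expand_optionals([("Z", True)]))
    # A rule such as  a, A?  corresponds to  a, A ; a
    print(expand_optionals([("a", False), ("A", True)]))
```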
But now we’re into about four levels of manual rewriting so I make no
claims of correctness!
Be seeing you,
norm
--
Norm Tovey-Walsh
Saxonica
Received on Thursday, 16 December 2021 11:23:34 UTC