The rewrite rules matter

Hello folks,

A Sunday evening observation for my fellow implementors before I wander
off in search of a cocktail. (I found rye whiskey at Tesco, oh frabjous
joy!)

In the context of studying a grammar provided by a friend, Bethan and I
were discussing some aspects of the BNF that results from the rewrite
rules suggested in the specification. She observes, correctly, that many
of them result in “tails” of epsilons (that’s my paraphrase of her
somewhat more precise analysis).

Matching “a” against “S=f+.f='a'.”, for example, winds up matching “a”
followed by “f-star”, f-star matches “f-option”, and “f-option” matches
ε.

She proposed a different set of rules for f*, f+, and f**sep:

f* ⇒ f-star
-f-star = f+ | ().

f+ ⇒ f-plus
-f-plus = f | (f, f-plus).

f**sep ⇒ f-star-sep
-f-star-sep = f++sep | ().

That appears to be an equally valid set of rewrites and, whatever other
effects it may have, it’s clear that it will avoid the “tails”.

It does, and my Earley parser with these rewrite rules runs, informally,
about twice as fast.

Any further analysis will be the work of another day…

                                        Be seeing you,
                                          norm

--
Norm Tovey-Walsh
Saxonica

Received on Sunday, 21 August 2022 17:13:00 UTC