Re: The rewrite rules matter from C. M. Sperberg-McQueen on 2022-08-22 (public-ixml@w3.org from August 2022)

From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
Date: Mon, 22 Aug 2022 08:35:00 -0600
To: Steven Pemberton <steven.pemberton@cwi.nl>
Cc: Norm Tovey-Walsh <norm@saxonica.com>, public-ixml@w3.org
Message-ID: <87v8qk9vrb.fsf@blackmesatech.com>

Steven Pemberton <steven.pemberton@cwi.nl> writes:

> Interesting.
> The rules were only ever meant as hints, and certainly not
> normative. I certainly use a different set, and we changed them at
> least once, I think in order to make them so that any rule only used
> itself or earlier rules.
>
> But if people are adopting them as given, then we should either point
> out that other rules might work better for them, or indeed rewrite
> them again.

For what it's worth, I think the only criterion we can usefully apply is
clarity.

If we could identify a set of rules that would leads to better parsing
performance than other possible rules, that might well be worth doing,
but before we embark on such a mission I think it might be wise to
satisfy ourselves that there is such a set of rules.  Does anyone
believe that there is?

Some time ago, I spent some time comparing the performance of several
different grammars for the same language including some BNF forms
generated by several different sets of rewrite rules from the initial
EBNF form.  Some were indeed faster than others.  When I rewrote the
routine that extracts a tree from the data generated by the recognizer,
however, my recollection is that the relative speeds changed.  That
suggests to me that there is no universally optimal rewrite from EBNF to
BNF.

Rereading the relevant part of the spec, I see that the prose is not as
explicit as I thought about the status of the rewrite rules given.
Perhaps it would help to add one paragraph at the end:

    Other sets of rewrite rules can also be used to produce equivalent
    grammars without repetitions or options.  In some implementations,
    the choice of rewrite rules can affect parsing speed;
    experimentation can be helpful.

Michael

> Steven
>
> On Sunday 21 August 2022 19:04:07 (+02:00), Norm Tovey-Walsh wrote:
>
>> Hello folks,
>>
>> A Sunday evening observation for my fellow implementors before I wander
>> off in search of a cocktail. (I found rye whiskey at Tesco, oh frabjous
>> joy!)
>>
>> In the context of studying a grammar provided by a friend, Bethan and I
>> were discussing some aspects of the BNF that results from the rewrite
>> rules suggested in the specification. She observes, correctly, that many
>> of them result in “tails” of epsilons (that’s my paraphrase of her
>> somewhat more precise analysis).
>>
>> Matching “a” against “S=f+.f='a'.”, for example, winds up matching “a”
>> followed by “f-star”, f-star matches “f-option”, and “f-option” matches
>> ε.
>>
>> She proposed a different set of rules for f*, f+, and f**sep:
>>
>> f* ⇒ f-star
>> -f-star = f+ | ().
>>
>> f+ ⇒ f-plus
>> -f-plus = f | (f, f-plus).
>>
>> f**sep ⇒ f-star-sep
>> -f-star-sep = f++sep | ().
>>
>> That appears to be an equally valid set of rewrites and, whatever other
>> effects it may have, it’s clear that it will avoid the “tails”.
>>
>> It does, and my Earley parser with these rewrite rules runs, informally,
>> about twice as fast.
>>
>> Any further analysis will be the work of another day…
>>
>> Be seeing you,
>> norm
>>
>> --
>> Norm Tovey-Walsh
>> Saxonica
>>

-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com

Received on Monday, 22 August 2022 14:52:00 UTC