- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Mon, 22 Aug 2022 08:35:00 -0600
- To: Steven Pemberton <steven.pemberton@cwi.nl>
- Cc: Norm Tovey-Walsh <norm@saxonica.com>, public-ixml@w3.org
Steven Pemberton <steven.pemberton@cwi.nl> writes: > Interesting. > The rules were only ever meant as hints, and certainly not > normative. I certainly use a different set, and we changed them at > least once, I think in order to make them so that any rule only used > itself or earlier rules. > > But if people are adopting them as given, then we should either point > out that other rules might work better for them, or indeed rewrite > them again. For what it's worth, I think the only criterion we can usefully apply is clarity. If we could identify a set of rules that would leads to better parsing performance than other possible rules, that might well be worth doing, but before we embark on such a mission I think it might be wise to satisfy ourselves that there is such a set of rules. Does anyone believe that there is? Some time ago, I spent some time comparing the performance of several different grammars for the same language including some BNF forms generated by several different sets of rewrite rules from the initial EBNF form. Some were indeed faster than others. When I rewrote the routine that extracts a tree from the data generated by the recognizer, however, my recollection is that the relative speeds changed. That suggests to me that there is no universally optimal rewrite from EBNF to BNF. Rereading the relevant part of the spec, I see that the prose is not as explicit as I thought about the status of the rewrite rules given. Perhaps it would help to add one paragraph at the end: Other sets of rewrite rules can also be used to produce equivalent grammars without repetitions or options. In some implementations, the choice of rewrite rules can affect parsing speed; experimentation can be helpful. Michael > Steven > > On Sunday 21 August 2022 19:04:07 (+02:00), Norm Tovey-Walsh wrote: > >> Hello folks, >> >> A Sunday evening observation for my fellow implementors before I wander >> off in search of a cocktail. (I found rye whiskey at Tesco, oh frabjous >> joy!) >> >> In the context of studying a grammar provided by a friend, Bethan and I >> were discussing some aspects of the BNF that results from the rewrite >> rules suggested in the specification. She observes, correctly, that many >> of them result in “tails” of epsilons (that’s my paraphrase of her >> somewhat more precise analysis). >> >> Matching “a” against “S=f+.f='a'.”, for example, winds up matching “a” >> followed by “f-star”, f-star matches “f-option”, and “f-option” matches >> ε. >> >> She proposed a different set of rules for f*, f+, and f**sep: >> >> f* ⇒ f-star >> -f-star = f+ | (). >> >> f+ ⇒ f-plus >> -f-plus = f | (f, f-plus). >> >> f**sep ⇒ f-star-sep >> -f-star-sep = f++sep | (). >> >> That appears to be an equally valid set of rewrites and, whatever other >> effects it may have, it’s clear that it will avoid the “tails”. >> >> It does, and my Earley parser with these rewrite rules runs, informally, >> about twice as fast. >> >> Any further analysis will be the work of another day… >> >> Be seeing you, >> norm >> >> -- >> Norm Tovey-Walsh >> Saxonica >> -- C. M. Sperberg-McQueen Black Mesa Technologies LLC http://blackmesatech.com
Received on Monday, 22 August 2022 14:52:00 UTC