ACTION (2022-09-13): John to report how many Earley items are reduced by the changed rewrite rules.

I had an action to report on the effects of the changed rewrite rules. I 
have some very rough preliminary findings.

My test case is the iXML test suite (as of perhaps a month ago ;-) which 
has 234 test sets, 726 test cases, resulting in 693 parsing tests and 
174 grammar tests.

It turns out that my implementation as originally written, uses the 
following repetition rewrites:

    f+ ⇒ f-plus
    -f-plus: f, f*.

which is in the original spec. and

    f* => f-star.
    -f-star: f, f-star|().

_which isn't._ The option rewrite and the separator repetition rewrites 
are as in the spec.

On my Windows laptop with Chrome, running in a SaxonJS environment, the 
base figure with the rewrites as described above. are that the tests 
take a total of *5 mins 30 seconds*, and create *3.4E6 Earley items* in 
the parsing process. (Note this does not include Earley parsing of the 
iXML grammars, as my system has a hand-written parser for the iXML 
sources.) In this process, 24 of the test cases fail, exclusivley those 
(sample.grammar.*) of exceptional ambiguity and usually empty string 
inputs, i.e. not situations anticipated in practical use.

When I switch on the alternative rewrite of*f+* to that proposed by 
Bethan Tovey-Walsh:

    f+ => f-plus
    -f-plus = f | (f, f-plus).

then there is a marked difference in performance. The same tests now 
take *1 minute 1 second* and create a total of *2.7E6 Earley items* - 
the timing figures are pretty consistent.However, four additional test 
cases now fail - all involving ixml compiling itself (one line, no 
spaces, spaces variants) and whichresult in two ambiguous solutions 
being generated, whereas before there was a single. I'll have to look a 
little more closely at this and investigate further possible rewrites 
for the separator cases....

John

-- 
*John Lumley* MA PhD CEng FIEE
john@saxonica.com

Received on Tuesday, 25 October 2022 14:51:25 UTC