- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Wed, 13 Jul 2022 08:02:29 -0600
- To: Norm Tovey-Walsh <norm@saxonica.com>
- Cc: public-ixml@w3.org
Norm Tovey-Walsh <norm@saxonica.com> writes: > For your amusement, here are the results I got running Michael’s recent > performance tests (the a-star tests) through my Earley and GLL > parsers. From the test names, I infer that you ran a test catalog that included just the grammar tests. > Cells marked “-” indicate that the test didn’t complete. The “% diff” > column is an attempt to indicate how much faster the GLL parser is. > Negative numbers indicate that Earley was faster by that percent. What I notice here is that all of the cases where GLL is faster involve grammars in the 'long comment' style, over a certain size. Earley is almost always faster for whitespace-padded grammars and for grammars in the 'short comment' style, which has mostly comments of length four ({..}), with the occasional three (at the end of a line, so one line of comments is 64 characters including the newline) or very occasional five or longer. Rearranging things by style (m vs s, lc vs sc vs ws); > |------------+------------+-------------+-----------| > | test | Earley | GLL | % diff | > |------------+------------+-------------+-----------| > | G.d3.m.lc | 1.87s | 0.59s | 316.95 | > | G.d4.m.lc | 17m6.83s | 1.28s | 80221.09 | > | G.b6.m.lc | 0.91s | 0.87s | 104.60 | > | G.b7.m.lc | 4.20s | 1.10s | 381.82 | > | G.b8.m.lc | 19.20s | 0.80s | 2400.00 | > | G.b9.m.lc | 3m15.25s | 1.00s | 19525.00 | > | G.b10.m.lc | 19m38.45s | 1.38s | 85394.93 | So GLL is faster when there are multiple comments of length 124 or more, Earley if they are 61 characters or shorter. > | G.d3.s.lc | 25.54s | 0.58s | 4403.45 | > | G.d4.s.lc | - | 1.21s | | > | G.b5.s.lc | 2.01s | 0.60s | 334.83 | > | G.b6.s.lc | 7.35s | 0.89s | 825.84 | > | G.b7.s.lc | 53.02s | 0.85s | 6237.65 | > | G.b8.s.lc | 7m27.34s | 0.76s | 58860.53 | > | G.b9.s.lc | 1h1m59.03s | 0.93s | 399895.70 | > | G.b10.s.lc | - | 1.34s | | GLL is faster when there is a single long comment of length 309 or more, Earley for single long comments of 149 or less. > | G.d4.s.sc | 15m57.37s | 12m24.77s | 128.55 | > | G.b8.s.sc | 21.49s | 20.56s | 104.52 | > | G.b9.s.sc | 2m29.46s | 2m11.77s | 113.42 | > | G.b10.s.sc | 19m55.89s | 14m20.23s | 139.02 | GLL is faster if there is long block of many short comments, with the long block totaling 2550 bytes or more (including newlines not inside the comments), Earley for many short comments with a total length of 1270 or lower. > | G.b8.s.ws | 15m10.14s | 12m37.35s | 120.17 | > | G.b9.s.ws | 2h20m8.87s | 1h48m20.07s | 129.37 | > |------------+------------+-------------+-----------| GLL is faster if the grammar is padded with a single large block of whitespace of length 2550 or more; Earley is faster for shorter big blocks of whitespace. It looks to me as if there is something non-linear in the recognition of long comments by Coffeepot's Earley parser. If I understand correctly, on deterministic grammars Earley should in principle be linear in cost (to the size of the grammar plus the size of the input) -- it makes me wonder whether there is something non-deterministic about the way the ixml grammar defines comments. I'd post the run times for Aparecium, embarrassing though they would be by comparison, but after the run finally finished I discovered that my test harness has developed a fault where the code for keeping track of parse time ought to be, so all the parse times show up as 0. I'll just say I am confident that Aparecium has much worse problems with non-linearity than Coffeepot. -- C. M. Sperberg-McQueen Black Mesa Technologies LLC http://blackmesatech.com
Received on Wednesday, 13 July 2022 14:52:11 UTC