- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Thu, 14 Jul 2022 11:45:56 -0600
- To: Norm Tovey-Walsh <norm@saxonica.com>
- Cc: public-ixml@w3.org
"C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com> writes: > Norm Tovey-Walsh <norm@saxonica.com> writes: > >> For your amusement, here are the results I got running Michael’s recent >> performance tests (the a-star tests) through my Earley and GLL >> parsers. > ... > I'd post the run times for Aparecium, embarrassing though they would be > by comparison, but after the run finally finished I discovered that my > test harness has developed a fault where the code for keeping track of > parse time ought to be, so all the parse times show up as 0. I'll just > say I am confident that Aparecium has much worse problems with > non-linearity than Coffeepot. Looking more closely today, I find that I do have parse times for the grammars in the doublings test catalog. The timings in the summary below are those reported by the BaseX prof:trace() function, which correlates pretty well with clock time as I am able to observe it. BaseX was not re-started between times, so there may be interference from background garbage collection or other extraneous events. As I said earlier, the results are kind of embarrassing compared to Coffeepot's run times, but they do have the virtue of showing clearly that once the input size rises past a thousand bytes or so, the processing time is dominated by a quadratic term (double the size of the input, quadruple the parse time), even when parsing against the ixml spec grammar which is mostly deterministic. That quadratic term is clearly an implementation artefact not essential to Earley parsing, so I can hope to find it an eliminate it. Michael <timingdata xmlns:t="http://blackmesatech.com/2022/iXML/test-harness"> <p>Gathered from a-start doubling test catalog 2022-07-12f.</p> <p>Times in milliseconds as reported by BaseX prof:track() on the call to ap:parse-grammar-from-string().</p> <p>For each grammar in each series, the ratio of its time to the preceding grammar in the series is given; as may be seen, as the size of the input grammar increases the times converge to something like 4.</p> <p>It looks as if a quadratic term comes to dominate the others as input rises about a thousand characters or so.</p> <group> <p>m.lc: single comments added in multiple places to increase size of input grammar.</p> <parsetime name="G.b1.m.lc" t:parsetime="392.31"/> <parsetime name="G.b2.m.lc" t:parsetime="252.03" ratio="0.6424256327903953"/> <parsetime name="G.b3.m.lc" t:parsetime="277.75" ratio="1.1020513430940762"/> <parsetime name="G.b4.m.lc" t:parsetime="361.59" ratio="1.3018541854185417"/> <parsetime name="G.b5.m.lc" t:parsetime="637.58" ratio="1.763267789485329"/> <parsetime name="G.b6.m.lc" t:parsetime="1425.83" ratio="2.236315442767966"/> <parsetime name="G.b7.m.lc" t:parsetime="5449.87" ratio="3.8222438860172674"/> <parsetime name="G.b8.m.lc" t:parsetime="20108.7" ratio="3.689757737340524"/> <parsetime name="G.b9.m.lc" t:parsetime="95510.92" ratio="4.749731210868927"/> <parsetime name="G.b10.m.lc" t:parsetime="444871.65" ratio="4.657809285053479"/> </group> <group> <p>m.sc: many small comments added in multiple places to increase size of input grammar.</p> <parsetime name="G.b2.m.sc" t:parsetime="306.52"/> <parsetime name="G.b3.m.sc" t:parsetime="450.47" ratio="1.4696267780242727"/> <parsetime name="G.b4.m.sc" t:parsetime="772.56" ratio="1.7150087686194417"/> <parsetime name="G.b5.m.sc" t:parsetime="1687.18" ratio="2.183882158020089"/> <parsetime name="G.b6.m.sc" t:parsetime="4361.73" ratio="2.585219123033701"/> <parsetime name="G.b7.m.sc" t:parsetime="14064.07" ratio="3.224424712212815"/> <parsetime name="G.b8.m.sc" t:parsetime="57041.67" ratio="4.055843720914359"/> <parsetime name="G.b9.m.sc" t:parsetime="274108.74" ratio="4.805412253883871"/> <parsetime name="G.b10.m.sc" t:parsetime="1.22544146E6" ratio="4.470639863581146"/> </group> <group> <p>m.ws: whitespace added in multiple places to increase size of input grammar.</p> <parsetime name="G.b2.m.ws" t:parsetime="510.8"/> <parsetime name="G.b3.m.ws" t:parsetime="920.77" ratio="1.8026037588097101"/> <parsetime name="G.b4.m.ws" t:parsetime="1940.65" ratio="2.107638172399188"/> <parsetime name="G.b5.m.ws" t:parsetime="4509.25" ratio="2.3235771519851594"/> <parsetime name="G.b6.m.ws" t:parsetime="11600.99" ratio="2.572709430614847"/> <parsetime name="G.b7.m.ws" t:parsetime="36958.23" ratio="3.1857824202934406"/> <parsetime name="G.b8.m.ws" t:parsetime="149276.69" ratio="4.0390649119289534"/> <parsetime name="G.b9.m.ws" t:parsetime="636291.59" ratio="4.262497982772795"/> <parsetime name="G.b10.m.ws" t:parsetime="2.606890.16" ratio="4.097005525406992"/> </group> <group> <p>s.lc: size of input grammar increased by adding single comment at top.</p> <parsetime name="G.b1.s.lc" t:parsetime="187.1"/> <parsetime name="G.b2.s.lc" t:parsetime="159.01" ratio="0.8498663816141101"/> <parsetime name="G.b3.s.lc" t:parsetime="181.09" ratio="1.1388591912458337"/> <parsetime name="G.b4.s.lc" t:parsetime="279.83" ratio="1.5452537412336407"/> <parsetime name="G.b5.s.lc" t:parsetime="521.66" ratio="1.864203266268806"/> <parsetime name="G.b6.s.lc" t:parsetime="1403.04" ratio="2.6895679178008667"/> <parsetime name="G.b7.s.lc" t:parsetime="5070.27" ratio="3.613774375641465"/> <parsetime name="G.b8.s.lc" t:parsetime="17018.39" ratio="3.3565056693233295"/> <parsetime name="G.b9.s.lc" t:parsetime="94241.78" ratio="5.537643690149303"/> <parsetime name="G.b10.s.lc" t:parsetime="477926.48" ratio="5.071280275054227"/> </group> <group> <p>s.lc: size of input grammar increased by adding many small comments at top.</p> <parsetime name="G.b1.s.sc" t:parsetime="150.53"/> <parsetime name="G.b2.s.sc" t:parsetime="176.12" ratio="1.1699993356805953"/> <parsetime name="G.b3.s.sc" t:parsetime="274.8" ratio="1.5602997955939133"/> <parsetime name="G.b4.s.sc" t:parsetime="524.39" ratio="1.9082605531295487"/> <parsetime name="G.b5.s.sc" t:parsetime="1120.08" ratio="2.1359675051011653"/> <parsetime name="G.b6.s.sc" t:parsetime="3286.44" ratio="2.934111849153632"/> <parsetime name="G.b7.s.sc" t:parsetime="11880.28" ratio="3.6149389613076766"/> <parsetime name="G.b8.s.sc" t:parsetime="50331.85" ratio="4.236587858198628"/> <parsetime name="G.b9.s.sc" t:parsetime="251679.97" ratio="5.00041166776107"/> <parsetime name="G.b10.s.sc" t:parsetime="1141209.74" ratio="4.534368547485125"/> </group> <group> <p>s.ws: size of input grammar increased by adding blanks and newlines at top.</p> <parsetime name="G.b1.s.ws" t:parsetime="181.37"/> <parsetime name="G.b2.s.ws" t:parsetime="248.79" ratio="1.3717263053426696"/> <parsetime name="G.b3.s.ws" t:parsetime="448.54" ratio="1.802885968085534"/> <parsetime name="G.b4.s.ws" t:parsetime="931.1" ratio="2.0758460783876576"/> <parsetime name="G.b5.s.ws" t:parsetime="3999.19" ratio="4.295124046826334"/> <parsetime name="G.b6.s.ws" t:parsetime="6573.64" ratio="1.6437428579287305"/> <parsetime name="G.b7.s.ws" t:parsetime="25347.94" ratio="3.855997590376108"/> <parsetime name="G.b8.s.ws" t:parsetime="116140.15" ratio="4.581837814039326"/> <parsetime name="G.b9.s.ws" t:parsetime="526724.22" ratio="4.535246596461258"/> <parsetime name="G.b10.s.ws" t:parsetime="2310351.09" ratio="4.386263251763893"/> </group> </timingdata> -- C. M. Sperberg-McQueen Black Mesa Technologies LLC http://blackmesatech.com
Received on Thursday, 14 July 2022 18:03:53 UTC