For your amusement

Hello,

In order to parse ixml with my parser, I have to do a little bit of
rule-rewriting. I need to turn optionality and repetition into new
nonterminals so that all of the rules have the form:

  nonterminal: seqence, of, terminals, and, nonterminals.

I also allow repeated definitions for the same nonterminal to represent
alternatives.

For example:

  chapter: para+.

might become

  chapter: paraplus.
  -paraplus: para, parastar.
  -parastar: para, parastar.
  -parastar: para.

(I did that rewrite in my head, I make no claims that it’s either
optimal or even correct.)

This is of no interest to the user, of course, because it all happens
automatically. And it has no impact on the ixml output because the
generated nonterminals are all marked “-”, they produce nothing.

I say all of this only because, as a consequence of this rewriting, the
“raw” results that I get from parsing an ixml grammar look a little
weird.

Nevertheless, for your amusement, in all it’s ~10,000 node glory, here
is an SVG diagram of the shared packed parse forest (SPPF) of the text
of the ixml grammar (with most occurrences of insiginficant whitespace
removed) parsed with ixml:

  https://nwalsh.com/scratch/sppf/ixml.svg

That’s sort of goofy and I doubt it’s useful. It take about 3.5 seconds
for my implementation to produce the parse and it takes graphviz about
100 seconds to draw it. So I win. :-)

Here’s a simple grammar for arithmetic expressions (in a slightly
different BNF format, apologies):

Sum => Sum, ["+";"-"], Product
Sum => Product
Product => Product, ["*";"/"], Factor
Product => Factor
Factor => '(', Sum, ')'
Factor => Number
Number => ["0"-"9"], Number
Number => ["0"-"9"]

The SPPF representation of a parse of “1+(2*3-4)” with that grammar is a
little easier to read:

  https://nwalsh.com/scratch/sppf/sum.svg

Each node in the graph is labelled with either a symbol or the partially
completed Earley state that it represents. The numbers following the
label are the range of characters that it spans. The extra intermediate
nodes in the graph are added to limit its size.

In my diagrams, nonterminals are oval, Earley states are rectangles, and
terminals are houses.

A graph of an ambiguous parse has nodes with overlapping ranges. A graph
with loops is infinitely ambiguous. (At least, that’s my current
understanding of how to interpret the graph; I could be wrong.)

                                        Be seeing you,
                                          norm

--
Norm Tovey-Walsh
Saxonica

Received on Tuesday, 18 January 2022 08:42:08 UTC