Re: a corpus of grammars from King's College London

> Since many (or all?) of their grammars are machine-generated, their
> corpus is likely to illuminate regions of the space of possible grammars
> that would be unlikely to be exercised by a corpus purely of grammars
> written by people.

Cool. I wrote a quick ixml grammar to parse the boltzcfg grammars and a
quick XSLT[1] to turn them into ixml grammars. I think I can extend the
XSLT so that it will also generate a sentence in the grammar.

Then it should be possible to test against them. Some (many? all?) have
unreachable symbols, so it will require a processor that’s willing to
operate in a non-conformant mode.
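
To make "unreachable symbols" concrete, here is a minimal sketch of a
reachability check. It is not taken from the boltzcfg corpus or from any
ixml processor: the dict-of-lists grammar representation, the function
name unreachable_symbols, and the toy grammar are all assumptions made
for illustration.

```python
from collections import deque


def unreachable_symbols(rules, start):
    """Return the nonterminals in `rules` that cannot be reached from `start`.

    `rules` maps each nonterminal to a list of alternatives; each
    alternative is a list of symbols. Any symbol that is not a key of
    `rules` is treated as a terminal. (Hypothetical representation.)
    """
    seen = {start}
    queue = deque([start])
    while queue:
        symbol = queue.popleft()
        for alternative in rules.get(symbol, []):
            for child in alternative:
                # Only nonterminals are followed; terminals end the walk.
                if child in rules and child not in seen:
                    seen.add(child)
                    queue.append(child)
    return set(rules) - seen


# Hypothetical toy grammar: "C" is defined but never referenced from "S".
toy = {
    "S": [["A", "b"], ["b"]],
    "A": [["a", "S"]],
    "C": [["c"]],
}
print(sorted(unreachable_symbols(toy, "S")))  # prints ['C']
```

A grammar for which this returns a non-empty set is the kind that would
need the non-conformant mode mentioned above.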

                                        Be seeing you,
                                          norm

[1] I decided it wasn’t possible to generate ixml directly because each
grammar begins with a declaration of the nonterminals that are to be
treated as tokens. I just turn those into literals. (And I inject
spaces into the grammar so that the input is a space-separated list of
“words”.)

--
Norm Tovey-Walsh
Saxonica

Received on Thursday, 17 March 2022 09:57:18 UTC