a corpus of grammars from Kings College London from C. M. Sperberg-McQueen on 2022-03-15 (public-ixml@w3.org from March 2022)

From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
Date: Tue, 15 Mar 2022 07:58:46 -0600
To: public-ixml@w3.org
Message-ID: <878rtbco6h.fsf@blackmesatech.com>

Grammar fans and testing fans,

Some of you may be interested in a corpus of 20,000 sample grammars
created at King's College London for an experiment in automatic
detection of ambiguity in context-free grammars.  I stumbled across it
this morning while idly looking around the network to see whether
there are any automated ambiguity-detection tools we might be able to
use on ixml.ixml.

A paper by Vasudevan and Tratt, which I have not read in full, presents
a 'breadth-first' technique for seeking ambiguity in a grammar, which
contrasts in their account with the 'depth-first' search of other tools
[1].  They also describe the corpus of grammars they built for their
experiment, using two different approaches to generating grammars by
machine, and point to a repository from which their code and their
test corpus can be downloaded [2].

[1] https://soft-dev.org/pubs/pdf/vasudevan_tratt__detecting_ambiguity_in_programming_language_grammars.pdf
[2] https://figshare.com/articles/dataset/cfg_amb_experiment/774614

Since many (or all?) of their grammars are machine-generated, their
corpus is likely to exercise regions of the space of possible grammars
that a corpus made up purely of human-written grammars would rarely
reach.

Michael


-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com

Received on Tuesday, 15 March 2022 13:59:13 UTC