W3C home > Mailing lists > Public > public-ixml@w3.org > August 2021

Rewriting an ixml grammar as conventional BNF

From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
Date: Mon, 9 Aug 2021 11:57:06 -0600
Message-Id: <E1AA25D7-4F11-4BB8-B250-1F772A6C9932@blackmesatech.com>
Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
To: public-ixml@w3.org
In developing my ixml parser, I have so far preferred to extend the 
Earley algorithm to cover the extended-BNF notation rather than
to rewrite the ixml grammar to follow the rules of BNF as Earley
assumed them.  I am now wondering, however, whether I should
re-think that approach.

If I may, I’d like to check my understanding of that approach.

I think that any EBNF rule can be rewritten to conventional BNF 
roughly as follows.  Is the following correct, as far as it goes?

For each expression E which is a repeat1, repeat0, or option, 
(1) create a new nonterminal N not already present in the grammar, 
(2) replace the expression with a reference to N, and (3) add an 
EBNF rule for N to the grammar, as follows:

  * If the expression E is an option, then E = F?.  Write

    -N:  ;  F.

  * If the expression E is a repeat0, then E = F*.  Write

    -N:  ;  F, N.

  * If the expression E is a repeat1, then E = F+.  Write

    -N: F, F*.

The new rules thus created need to be reduced from EBNF to BNF
in their turn.

All new rules are marked - (do not serialize as element) and
become invisible, and the serialization is as specified in the original 
ixml grammar.

Since I enjoyed figuring out how to extend Earley’s algorithm to
handle extended BNF, I don’t really want to do this.  But when the
input grammar is infinitely ambiguous, it appears that constructing a 
parse-tree grammar for the results is easier if the working grammar
is a BNF rather than an EBNF.   


C. M. Sperberg-McQueen
Communications Manager, Luis Torres for JMEC District 5
Received on Monday, 9 August 2021 17:57:28 UTC

This archive was generated by hypermail 2.4.0 : Monday, 9 August 2021 17:57:30 UTC