Re: Comments on Converting EBNF to BNF

Steven's comments are welcome; they elicit a few comments in return.

Steven Pemberton <steven.pemberton@cwi.nl> writes:

> https://github.com/invisibleXML/ixml/blob/master/misc/ebnf-to-bnf.md
>
> General comments:
>
> I appreciate the approach of this article, though it does reflect a
> different idea to mine of what is behind rewriting rules.
>
> Parsing algorithms don't need to deal with EBNF extensions, because
> all the extensions are only syntactic differences intended to make
> life easier for the author and reader, and all grammars written in
> EBNF can be rewritten, as this article shows, to equivalent BNF forms.

This all seems true enough; I'm not quite sure whether it reflects a
different idea about what is behind rewriting rules or not.

What would it mean to say that parsing algorithms "need to" deal with
EBNF?  Published algorithms assume grammars in a particular form,
typically BNF or something isomorphic to it.  Nothing we can say or do
will make the editors of those journals return the manuacripts to the
authors asking that they revise the article to account for a different
input format for grammars.

For a potential implementer of ixml, on the other hand, it seems
passably clear (a) what it means to "deal with" EBNF, and (b) that they
"need" to do so.  Rewriting the input grammar into BNF and using a
published algorithm for parsing BNF is one way they can perform their
task.  Perhaps I am unnecessarily pedantic on this point, but rewriting
to BNF is not the only possible approach to implementing ixml.

> Accordingly, I consider the rewriting as equivalent to code
> generation, and is indeed how I treat it in my implementation.

I'm puzzled here in part because in past discussions of ambiguity you
have denied that you rewrite the grammar, or that the way in which you
parse is equivalent to parsing against a particular BNF equivalent to
the input grammar.

> So
> parse algorithms don't need to deal with EBNF constructs for the same
> reason that machine opcodes don't have to deal with WHILE loops. The
> parse algorithms remain simple, while the grammar is written in a
> higher-level language.

> So I consider the rewrite rules purely as an implementation
> detail.

Agreed there.  The only requirement imposed by the ixml spec is that the
parser produce the correct results.  (And since the meaning of ixml
constructs is defined only by rather informal prose, and there is no
normative translation to BNF to connect it to any more formal accounts
of parsing, it is not always obvious what a phrase like "the correct
results" means in particular cases.)

> However, the real point of this article I think is in the
> conclusion, that some rewrites are faster than others, and
> implementers should experiment.

Yes.  If it were not for the fact that the choice of rewrite rules has
pragmatic consequences which have proved non-obvious to multiple
implementers, I don't think this document would exist.

> Two points that make the article harder to read:
>
> 1) The "|" separator for alternatives was an addition made to BNF from
> Chomsky's notation, and has by definition no inherent
> ordering. Alternative rewrites such as A: B | C. and A: C |
> B. confused me since they are equivalent.

What does "equivalent" mean here?

They recognize the same language, but they do not produce the same BNF.

All of the rewrites described here recognize the same language; each of
them produces a different BNF.

At the level of abstraction sometimes adopted in automata theory, where
the right-hand sides of a rule form a set, not a sequence, the two rules
in question are the same. But at that level of abstraction the rule "A:
B | B | C." is not ambiguous (unless "A: B|C." also is), though I
suspect most ixml grammars will say that it is.

> If there is a reason to give
> them, it should be explained, and it would be far better right at the
> beginning to say that there is such an equivalence, which can be
> applied everywhere. This would make a lot of the examples easier to
> read because there would be no need for this repetition.

Is it clear that a BNF produced using the rewrite "A: B|C." and one
produced using "A: C|B." will always have the same pragmatic
characteristics?  In particular: time to first parse? time to final
output?

It's not clear to me.

For a backtracking parser, the time to first parse may be quite
different for the two rewrites.  (Or so I speculate, on the basis of a
little experience with backtracking parsers.)  For an Earley parser,
probably not.

> 2) There is a implication (for instance in (4a) that bracketing is BNF
> ("The following BNF grammar ... S = ('a' | -S), 'b'.) However,
> bracketing is EBNF, and later examples of how to deal with it are
> given, so any rewrites using bracketing are only EBNF -> EBNF, and
> this should be made clearer in the text.

I think you mean O4a (with a capital O, not a zero), and thank you, good
catch.  I think the example should probably be rewritten as

  S = 'a', 'b' | -S, 'b'.

> Detailed Comments

> "textual insertions" This is an odd thing, because ixml grammars
> specify both input and output, and I don't consider insertions, or
> marks for that matter, as part of the grammar, but part of the output
> specification.

I think that in an ixml context, the term 'grammar' will for many
readers mean whatever is in the input grammar submitted to the
processor, and not just the subset of it that determines whether input
is accepted or not.  Your sentence illustrates the ambiguity: the first
occurrence of 'grammar' and the second occurrence have different senses.

> Therefore I don't think of them as extensions to EBNF
> in the sense suggested here. However, I do agree that repetition with
> separators are an addition to EBNF. "Nested choices". I think this is
> what I mean by bracketing, but in any case an example would help here.
>
> (O1a)(O1b) This is an example of giving the same expansion twice as if
> | isn't commutative. (PP2) The sentence is missing something.
>
> Steven


-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com

Received on Monday, 15 April 2024 19:35:17 UTC