Re: Comments on Converting EBNF to BNF

My comments, in turn, elicit a further remark or two.

Re-reading what I sent earlier today, I am a little disappointed by the
tetchiness of my response to Steven.  My apologies -- I was on edge when
I wrote, for other reasons, and it seems to have had an unfortunate
effect.

It's clear that the current framing of the document struck Steven as
slightly off and maybe a little off-putting -- good reason to re-think
the framing of the document.  The same is true for the details Steven
commented on.  That is, when viewed correctly, very helpful feedback.
I'll need to think about how to address the comments, but that's not a
reason to wave them off.

Thank you, Steven.  I appreciate the detailed and helpful response.  I
apologize for my earlier defensiveness.

Michael


"C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com> writes:

> Steven's comments are welcome; they elicit a few comments in return.
>
> Steven Pemberton <steven.pemberton@cwi.nl> writes:
>
>> https://github.com/invisibleXML/ixml/blob/master/misc/ebnf-to-bnf.md
>>
>> General comments:
>>
>> I appreciate the approach of this article, though it does reflect a
>> different idea to mine of what is behind rewriting rules.
>>
>> Parsing algorithms don't need to deal with EBNF extensions, because
>> all the extensions are only syntactic differences intended to make
>> life easier for the author and reader, and all grammars written in
>> EBNF can be rewritten, as this article shows, to equivalent BNF forms.
>
> This all seems true enough; I'm not quite sure whether it reflects a
> different idea about what is behind rewriting rules or not.
>
> What would it mean to say that parsing algorithms "need to" deal with
> EBNF?  Published algorithms assume grammars in a particular form,
> typically BNF or something isomorphic to it.  Nothing we can say or do
> will make the editors of those journals return the manuacripts to the
> authors asking that they revise the article to account for a different
> input format for grammars.
>
> For a potential implementer of ixml, on the other hand, it seems
> passably clear (a) what it means to "deal with" EBNF, and (b) that they
> "need" to do so.  Rewriting the input grammar into BNF and using a
> published algorithm for parsing BNF is one way they can perform their
> task.  Perhaps I am unnecessarily pedantic on this point, but rewriting
> to BNF is not the only possible approach to implementing ixml.
>
>> Accordingly, I consider the rewriting as equivalent to code
>> generation, and is indeed how I treat it in my implementation.
>
> I'm puzzled here in part because in past discussions of ambiguity you
> have denied that you rewrite the grammar, or that the way in which you
> parse is equivalent to parsing against a particular BNF equivalent to
> the input grammar.
>
>> So
>> parse algorithms don't need to deal with EBNF constructs for the same
>> reason that machine opcodes don't have to deal with WHILE loops. The
>> parse algorithms remain simple, while the grammar is written in a
>> higher-level language.
>
>> So I consider the rewrite rules purely as an implementation
>> detail.
>
> Agreed there.  The only requirement imposed by the ixml spec is that the
> parser produce the correct results.  (And since the meaning of ixml
> constructs is defined only by rather informal prose, and there is no
> normative translation to BNF to connect it to any more formal accounts
> of parsing, it is not always obvious what a phrase like "the correct
> results" means in particular cases.)
>
>> However, the real point of this article I think is in the
>> conclusion, that some rewrites are faster than others, and
>> implementers should experiment.
>
> Yes.  If it were not for the fact that the choice of rewrite rules has
> pragmatic consequences which have proved non-obvious to multiple
> implementers, I don't think this document would exist.
>
>> Two points that make the article harder to read:
>>
>> 1) The "|" separator for alternatives was an addition made to BNF from
>> Chomsky's notation, and has by definition no inherent
>> ordering. Alternative rewrites such as A: B | C. and A: C |
>> B. confused me since they are equivalent.
>
> What does "equivalent" mean here?
>
> They recognize the same language, but they do not produce the same BNF.
>
> All of the rewrites described here recognize the same language; each of
> them produces a different BNF.
>
> At the level of abstraction sometimes adopted in automata theory, where
> the right-hand sides of a rule form a set, not a sequence, the two rules
> in question are the same. But at that level of abstraction the rule "A:
> B | B | C." is not ambiguous (unless "A: B|C." also is), though I
> suspect most ixml grammars will say that it is.
>
>> If there is a reason to give
>> them, it should be explained, and it would be far better right at the
>> beginning to say that there is such an equivalence, which can be
>> applied everywhere. This would make a lot of the examples easier to
>> read because there would be no need for this repetition.
>
> Is it clear that a BNF produced using the rewrite "A: B|C." and one
> produced using "A: C|B." will always have the same pragmatic
> characteristics?  In particular: time to first parse? time to final
> output?
>
> It's not clear to me.
>
> For a backtracking parser, the time to first parse may be quite
> different for the two rewrites.  (Or so I speculate, on the basis of a
> little experience with backtracking parsers.)  For an Earley parser,
> probably not.
>
>> 2) There is a implication (for instance in (4a) that bracketing is BNF
>> ("The following BNF grammar ... S = ('a' | -S), 'b'.) However,
>> bracketing is EBNF, and later examples of how to deal with it are
>> given, so any rewrites using bracketing are only EBNF -> EBNF, and
>> this should be made clearer in the text.
>
> I think you mean O4a (with a capital O, not a zero), and thank you, good
> catch.  I think the example should probably be rewritten as
>
>   S = 'a', 'b' | -S, 'b'.
>
>> Detailed Comments
>
>> "textual insertions" This is an odd thing, because ixml grammars
>> specify both input and output, and I don't consider insertions, or
>> marks for that matter, as part of the grammar, but part of the output
>> specification.
>
> I think that in an ixml context, the term 'grammar' will for many
> readers mean whatever is in the input grammar submitted to the
> processor, and not just the subset of it that determines whether input
> is accepted or not.  Your sentence illustrates the ambiguity: the first
> occurrence of 'grammar' and the second occurrence have different senses.
>
>> Therefore I don't think of them as extensions to EBNF
>> in the sense suggested here. However, I do agree that repetition with
>> separators are an addition to EBNF. "Nested choices". I think this is
>> what I mean by bracketing, but in any case an example would help here.
>>
>> (O1a)(O1b) This is an example of giving the same expansion twice as if
>> | isn't commutative. (PP2) The sentence is missing something.
>>
>> Steven


-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com

Received on Tuesday, 16 April 2024 00:32:43 UTC