Re: Comments on Converting EBNF to BNF from Steven Pemberton on 2024-04-16 (public-ixml@w3.org from April 2024)

From: Steven Pemberton <steven.pemberton@cwi.nl>
Date: Tue, 16 Apr 2024 12:00:45 +0000
To: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
Cc: public-ixml@w3.org
Message-Id: <1713268653228.525226716.2588247042@cwi.nl>
Ooh, I'm sorry Michael. I know my comments on documents can be a bit terse. 
I hope they didn't sound unfriendly; it wasn't the intention.

I saw your email come in just before I switched off for bed. I'm glad I 
didn't read it then. Maybe I won't read it at all...

Steven

On Tuesday 16 April 2024 02:21:21 (+02:00), C. M. Sperberg-McQueen wrote:

 > My comments, in turn, elicit a further remark or two.
 > 
 > Re-reading what I sent earlier today, I am a little disappointed by the
 > tetchiness of my response to Steven.  My apologies -- I was on edge when
 > I wrote, for other reasons, and it seems to have had an unfortunate
 > effect.
 > 
 > It's clear that the current framing of the document struck Steven as
 > slightly off and maybe a little off-putting -- good reason to re-think
 > the framing of the document.  The same is true for the details Steven
 > commented on.  That is, when viewed correctly, very helpful feedback.
 > I'll need to think about how to address the comments, but that's not a
 > reason to wave them off.
 > 
 > Thank you, Steven.  I appreciate the detailed and helpful response.  I
 > apologize for my earlier defensiveness.
 > 
 > Michael
 > 
 > 
 > "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com> writes:
 > 
 > > Steven's comments are welcome; they elicit a few comments in return.
 > >
 > > Steven Pemberton <steven.pemberton@cwi.nl> writes:
 > >
 > >> https://github.com/invisibleXML/ixml/blob/master/misc/ebnf-to-bnf.md
 > >>
 > >> General comments:
 > >>
 > >> I appreciate the approach of this article, though it does reflect a
 > >> different idea to mine of what is behind rewriting rules.
 > >>
 > >> Parsing algorithms don't need to deal with EBNF extensions, because
 > >> all the extensions are only syntactic differences intended to make
 > >> life easier for the author and reader, and all grammars written in
 > >> EBNF can be rewritten, as this article shows, to equivalent BNF 
forms.
 > >
 > > This all seems true enough; I'm not quite sure whether it reflects a
 > > different idea about what is behind rewriting rules or not.
 > >
 > > What would it mean to say that parsing algorithms "need to" deal with
 > > EBNF?  Published algorithms assume grammars in a particular form,
 > > typically BNF or something isomorphic to it.  Nothing we can say or do
 > > will make the editors of those journals return the manuacripts to the
 > > authors asking that they revise the article to account for a different
 > > input format for grammars.
 > >
 > > For a potential implementer of ixml, on the other hand, it seems
 > > passably clear (a) what it means to "deal with" EBNF, and (b) that 
they
 > > "need" to do so.  Rewriting the input grammar into BNF and using a
 > > published algorithm for parsing BNF is one way they can perform their
 > > task.  Perhaps I am unnecessarily pedantic on this point, but 
rewriting
 > > to BNF is not the only possible approach to implementing ixml.
 > >
 > >> Accordingly, I consider the rewriting as equivalent to code
 > >> generation, and is indeed how I treat it in my implementation.
 > >
 > > I'm puzzled here in part because in past discussions of ambiguity you
 > > have denied that you rewrite the grammar, or that the way in which you
 > > parse is equivalent to parsing against a particular BNF equivalent to
 > > the input grammar.
 > >
 > >> So
 > >> parse algorithms don't need to deal with EBNF constructs for the same
 > >> reason that machine opcodes don't have to deal with WHILE loops. The
 > >> parse algorithms remain simple, while the grammar is written in a
 > >> higher-level language.
 > >
 > >> So I consider the rewrite rules purely as an implementation
 > >> detail.
 > >
 > > Agreed there.  The only requirement imposed by the ixml spec is that 
the
 > > parser produce the correct results.  (And since the meaning of ixml
 > > constructs is defined only by rather informal prose, and there is no
 > > normative translation to BNF to connect it to any more formal accounts
 > > of parsing, it is not always obvious what a phrase like "the correct
 > > results" means in particular cases.)
 > >
 > >> However, the real point of this article I think is in the
 > >> conclusion, that some rewrites are faster than others, and
 > >> implementers should experiment.
 > >
 > > Yes.  If it were not for the fact that the choice of rewrite rules has
 > > pragmatic consequences which have proved non-obvious to multiple
 > > implementers, I don't think this document would exist.
 > >
 > >> Two points that make the article harder to read:
 > >>
 > >> 1) The "|" separator for alternatives was an addition made to BNF 
from
 > >> Chomsky's notation, and has by definition no inherent
 > >> ordering. Alternative rewrites such as A: B | C. and A: C |
 > >> B. confused me since they are equivalent.
 > >
 > > What does "equivalent" mean here?
 > >
 > > They recognize the same language, but they do not produce the same 
BNF.
 > >
 > > All of the rewrites described here recognize the same language; each 
of
 > > them produces a different BNF.
 > >
 > > At the level of abstraction sometimes adopted in automata theory, 
where
 > > the right-hand sides of a rule form a set, not a sequence, the two 
rules
 > > in question are the same. But at that level of abstraction the rule 
"A:
 > > B | B | C." is not ambiguous (unless "A: B|C." also is), though I
 > > suspect most ixml grammars will say that it is.
 > >
 > >> If there is a reason to give
 > >> them, it should be explained, and it would be far better right at the
 > >> beginning to say that there is such an equivalence, which can be
 > >> applied everywhere. This would make a lot of the examples easier to
 > >> read because there would be no need for this repetition.
 > >
 > > Is it clear that a BNF produced using the rewrite "A: B|C." and one
 > > produced using "A: C|B." will always have the same pragmatic
 > > characteristics?  In particular: time to first parse? time to final
 > > output?
 > >
 > > It's not clear to me.
 > >
 > > For a backtracking parser, the time to first parse may be quite
 > > different for the two rewrites.  (Or so I speculate, on the basis of a
 > > little experience with backtracking parsers.)  For an Earley parser,
 > > probably not.
 > >
 > >> 2) There is a implication (for instance in (4a) that bracketing is 
BNF
 > >> ("The following BNF grammar ... S = ('a' | -S), 'b'.) However,
 > >> bracketing is EBNF, and later examples of how to deal with it are
 > >> given, so any rewrites using bracketing are only EBNF -> EBNF, and
 > >> this should be made clearer in the text.
 > >
 > > I think you mean O4a (with a capital O, not a zero), and thank you, 
good
 > > catch.  I think the example should probably be rewritten as
 > >
 > >   S = 'a', 'b' | -S, 'b'.
 > >
 > >> Detailed Comments
 > >
 > >> "textual insertions" This is an odd thing, because ixml grammars
 > >> specify both input and output, and I don't consider insertions, or
 > >> marks for that matter, as part of the grammar, but part of the output
 > >> specification.
 > >
 > > I think that in an ixml context, the term 'grammar' will for many
 > > readers mean whatever is in the input grammar submitted to the
 > > processor, and not just the subset of it that determines whether input
 > > is accepted or not.  Your sentence illustrates the ambiguity: the 
first
 > > occurrence of 'grammar' and the second occurrence have different 
senses.
 > >
 > >> Therefore I don't think of them as extensions to EBNF
 > >> in the sense suggested here. However, I do agree that repetition with
 > >> separators are an addition to EBNF. "Nested choices". I think this is
 > >> what I mean by bracketing, but in any case an example would help 
here.
 > >>
 > >> (O1a)(O1b) This is an example of giving the same expansion twice as 
if
 > >> | isn't commutative. (PP2) The sentence is missing something.
 > >>
 > >> Steven
 > 
 >
Received on Tuesday, 16 April 2024 12:00:58 UTC