- From: Steven Pemberton <steven.pemberton@cwi.nl>
- Date: Tue, 16 Apr 2024 12:00:45 +0000
- To: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
- Cc: public-ixml@w3.org
Ooh, I'm sorry Michael. I know my comments on documents can be a bit terse. I hope they didn't sound unfriendly; it wasn't the intention. I saw your email come in just before I switched off for bed. I'm glad I didn't read it then. Maybe I won't read it at all... Steven On Tuesday 16 April 2024 02:21:21 (+02:00), C. M. Sperberg-McQueen wrote: > My comments, in turn, elicit a further remark or two. > > Re-reading what I sent earlier today, I am a little disappointed by the > tetchiness of my response to Steven. My apologies -- I was on edge when > I wrote, for other reasons, and it seems to have had an unfortunate > effect. > > It's clear that the current framing of the document struck Steven as > slightly off and maybe a little off-putting -- good reason to re-think > the framing of the document. The same is true for the details Steven > commented on. That is, when viewed correctly, very helpful feedback. > I'll need to think about how to address the comments, but that's not a > reason to wave them off. > > Thank you, Steven. I appreciate the detailed and helpful response. I > apologize for my earlier defensiveness. > > Michael > > > "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com> writes: > > > Steven's comments are welcome; they elicit a few comments in return. > > > > Steven Pemberton <steven.pemberton@cwi.nl> writes: > > > >> https://github.com/invisibleXML/ixml/blob/master/misc/ebnf-to-bnf.md > >> > >> General comments: > >> > >> I appreciate the approach of this article, though it does reflect a > >> different idea to mine of what is behind rewriting rules. > >> > >> Parsing algorithms don't need to deal with EBNF extensions, because > >> all the extensions are only syntactic differences intended to make > >> life easier for the author and reader, and all grammars written in > >> EBNF can be rewritten, as this article shows, to equivalent BNF forms. > > > > This all seems true enough; I'm not quite sure whether it reflects a > > different idea about what is behind rewriting rules or not. > > > > What would it mean to say that parsing algorithms "need to" deal with > > EBNF? Published algorithms assume grammars in a particular form, > > typically BNF or something isomorphic to it. Nothing we can say or do > > will make the editors of those journals return the manuacripts to the > > authors asking that they revise the article to account for a different > > input format for grammars. > > > > For a potential implementer of ixml, on the other hand, it seems > > passably clear (a) what it means to "deal with" EBNF, and (b) that they > > "need" to do so. Rewriting the input grammar into BNF and using a > > published algorithm for parsing BNF is one way they can perform their > > task. Perhaps I am unnecessarily pedantic on this point, but rewriting > > to BNF is not the only possible approach to implementing ixml. > > > >> Accordingly, I consider the rewriting as equivalent to code > >> generation, and is indeed how I treat it in my implementation. > > > > I'm puzzled here in part because in past discussions of ambiguity you > > have denied that you rewrite the grammar, or that the way in which you > > parse is equivalent to parsing against a particular BNF equivalent to > > the input grammar. > > > >> So > >> parse algorithms don't need to deal with EBNF constructs for the same > >> reason that machine opcodes don't have to deal with WHILE loops. The > >> parse algorithms remain simple, while the grammar is written in a > >> higher-level language. > > > >> So I consider the rewrite rules purely as an implementation > >> detail. > > > > Agreed there. The only requirement imposed by the ixml spec is that the > > parser produce the correct results. (And since the meaning of ixml > > constructs is defined only by rather informal prose, and there is no > > normative translation to BNF to connect it to any more formal accounts > > of parsing, it is not always obvious what a phrase like "the correct > > results" means in particular cases.) > > > >> However, the real point of this article I think is in the > >> conclusion, that some rewrites are faster than others, and > >> implementers should experiment. > > > > Yes. If it were not for the fact that the choice of rewrite rules has > > pragmatic consequences which have proved non-obvious to multiple > > implementers, I don't think this document would exist. > > > >> Two points that make the article harder to read: > >> > >> 1) The "|" separator for alternatives was an addition made to BNF from > >> Chomsky's notation, and has by definition no inherent > >> ordering. Alternative rewrites such as A: B | C. and A: C | > >> B. confused me since they are equivalent. > > > > What does "equivalent" mean here? > > > > They recognize the same language, but they do not produce the same BNF. > > > > All of the rewrites described here recognize the same language; each of > > them produces a different BNF. > > > > At the level of abstraction sometimes adopted in automata theory, where > > the right-hand sides of a rule form a set, not a sequence, the two rules > > in question are the same. But at that level of abstraction the rule "A: > > B | B | C." is not ambiguous (unless "A: B|C." also is), though I > > suspect most ixml grammars will say that it is. > > > >> If there is a reason to give > >> them, it should be explained, and it would be far better right at the > >> beginning to say that there is such an equivalence, which can be > >> applied everywhere. This would make a lot of the examples easier to > >> read because there would be no need for this repetition. > > > > Is it clear that a BNF produced using the rewrite "A: B|C." and one > > produced using "A: C|B." will always have the same pragmatic > > characteristics? In particular: time to first parse? time to final > > output? > > > > It's not clear to me. > > > > For a backtracking parser, the time to first parse may be quite > > different for the two rewrites. (Or so I speculate, on the basis of a > > little experience with backtracking parsers.) For an Earley parser, > > probably not. > > > >> 2) There is a implication (for instance in (4a) that bracketing is BNF > >> ("The following BNF grammar ... S = ('a' | -S), 'b'.) However, > >> bracketing is EBNF, and later examples of how to deal with it are > >> given, so any rewrites using bracketing are only EBNF -> EBNF, and > >> this should be made clearer in the text. > > > > I think you mean O4a (with a capital O, not a zero), and thank you, good > > catch. I think the example should probably be rewritten as > > > > S = 'a', 'b' | -S, 'b'. > > > >> Detailed Comments > > > >> "textual insertions" This is an odd thing, because ixml grammars > >> specify both input and output, and I don't consider insertions, or > >> marks for that matter, as part of the grammar, but part of the output > >> specification. > > > > I think that in an ixml context, the term 'grammar' will for many > > readers mean whatever is in the input grammar submitted to the > > processor, and not just the subset of it that determines whether input > > is accepted or not. Your sentence illustrates the ambiguity: the first > > occurrence of 'grammar' and the second occurrence have different senses. > > > >> Therefore I don't think of them as extensions to EBNF > >> in the sense suggested here. However, I do agree that repetition with > >> separators are an addition to EBNF. "Nested choices". I think this is > >> what I mean by bracketing, but in any case an example would help here. > >> > >> (O1a)(O1b) This is an example of giving the same expansion twice as if > >> | isn't commutative. (PP2) The sentence is missing something. > >> > >> Steven > >
Received on Tuesday, 16 April 2024 12:00:58 UTC