- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Thu, 6 Jan 2022 12:54:54 -0700
- To: Tomos Hillman <yamahito@gmail.com>
- Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, Norm Tovey-Walsh <norm@saxonica.com>, ixml <public-ixml@w3.org>

> On 6,Jan2022, at 9:52 AM, Tomos Hillman <yamahito@gmail.com> wrote: > > To put it a third way: what do you mean by “ambiguous in some way”? > It's a fair question, but not one that I feel I can answer to your satisfaction until I understand what it is that I don't understand! Otherwise we run the risk of my answer being intuitive but hazy... > > If I were to try, I would suggest that parsing a string against a grammar is ambiguous if there are multiple choices given by the grammar that result in any number of complete, valid parsed results. > I think you may be appealing here to > an intuitive but hazy notion that there are two different ways > to derive the empty string from the right-hand side ‘a’*; ‘b’*. > > I do think that: can you help me understand what makes it hazy? The fact that it appeals to a concept of ‘derivation’ which is not clearly defined. There *are* clear definitions of ‘derivation’ in the literature on parsing. None of them ever involve any operation on an expression like ‘a’*; ‘b’*, possibly because none of them applies to grammars in EBNF notation. When we need to apply notions derived for BNF grammars to EBNF grammars, we can of course start by waving our hands and saying ‘derivation’ with the meaning ‘like “derivation” for BNF, only extended in the obvious way to deal with EBNF’. That is I think not an unreasonable way to start. It’s one of the ways I started, when it became clear that we need a story about the ‘a’*; ‘b’* test case and similar cases. What I have found, in trying to make my hand-waving initial idea concrete and specific, is that there is no “obvious way” to extend the notion of derivation to EBNF. Every way I have tried has been a dead end. You may have better luck. The two best ways I am aware of for extending the concepts of derivation and ambiguity to EBNF grammars both seem to me profoundly non-obvious: instead of extending those terms to apply directly to EBNF, both the iterative and the recursive interpretations described by Grune and Jacobs amount to saying “take the EBNF, and generate an equivalent BNF; a sentence is derivable from the EBNF grammar if and only if it is derivable from the BNF grammar." And similarly for ambiguity and other terms. > Surely the fact that `;` denotes a choice, so that either choice may represent the empty string, means that there is more than one way of interpreting the input string from the grammar? Whether or not all of those interpretations render the same result? The definition of a nonterminal N defines a set of possible ways to rewrite N when deriving a sentence from the start symbol. If the empty sequence is an element in that set, it can be so only once. You describe ‘;’ as denoting a choice, but it can equally well be taken as denoting the set union of the possible rewrites in each alternative. Given the grammar S = ‘a’; ‘a’. is the sentence ’a’ ambiguous? This is not an EBNF grammar; this is straight BNF. So there is no need to rewrite the grammar to get an answer. And the answer is “no, not ambiguous; there is exactly one way to rewrite S when deriving a sentence from the start symbol; the fact that we have written it down twice is no more important than whether we write a given set as {1, 2} or {2, 1} or {1, 1, 1, 1, 2}." From this I infer that the presence of a semicolon in a right-hand side does not in itself justify the conclusion that there is ambiguity in the grammar if the left- and right-hand arguments of the semicolon denote overlapping sets of symbol sequences. Michael > > > PS: > As is often the case, I worry that you assume some common semantic definitions which we do not, in fact have in common (almost certainly due to my own ignorance)! "Sentential" has certainly thrown me, as I'm not sure what it means in this context beyond "relating to a sentence"... I am already prone to explaining things to people who did not ask for any explanation; I urge you not to encourage me in that habit. But since you did ask … Here, as sometimes happens, I am simply using standard terminology; any textbook on automata and formal language theory will define it. The most accessible account I know of is that in Grune and Jacobs, Parsing Techniques. The second edition is worth having, for those interested in the topic, but the first edition is still very useful as an introduction. At one point Grune made a PDF of the first edition available on his web site, though I don’t find it there at the moment. Grune and Jacob summarize the rules of rewrite systems very simply. They describe a rule as having a sequence of symbols on the left, a separator (‘->’ in their example), and a sequence of symbols (terminal or nonterminal) on the right, and a rule of the form X -> Y as meaning “X may be replaced by Y”. …. the rules are equally straightforward: start with the start symbol, and keep replacing until there are no nonterminals left. The sample grammar they use would be written this way in a version of ixml in which the left-hand side were not restricted to single nonterminals and the start symbol is marked by [gj:start]: {0.} Name = ’tom’; ‘dick’; ‘harry’. {1.} [gj:start] Sentence = Name; List, End. {2.} List = Name; Name, ‘, ‘, List. {3.} ‘, ’, Name, End = ‘ and ‘, Name. They then show how to generate sentences. Now let's generate our initial example from this grammar, using replacement according to the above rules only. We obtain the following successive forms for Sentence: Intermediate form (rule used -- explanation) Sentence ( -- the start symbol) List, End (Sentence = Name; List, End. -- rule 1) Name, ', ', List, End (List = Name, ‘, ‘, List. -- rule 2) Name, ', ', Name, ', ', List, End (List = Name, ‘, ‘, List. -- rule 2) Name, ', ', Name, ', ', Name, ', ', List, End (List = Name, ‘, ‘, List. -- rule 2) Name, ', ', Name, ' and ', Name (‘, ’, Name, End = ‘and ‘, Name. -- rule 3) 'tom', ', ', 'dick', ' and ', 'harry' (-- rule 0, three times) The intermediate forms are called *sentential forms*; if a sentential form contains no non-terminals it is called a *sentence* and belongs to the generated language. The transitions from one line to the next are called *production steps* and the rules are often called *production rules*, for obvious reasons. A terser definition: for a given context-free grammar G, the start symbol is a sentential form, and if Q is a sentential form containing the left-hand side of any rule R as a sub-sequence, then the sequence of symbols obtained by replacing that sub-sequence of Q with the sequence of symbols on the right-hand side of R s also a sentential form. Since the right-hand side of a rule in the production rules for which this definition holds is always a sequence of terminal and nonterminal symbols, it follow that every sentential form is a sequence of terminal and nonterminal symbols. There is no way to get anything else into a sentential form, and if you did get it in, there would be no way to get it back out. Hence my remark that 'a'*; 'b'* is not a possible sentential form. It is not a sequence of symbols, since '*' and ';' are neither terminal symbols of the grammar nor nonterminal symbols of the grammar.

Received on Thursday, 6 January 2022 19:55:16 UTC