Re: How is ambiguity defined?

I'm not sure I understand approach B, or how it avoids ambiguity within the non-terminal definition: there may be only one parse tree, but surely the fact that it contains a choice means that it is itself ambiguous in some way?

I also seem to recall (unreliably) that implementations are not bound to return all possible ambiguous results, only to note that there is an ambiguity: do we actually have a situation where different parsers would disagree about whether the parse was ambiguous, without having to agree on what exactly the nature of that ambiguity is?  Or do we have to say that it is implementation dependent on whether or not ambiguities are detected?

Thanks,
Tom
On 5 Jan 2022, 21:00 +0000, C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>, wrote:
> On 5,Jan2022, at 5:35 AM, Norm Tovey-Walsh <norm@saxonica.com> wrote:
> >
> > Apologies for what may be a naive question.
> >
> > Consider this grammar:
> >
> > lnnl: "a", number, number, "a" .
> > -number: oddnumber ; evennumber .
> > -oddnumber: "1" ; .
> > -evennumber: "2" ; .
> >
> > The grammar is ambiguous in that there are multiple nonterminals that
> > can match nothing.
> >
> > But there is only one possible parse for “a12a”. I assume that we don’t
> > report any ambiguity in this case.
> >
> > There are several possible parses for “aa”, but they’ll all produce
> > exactly the same XML serialization. Does that count as ambiguous?
>
> On the hunch that this kind of question is best handled by defining or
> clarifying terms, I have just checked my shelf for definitions of
> ambiguity by people who know what they are doing.
>
> The short answer is: They all agree, and they leave us with the task
> of defining what ambiguity means for our spec. There is no
> off-the-shelf answer for our situation.
>
> Longer answer:
>
> First, how do they define ambiguity?
>
> György E. Révész, Introduction to formal languages (New York:
> McGraw-Hill, 1983, rpt. Mineola, NY: Dover, 1991) says (p. 138):
>
> For a context-free Grammar G, a word P in L(G) is said to be
> ambiguous iff P has two or more derivation trees (leftmost
> derivations) in G.
>
> John E. Hopcroft and Jeffrey D. Ulllman, Introduction to automata
> theory, languages, and computation (Reading, MA: Addison-Wesley,
> 19790, say (p. 87):
>
> A context-free grammar G such that some word has two parse trees
> is said to be *ambiguous*. From what we have said above, an
> equivalent definition of ambiguity is that some word has more than
> one leftmost derivation or more than one rightmost derivation.
>
> Dick Grune and Ceriel J. H. Jacobs, Parsing techniques: A practical
> guide (New York: Ellis Horwood, 1990, 2d ed. New York: Springer,
> 2008), say (p. 62 of 1st ed., p. 63 of 2d ed.)
>
> Not surprisingly, a sentence with more than one production tree is
> called *ambiguous*, but we must immediately distinguish between
> *essential ambigurity* and *spurious ambiguity*. ... An ambiguous
> sentence is spuriously ambiguous if all its production trees
> describe the same semantics; if some of them differ in their
> semantics, the ambiguity is essential.
>
> Some things seem worth noting.
>
> Révésc and Hopcroft/Ullman explicitly define ambiguity with respect to
> a grammar / string pair. This is not surprising, since a sentence may
> have multiple parse trees for one grammar and only one parse tree for
> a different grammar that defines the same language.
>
> All three books define ambiguity in terms of trees, not directly of
> derivations.
>
> All three books treat trees as effectively a visualization of a
> derivation, with one node for each derivation step. In their
> accounts, any derivation gives rise to exactly one tree, any tree has
> one leftmost and one rightmost derivation, and in every derivation
> step exactly one nonterminal is rewritten by being replaced by its
> right-hand side.
>
> I won't quote the wording, but in all three books, derivation or
> production graphs are described as using nonterminals to label nodes.
>
> The problem is that for us, these don't all hold. I'll take the
> example that came up the other day, since it's shorter than Norm's
> example:
>
> S = 'a'* | 'b'*.
>
> The empty string '' is a sentence in this language. Is the empty
> string ambiguous under this grammar? How many ways are there to
> derive it? How many parse trees does it have?
>
> We have not defined how to derive a string from an ixml grammar, and
> the answers will vary depending on what we decide.
>
> Approach A: BNF
>
> One approach is to rewrite the ixml grammar to BNF and use what the
> formal-language theorists give us. If we use the rewrites in the
> spec, we get
>
> S = a-star; b-star.
> -a-star: 'a', a-star; .
> -b-star: 'b', b-star; .
>
> and this is clearly ambiguous: we can derive the empty string either
> via a-star or via b-star.
>
> However, this is not the grammar we started with, and ambiguity is not
> defined with respect to the set of all possible grammars for a
> language, but with respect to one particular grammar.
>
>
> Approach B: EBNF (derivation by nonterminals)
>
> If parse trees are labeled with nonterminals, then there is only one
> parse tree for the empty string in this grammar: <S></S>. There is no
> nonterminal defined as 'a'*, no nonterminal for 'b'*, so they don't go
> into the parse tree.
>
>
> Approach C: EBNF (derivation by expressions)
>
> We could say there are obviously two ways to derive the sentence in
> the EBNF:
>
> 1 S
> 2 'a'*
> 3 ''
>
> and another is
>
> 1 S
> 2 'b'*
> 3 ''
>
> The parse trees for this grammar would, on this account, be labeled
> not with nonterminals but with expressions of any kind -- here a
> repeat-0 over a terminal symbol.
>
> The problem is that neither we nor anyone else have offered a formal
> definition of derivation from an EBNF grammar. So we could equally
> well take a different approach.
>
>
> In their section 2.3.2.4 (p. 28 of 2d ed; 2.3.2.3 on p. 34 of 1st
> ed.), Grune and Jacobs describe our problem concisely without
> resolving it in a way that binds us.
>
> There are two different schools of thought about the structural
> meaning of a regular right-hand side. One school maintains that a
> rule like
>
> Book --> Preface Chapter+ Conclusion
>
> is an abbreviation of
>
> Book --> Preface α Conclusion
> α --> Chapter | Chapter α
>
> as shown above. This is the "(right)recursive" interpretation.
> It has the advantage that it is easy to explain and that the
> transformatoin to "normal" CF is simple. Disadvantages are that
> the transformation entails anonymous rules (identified by α here)
> and that the lopsided production tree, for, for example, a book of
> four chapters does not correspond to our idea of the structure of
> the book; ...
>
> The second school claims that
>
> Book --> Preface Chapter+ Conclusion
>
> is an abbreviation of
>
> Book --> Preface Chapter Conclusion
> | Preface Chapter Chapter Conclusion
> | Preface Chapter Chapter Chapter Conclusion
> | ...
> ...
>
> This is the "iterative" interpretation. It has the advantage that
> it yields a beautiful production tree ..., but the disadvantages
> are that it involves an infinite number of production rules and
> that the nodes in the production tree have a varying fan-out.
>
> Since the implementation of the iterative interpretation is far
> from trivial, most practical parser generators use the recursive
> interpretation in some form or another, whereas most research has
> been done on the iterative interpretation.
>
> Hot dog: when I figured out how to apply the Earley algorithm directly
> to EBNF grammars I apparently solved a non-trivial problem. Cool!
>
> It will be seen that approaches A and C described above take the
> recursive interpretation, and B the iterative interpretation.
>
> For implementors, it looks to me as if it's easiest to detect
> ambiguities by looking at what might be called the raw parse tree:
> the parse tree you get if you ignore all marks. One complication is
> that since some implementations rewrite the grammar (and this is
> implicitly allowed by our Hints to Implementors section), not all
> processors are going to produce the same raw parse tree.
>
> For the user, I think Grune and Jacobs's disinction between essential
> ambiguity and spurious ambiguity is likely to be the distinction
> between ambiguities that produce more than one XML form and
> ambiguities that don't.
>
> What tradeoff do we want between simplicity for implementors
> and simplicity for users?
>
>
> At the moment, every answer I see to Norm's question looks
> unacceptable.
>
> Michael
>
>

Received on Thursday, 6 January 2022 10:41:04 UTC