Re: How is ambiguity defined? from C. M. Sperberg-McQueen on 2022-01-06 (public-ixml@w3.org from January 2022)

From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
Date: Thu, 6 Jan 2022 12:54:54 -0700
To: Tomos Hillman <yamahito@gmail.com>
Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, Norm Tovey-Walsh <norm@saxonica.com>, ixml <public-ixml@w3.org>
Message-Id: <7ADFE27F-E8AE-410F-8C43-4890F817C013@blackmesatech.com>
> On 6,Jan2022, at 9:52 AM, Tomos Hillman <yamahito@gmail.com> wrote:
> 
>> To put it a third way: what do you mean by “ambiguous in some way”?
> 
> It's a fair question, but not one that I feel I can answer to your satisfaction until I understand what it is that I don't understand!  Otherwise we run the risk of my answer being intuitive but hazy...
> 
> If I were to try, I would suggest that parsing a string against a grammar is ambiguous if there are multiple choices given by the grammar that result in any number of complete, valid parsed results.
> 
>> I think you may be appealing here to
>> an intuitive but hazy notion that there are two different ways
>> to derive the empty string from the right-hand side ‘a’*; ‘b’*.
> 
> I do think that: can you help me understand what makes it hazy?

The fact that it appeals to a concept of ‘derivation’ which is
not clearly defined.

There *are* clear definitions of ‘derivation’ in the literature on
parsing.  None of them ever involve any operation on an expression
like ‘a’*; ‘b’*, possibly because none of them applies to grammars in
EBNF notation.  When we need to apply notions derived for BNF
grammars to EBNF grammars, we can of course start by waving our
hands and saying ‘derivation’ with the meaning ‘like “derivation” for
BNF, only extended in the obvious way to deal with EBNF’.  That is
I think not an unreasonable way to start.  It’s one of the ways I 
started, when it became clear that we need a story about the
‘a’*; ‘b’* test case and similar cases.

What I have found, in trying to make my hand-waving initial idea
concrete and specific, is that there is no “obvious way” to extend
the notion of derivation to EBNF.  Every way I have tried has 
been a dead end.  You may have better luck.

The two best ways I am aware of for extending the concepts of
derivation and ambiguity to EBNF grammars both seem to me
profoundly non-obvious:  instead of extending those terms to apply
directly to EBNF, both the iterative and the recursive interpretations
described by Grune and Jacobs amount to saying “take the EBNF,
and generate an equivalent BNF; a sentence is derivable from
the EBNF grammar if and only if it is derivable from the BNF grammar.”
And similarly for ambiguity and other terms.  
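
To make the "generate an equivalent BNF" step concrete, here is a
minimal sketch in Python, loosely in the spirit of the recursive
interpretation. Everything in it is an assumption made for
illustration: the encoding of a grammar as a dict of alternatives,
the hypothetical function name desugar_star, the fresh nonterminal
names _star1, _star2, and the choice of right recursion. It handles
only a starred single symbol, not nested repetitions.

```python
from itertools import count

def desugar_star(rules):
    """Return a BNF-style copy of `rules` with each ('star', X) item
    replaced by a fresh nonterminal N defined as  N = ; X, N.
    A grammar is a dict: nonterminal -> list of alternatives, and each
    alternative is a tuple of symbols (illustrative encoding only)."""
    fresh = count(1)
    out = {}

    def visit(alternative):
        rewritten = []
        for item in alternative:
            if isinstance(item, tuple) and item[0] == "star":
                name = f"_star{next(fresh)}"
                # empty alternative, or one X followed by more of the same
                out[name] = [(), (item[1], name)]
                rewritten.append(name)
            else:
                rewritten.append(item)
        return tuple(rewritten)

    for lhs, alternatives in rules.items():
        out[lhs] = [visit(alt) for alt in alternatives]
    return out

# The right-hand side 'a'*; 'b'* under a start symbol S.
ebnf = {"S": [(("star", "'a'"),), (("star", "'b'"),)]}
bnf = desugar_star(ebnf)
for lhs, alternatives in bnf.items():
    print(lhs, "=", alternatives)
```

Once the grammar is in this form, derivability and ambiguity can be
asked about the BNF using the textbook definitions, which is the
point of the indirection.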


> Surely the fact that `;` denotes a choice, so that either choice may represent the empty string, means that there is more than one way of interpreting the input string from the grammar?  Whether or not all of those interpretations render the same result?

The definition of a nonterminal N defines a set of possible ways to
rewrite N when deriving a sentence from the start symbol.  If the
empty sequence is an element in that set, it can be so only once.  You
describe ‘;’ as denoting a choice, but it can equally well be taken as
denoting the set union of the possible rewrites in each alternative.

Given the grammar 

   S = ‘a’; ‘a’.

is the sentence ‘a’ ambiguous?  

This is not an EBNF grammar; this is straight BNF.  So there is no
need to rewrite the grammar to get an answer. And the answer is “no,
not ambiguous; there is exactly one way to rewrite S when deriving a
sentence from the start symbol; the fact that we have written it down
twice is no more important than whether we write a given set as {1, 2}
or {2, 1} or {1, 1, 1, 1, 2}.”

From this I infer that the presence of a semicolon in a right-hand
side does not in itself justify the conclusion that there is ambiguity
in the grammar, even when the left- and right-hand arguments of the
semicolon denote overlapping sets of symbol sequences.
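
The set-union reading can be illustrated with a small Python sketch.
The helper name rewrites and the encoding of each alternative as a
tuple of symbols are assumptions made for illustration; the sketch
models only this one reading of ‘;’ and does not settle which reading
a specification ought to adopt.

```python
def rewrites(*alternatives):
    """Model a right-hand side as the SET of symbol sequences it allows.
    Each alternative is a sequence of symbols; duplicates collapse."""
    return frozenset(tuple(alt) for alt in alternatives)

# S = 'a'; 'a'.   versus   S = 'a'.
s_written_twice = rewrites(["'a'"], ["'a'"])
s_written_once = rewrites(["'a'"])

print(s_written_twice == s_written_once)  # True
print(len(s_written_twice))               # 1

# Two alternatives that both allow the empty sequence contribute it once.
print(len(rewrites([], [])))              # 1
```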

Michael

> 
> 
> PS:
> As is often the case, I worry that you assume some common semantic definitions which we do not, in fact have in common (almost certainly due to my own ignorance)!  "Sentential" has certainly thrown me, as I'm not sure what it means in this context beyond "relating to a sentence"...


I am already prone to explaining things to people who did not ask
for any explanation; I urge you not to encourage me in that habit.

But since you did ask …

Here, as sometimes happens, I am simply using standard terminology;
any textbook on automata and formal language theory will define it.

The most accessible account I know of is that in Grune and Jacobs,
Parsing Techniques.  The second edition is worth having, for those
interested in the topic, but the first edition is still very useful as
an introduction.  At one point Grune made a PDF of the first edition
available on his web site, though I don’t find it there at the moment.

Grune and Jacobs summarize the rules of rewrite systems very simply.
They describe a rule as having a sequence of symbols on the left, a
separator (‘->’ in their example), and a sequence of symbols (terminal
or nonterminal) on the right, and a rule of the form X -> Y as meaning
“X may be replaced by Y”.

    …. the rules are equally straightforward: start with the start
    symbol, and keep replacing until there are no nonterminals left.

The sample grammar they use would be written this way in a version of
ixml in which the left-hand side were not restricted to single
nonterminals and the start symbol is marked by [gj:start]:

    {0.} Name = ‘tom’; ‘dick’; ‘harry’.
    {1.} [gj:start] Sentence = Name; List, End.
    {2.} List = Name; Name, ‘, ’, List.
    {3.} ‘, ’, Name, End = ‘ and ’, Name.

They then show how to generate sentences.

    Now let's generate our initial example from this grammar, using
    replacement according to the above rules only.  We obtain the
    following successive forms for Sentence:

    Intermediate form (rule used -- explanation)

    Sentence    ( -- the start symbol)
    List, End     (Sentence = Name; List, End. -- rule 1)
    Name, ', ', List, End     (List = Name, ‘, ’, List. -- rule 2)
    Name, ', ', Name, ', ', List, End     (List = Name, ‘, ’, List. -- rule 2)
    Name, ', ', Name, ', ', Name, End     (List = Name. -- rule 2)
    Name, ', ', Name, ' and ', Name     (‘, ’, Name, End = ‘ and ’, Name. -- rule 3)
    'tom', ', ', 'dick', ' and ', 'harry'   (-- rule 0, three times)

    The intermediate forms are called *sentential forms*; if a
    sentential form contains no non-terminals it is called a
    *sentence* and belongs to the generated language.  The transitions
    from one line to the next are called *production steps* and the
    rules are often called *production rules*, for obvious reasons.

A terser definition: for a given context-free grammar G, the start
symbol is a sentential form, and if Q is a sentential form containing
the left-hand side of any rule R as a sub-sequence, then the sequence
of symbols obtained by replacing that sub-sequence of Q with the
sequence of symbols on the right-hand side of R is also a sentential
form.
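
As a rough illustration of this definition, the following Python
sketch treats a sentential form as a tuple of symbols and a production
step as the replacement of one sub-sequence by another. The function
name apply_rule, the tuple encoding, and the choice to rewrite the
leftmost occurrence are assumptions made for illustration. The list of
steps is hard-coded to replay the tom, dick and harry example, with
the last List rewritten to a single Name so that rule 3 can apply.

```python
def apply_rule(form, lhs, rhs):
    """One production step: replace the leftmost occurrence of the
    sub-sequence `lhs` in the sentential form `form` by `rhs`.
    All three arguments are tuples of symbols."""
    n = len(lhs)
    for i in range(len(form) - n + 1):
        if form[i:i + n] == lhs:
            return form[:i] + rhs + form[i + n:]
    raise ValueError(f"{lhs} does not occur in {form}")

NONTERMINALS = {"Sentence", "List", "Name", "End"}
COMMA, AND = "', '", "' and '"

steps = [
    (("Sentence",), ("List", "End")),          # rule 1
    (("List",), ("Name", COMMA, "List")),      # rule 2
    (("List",), ("Name", COMMA, "List")),      # rule 2
    (("List",), ("Name",)),                    # rule 2
    ((COMMA, "Name", "End"), (AND, "Name")),   # rule 3
    (("Name",), ("'tom'",)),                   # rule 0
    (("Name",), ("'dick'",)),                  # rule 0
    (("Name",), ("'harry'",)),                 # rule 0
]

form = ("Sentence",)          # the start symbol is a sentential form
for lhs, rhs in steps:
    form = apply_rule(form, lhs, rhs)
    print(", ".join(form))    # every intermediate result is one too

# A sentential form with no nonterminals left is a sentence.
is_sentence = not any(symbol in NONTERMINALS for symbol in form)
```

Every value that form takes here is a tuple of terminal and
nonterminal symbols, because apply_rule can only splice in the
right-hand side of a rule.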

Since the right-hand side of a rule in the production rules for which
this definition holds is always a sequence of terminal and nonterminal
symbols, it follows that every sentential form is a sequence of
terminal and nonterminal symbols.  There is no way to get anything
else into a sentential form, and if you did get it in, there would be
no way to get it back out.  Hence my remark that

    'a'*; 'b'*

is not a possible sentential form.  It is not a sequence of symbols,
since '*' and ';' are neither terminal symbols of the grammar nor
nonterminal symbols of the grammar.
Received on Thursday, 6 January 2022 19:55:16 UTC