Re: How is ambiguity defined? from C. M. Sperberg-McQueen on 2022-01-06 (public-ixml@w3.org from January 2022)

From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
Date: Thu, 6 Jan 2022 11:08:35 -0700
To: Tomos Hillman <yamahito@gmail.com>
Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, Norm Tovey-Walsh <norm@saxonica.com>, ixml <public-ixml@w3.org>
Message-Id: <7FC359CC-6CBD-41B4-9BBE-925DC3574C3C@blackmesatech.com>

> On 6,Jan2022, at 9:15 AM, Tomos Hillman <yamahito@gmail.com> wrote:
> 
> do we actually have a situation where different parsers would disagree about whether the parse was ambiguous, without having to agree on what exactly the nature of that ambiguity is?
> 
> Yes, we do. On the empty string, parsed against S = ‘a’*; ‘b’*.
> 
> Steven’s parser and mine produce different results. His parser
> claims the string is ambiguous, because it implicitly defines
> ambiguity in one way (my conjecture: in terms of the ambiguity
> of the sentence against the underlying BNF grammar), and mine
> claims the string is unambiguous because it implicitly defines
> ambiguity a different way (number of raw parse trees, where a
> raw parse tree has nonterminals at the internal nodes, terminal
> symbols at preterminal nodes, and characters at the leaves).

> Where do your raw parse trees store the alternative possibilities for whether the empty string is denoted by `a*` or `b*`? 

They don’t.

As I mentioned earlier, in my raw parse trees all nodes are (decorated
with) nonterminals, and neither ‘a’* nor ‘b’* is a nonterminal.

My adaptation of Earley parsing to EBNF treats ‘a’*; ‘b’* as
short-hand for a choice among

    the empty sequence
    ‘a’
    ‘b’
    ‘aa’
    ‘bb’
    ‘aaa’
    ‘bbb’ 
    …

This is in line with what Grune and Jacob call the iterative
interpretation of EBNF.

Operationally, I turn the right-hand side ‘a’* ; ‘b’’* into a
finite-state automaton with three states: q0, q2, and q3.  The start
state is q0, the transition function is (q0, ‘a’, q1), (q0, ‘b’, q2),
(q1, ‘a’, q1), (q2, ‘b’, q2).  The set of final states is {q0, q1,
q2}.  The Earley item showing that we have successfully read an S at
position 0 has the form (0, 0, S, q0); it is recognized as a
completion item by the fact that q0 is among the final states for S.

No choice between 'a'* and 'b'* is recorded because none is made
(because none is needed).


Michael

Received on Thursday, 6 January 2022 18:09:00 UTC