- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Thu, 6 Jan 2022 11:08:35 -0700
- To: Tomos Hillman <yamahito@gmail.com>
- Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, Norm Tovey-Walsh <norm@saxonica.com>, ixml <public-ixml@w3.org>
> On 6,Jan2022, at 9:15 AM, Tomos Hillman <yamahito@gmail.com> wrote:
>
> do we actually have a situation where different parsers would disagree about whether the parse was ambiguous, without having to agree on what exactly the nature of that ambiguity is?
>
> Yes, we do. On the empty string, parsed against S = ‘a’*; ‘b’*.
>
> Steven’s parser and mine produce different results. His parser
> claims the string is ambiguous, because it implicitly defines
> ambiguity in one way (my conjecture: in terms of the ambiguity
> of the sentence against the underlying BNF grammar), and mine
> claims the string is unambiguous because it implicitly defines
> ambiguity a different way (number of raw parse trees, where a
> raw parse tree has nonterminals at the internal nodes, terminal
> symbols at preterminal nodes, and characters at the leaves).
> Where do your raw parse trees store the alternative possibilities for whether the empty string is denoted by `a*` or `b*`?
They don’t.
As I mentioned earlier, in my raw parse trees all nodes are (decorated
with) nonterminals, and neither ‘a’* nor ‘b’* is a nonterminal.
My adaptation of Earley parsing to EBNF treats ‘a’*; ‘b’* as
short-hand for a choice among
the empty sequence
‘a’
‘b’
‘aa’
‘bb’
‘aaa’
‘bbb’
…
This is in line with what Grune and Jacob call the iterative
interpretation of EBNF.
Operationally, I turn the right-hand side ‘a’* ; ‘b’’* into a
finite-state automaton with three states: q0, q2, and q3. The start
state is q0, the transition function is (q0, ‘a’, q1), (q0, ‘b’, q2),
(q1, ‘a’, q1), (q2, ‘b’, q2). The set of final states is {q0, q1,
q2}. The Earley item showing that we have successfully read an S at
position 0 has the form (0, 0, S, q0); it is recognized as a
completion item by the fact that q0 is among the final states for S.
No choice between 'a'* and 'b'* is recorded because none is made
(because none is needed).
Michael
Received on Thursday, 6 January 2022 18:09:00 UTC