- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Thu, 6 Jan 2022 11:08:35 -0700
- To: Tomos Hillman <yamahito@gmail.com>
- Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, Norm Tovey-Walsh <norm@saxonica.com>, ixml <public-ixml@w3.org>
> On 6,Jan2022, at 9:15 AM, Tomos Hillman <yamahito@gmail.com> wrote: > > do we actually have a situation where different parsers would disagree about whether the parse was ambiguous, without having to agree on what exactly the nature of that ambiguity is? > > Yes, we do. On the empty string, parsed against S = ‘a’*; ‘b’*. > > Steven’s parser and mine produce different results. His parser > claims the string is ambiguous, because it implicitly defines > ambiguity in one way (my conjecture: in terms of the ambiguity > of the sentence against the underlying BNF grammar), and mine > claims the string is unambiguous because it implicitly defines > ambiguity a different way (number of raw parse trees, where a > raw parse tree has nonterminals at the internal nodes, terminal > symbols at preterminal nodes, and characters at the leaves). > Where do your raw parse trees store the alternative possibilities for whether the empty string is denoted by `a*` or `b*`? They don’t. As I mentioned earlier, in my raw parse trees all nodes are (decorated with) nonterminals, and neither ‘a’* nor ‘b’* is a nonterminal. My adaptation of Earley parsing to EBNF treats ‘a’*; ‘b’* as short-hand for a choice among the empty sequence ‘a’ ‘b’ ‘aa’ ‘bb’ ‘aaa’ ‘bbb’ … This is in line with what Grune and Jacob call the iterative interpretation of EBNF. Operationally, I turn the right-hand side ‘a’* ; ‘b’’* into a finite-state automaton with three states: q0, q2, and q3. The start state is q0, the transition function is (q0, ‘a’, q1), (q0, ‘b’, q2), (q1, ‘a’, q1), (q2, ‘b’, q2). The set of final states is {q0, q1, q2}. The Earley item showing that we have successfully read an S at position 0 has the form (0, 0, S, q0); it is recognized as a completion item by the fact that q0 is among the final states for S. No choice between 'a'* and 'b'* is recorded because none is made (because none is needed). Michael
Received on Thursday, 6 January 2022 18:09:00 UTC