resolving ambiguity from C. M. Sperberg-McQueen on 2021-04-14 (public-ixml@w3.org from April 2021)

From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
Date: Tue, 13 Apr 2021 18:42:00 -0600
To: public-ixml@w3.org
Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
Message-Id: <4C762324-2C76-4552-B8CA-592720869F28@blackmesatech.com>

On today’s call, Tom suggested that the spec might usefully say which tree should be returned, in case of ambiguity. 

I am nervous about this suggestion; I think that the choice of tree is better left open.  

"First rule wins" seems plausible enough when it applies, but I have not thought enough about it to know whether it always applies, whether it always provides a unique solution, or how to extend the Earley algorithm (or any of the other general algorithms) to keep track of such things in a way that would guarantee the selection of the correct tree.  Consider this grammar:

    S: Left, Right.
    Left:  "a", "b"; "a".
    Right: "b", "c"; "c".

If I'm not mistaken, this has two parse trees:

    <S><Left>ab</Left><Right>c</Right></S>
    <S><Left>a</Left><Right>bc</Right></S>

The first tree uses the first RHS for Left and the second RHS for Right.  The second tree is the other way round.  Does the "first rule wins" principle give an answer here?  I suppose we can say that it's the choice of rules for 'Left' that matters, because 'Left' precedes 'Right'.  And I guess that's what you meant.

But consider:

    S: '(', a, s?, ')'.
    a: s?, 'a', s?.
    s: #20+.

and the input string "(a )".  Which of the parses is preferred by the first-rule-wins rule?

    <S>(<a>a </a>)</S>
    <S>(<a>a</a> )</S>


Michael

********************************************
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
cmsmcq@blackmesatech.com
http://www.blackmesatech.com
********************************************

Received on Wednesday, 14 April 2021 00:42:19 UTC