- From: Aleksei Guzev <aleksei@guzev.pro>
- Date: Wed, 14 Apr 2021 17:48:31 +0300
- To: Tom Hillman <tom@expertml.com>
- Cc: public-ixml@w3.org, "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
- Message-ID: <CAObujU9yzojP0ctmLogc0if5_2Pko2CPvE_22OCp+Twv2FTc7Q@mail.gmail.com>
If we had the "greedy by default" rule, as regular expression do, then the first path <S><Left>ab</Left><Right>c</Right></S> wins because 'Left' consumed as much as it could. While preventing ambiguous grammars does not seem achievable, the rule could make a grammar more deterministic. On Wed, 14 Apr 2021 at 14:52, Tom Hillman <tom@expertml.com> wrote: > In fairness, I suggested it nervously, and only then as part of an > approach for the input consumption discussion: I would prefer it if we > leave the choice open. > > I think the answer to your first rule wins question is <S>(<a>a </a>)</S>.. > That is possibly because recursing into each non-terminal before handling > its siblings is second nature to me because it mirrors the document order > of XML trees ;) > > Tom > > _________________ > Tomos Hillman > eXpertML Ltd > +44 7793 242058 > On 14 Apr 2021, 01:42 +0100, C. M. Sperberg-McQueen < > cmsmcq@blackmesatech.com>, wrote: > > On today’s call, Tom suggested that the spec might usefully say which tree > should be returned, in case of ambiguity. > > I am nervous about this suggestion; I think that the choice of tree is > better left open. > > "First rule wins" seems plausible enough when it applies, but I have not > thought enough about it to know whether it always applies, whether it > always provides a unique solution, or how to extend the Earley algorithm > (or any of the other general algorithms) to keep track of such things in a > way that would guarantee the selection of the correct tree. Consider this > grammar: > > S: Left, Right. > Left: "a", "b"; "a". > Right: "b", "c"; "c". > > If I'm not mistaken, this has two parse trees: > > <S><Left>ab</Left><Right>c</Right></S> > <S><Left>a</Left><Right>bc</Right></S> > > The first tree uses the first RHS for Left and the second RHS for Right. > The second tree is the other way round. Does the "first rule wins" > principle give an answer here? I suppose we can say that it's the choice of > rules for 'Left' that matters, because 'Left' precedes 'Right'. And I guess > that's what you meant. > > But consider: > > S: '(', a, s?, ')'. > a: s?, 'a', s?. > s: #20+. > > and the input string "(a )". Which of the parses is preferred by the > first-rule-wins rule? > > <S>(<a>a </a>)</S> > <S>(<a>a</a> )</S> > > > Michael > > ******************************************** > C. M. Sperberg-McQueen > Black Mesa Technologies LLC > cmsmcq@blackmesatech.com > http://www.blackmesatech.com > ******************************************** > > >
Received on Wednesday, 14 April 2021 16:26:18 UTC