W3C home > Mailing lists > Public > public-ixml@w3.org > April 2021

Re: resolving ambiguity

From: Aleksei Guzev <aleksei@guzev.pro>
Date: Wed, 14 Apr 2021 17:48:31 +0300
Message-ID: <CAObujU9yzojP0ctmLogc0if5_2Pko2CPvE_22OCp+Twv2FTc7Q@mail.gmail.com>
To: Tom Hillman <tom@expertml.com>
Cc: public-ixml@w3.org, "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
If we had the "greedy by default" rule, as regular expression do, then the
first path <S><Left>ab</Left><Right>c</Right></S> wins because 'Left'
consumed as much as it could. While preventing ambiguous grammars does not
seem achievable, the rule could make a grammar more deterministic.

On Wed, 14 Apr 2021 at 14:52, Tom Hillman <tom@expertml.com> wrote:

> In fairness, I suggested it nervously, and only then as part of an
> approach for the input consumption discussion: I would prefer it if we
> leave the choice open.
>
> I think the answer to your first rule wins question is <S>(<a>a </a>)</S>..
> That is possibly because recursing into each non-terminal before handling
> its siblings is second nature to me because it mirrors the document order
> of XML trees ;)
>
> Tom
>
> _________________
> Tomos Hillman
> eXpertML Ltd
> +44 7793 242058
> On 14 Apr 2021, 01:42 +0100, C. M. Sperberg-McQueen <
> cmsmcq@blackmesatech.com>, wrote:
>
> On today’s call, Tom suggested that the spec might usefully say which tree
> should be returned, in case of ambiguity.
>
> I am nervous about this suggestion; I think that the choice of tree is
> better left open.
>
> "First rule wins" seems plausible enough when it applies, but I have not
> thought enough about it to know whether it always applies, whether it
> always provides a unique solution, or how to extend the Earley algorithm
> (or any of the other general algorithms) to keep track of such things in a
> way that would guarantee the selection of the correct tree. Consider this
> grammar:
>
> S: Left, Right.
> Left: "a", "b"; "a".
> Right: "b", "c"; "c".
>
> If I'm not mistaken, this has two parse trees:
>
> <S><Left>ab</Left><Right>c</Right></S>
> <S><Left>a</Left><Right>bc</Right></S>
>
> The first tree uses the first RHS for Left and the second RHS for Right.
> The second tree is the other way round. Does the "first rule wins"
> principle give an answer here? I suppose we can say that it's the choice of
> rules for 'Left' that matters, because 'Left' precedes 'Right'. And I guess
> that's what you meant.
>
> But consider:
>
> S: '(', a, s?, ')'.
> a: s?, 'a', s?.
> s: #20+.
>
> and the input string "(a )". Which of the parses is preferred by the
> first-rule-wins rule?
>
> <S>(<a>a </a>)</S>
> <S>(<a>a</a> )</S>
>
>
> Michael
>
> ********************************************
> C. M. Sperberg-McQueen
> Black Mesa Technologies LLC
> cmsmcq@blackmesatech.com
> http://www.blackmesatech.com
> ********************************************
>
>
>
Received on Wednesday, 14 April 2021 16:26:18 UTC

This archive was generated by hypermail 2.4.0 : Wednesday, 14 April 2021 16:26:18 UTC