Re: How is ambiguity defined?

> On 6 Jan 2022, at 3:03 PM, Steven Pemberton <steven.pemberton@cwi.nl> wrote:
> 
> I honestly think you are overthinking this Michael.

My apologies.

To be blunt, I have the impression that the rest of the group is 
either underthinking it or not thinking about it at all.

> All your long treatise goes to show is that there are different theories.

There are different intuitions about how to apply concepts
like derivation and parse in an EBNF context; I would not call
them theories; that gives entirely the wrong impression of 
their status.

If you learned nothing more from my mail than that, then 
I am very sorry to have wasted your time.

> Algorithms based on those theories will therefore produce different results.

Even if what we are looking at were different theories, it would
not follow that algorithms will produce different results, any more
than different correct parsing algorithms produce different parse
trees for the same sentence and the same grammar.

> But we are talking about a tiny corner of any language, and in all cases the serialisation will be the same. We don't even require that a parser discover all possible parses, as long as it finds one, in which case it would never report ambiguity.
> So I still stand by the current wording:
> 
> > > * It must find at least one parse of any input that matches the grammar
> > > * if it finds more than one parse, it must report that fact.

Is that the current wording?  I cannot find it in the spec.

What I find in the spec is rather different:  

> If more than one parse results, one is chosen; it is not defined how this choice is made, but the resulting parse must be marked as ambiguous by including the attribute ixml:state="ambiguous" on the document element of the serialisation.

and 

> If more than one parse tree describes the input, the processor must serialize one of them. It is not defined how this choice is made, but the resulting parse should be marked as ambiguous by including on the document element of the serialisation the attribute ixml:state="ambiguous", unless the user has activated an option to suppress this attribute. 

There are, I think, a few issues here.

First, the two quotations disagree over whether a processor MUST or
SHOULD report ambiguity.

Second, your paraphrase and the two quotations from the spec offer
three different descriptions of when a report of ambiguity is in
order:

  - if the processor finds more than one parse?

  - if more than one parse tree describes the input?
  
  - if more than one parse "results"?  (is that the same as the
    processor finding more than one? or the same as more than one
    existing?)

The difference is important; consider a backtracking parser that
has found one parse tree.  If it is operating in strict conformance
mode, can it return that parse tree?  Which of the following describes
the situation?

  - It has not found more than one parse, so it need not (and should 
    not) report that the sentence is ambiguous? 

  - It does not know whether more than one parse tree describes the
    input, so it does not currently know whether the sentence is
    ambiguous or not.  To be sure, it should look to see whether it
    can find a second parse tree.  If it does, then the sentence is
    ambiguous and the parse tree returned should be so marked.
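
To make the two readings concrete, here is a minimal sketch in Python of a
backtracking parser that yields parse trees lazily. Everything in it is an
assumption made for illustration: the toy grammar (S: A | B, where A and B
both match "a"), the tuple representation of trees, and the function names
are hypothetical and come neither from the spec nor from any ixml processor.

```python
from itertools import islice

# Hypothetical toy grammar (not from the spec): both alternatives of S
# match the input "a", so that input has two distinct parse trees.
GRAMMAR = {
    "S": [["A"], ["B"]],
    "A": [["a"]],
    "B": [["a"]],
}

def parse(symbol, text, pos):
    """Lazily yield (tree, end) for each way `symbol` matches at `pos`."""
    if symbol not in GRAMMAR:  # anything not in GRAMMAR is a literal terminal
        if text.startswith(symbol, pos):
            yield symbol, pos + len(symbol)
        return
    for alternative in GRAMMAR[symbol]:  # backtrack over the alternatives
        for children, end in parse_sequence(alternative, text, pos):
            yield (symbol, children), end

def parse_sequence(symbols, text, pos):
    """Lazily yield (trees, end) for each way the sequence matches at `pos`."""
    if not symbols:
        yield [], pos
        return
    for tree, mid in parse(symbols[0], text, pos):
        for rest, end in parse_sequence(symbols[1:], text, mid):
            yield [tree] + rest, end

def full_parses(text):
    """Yield only the trees for S that consume the whole input."""
    for tree, end in parse("S", text, 0):
        if end == len(text):
            yield tree

def first_found(text):
    """Reading 1: stop at the first tree; ambiguity is never reported."""
    trees = list(islice(full_parses(text), 1))
    return (trees[0], False) if trees else None

def look_for_second(text):
    """Reading 2: try for a second tree before deciding about ambiguity."""
    trees = list(islice(full_parses(text), 2))
    return (trees[0], len(trees) > 1) if trees else None
```

On the input "a" both functions return the same tree, but only
look_for_second flags it as ambiguous. Under the first reading the flag
depends on how far the processor happened to search; under the second it
depends only on the grammar and the input.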

Finally, both in the current spec and in your paraphrase, the terms
“parse” and “parse tree” are undefined, and so there is really no way
to be sure what counts, for purposes of the spec, as “more than one
parse” or “more than one parse tree”.

The references to “parse trees” elsewhere in the spec suggest that it
refers to the XML structure output by a conforming ixml processor:


> A grammar is used to describe the input format. An input is parsed using this grammar, and the resulting parse tree is serialised as XML. 

Processors must

> parse the input using the grammar specified, and produce an XML document representing a parse tree for the input

If the parse trees referred to in the rules relating to ixml:state are the 
XML documents to be returned, then there is no question about it:
the empty string has only one XML representation in our example, and 
the correct tree to return is <S/>, not <S ixml:state="ambiguous"/>.

But I believe that several people have expressed an unwillingness to
apply the “more than one” criterion to the XML form being produced, and 
there are good reasons to be cautious.
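
The gap between counting parse trees and counting XML outputs can be
sketched as follows. This is an illustration under stated assumptions, not
the example from earlier in the thread: the two trees, the (name, hidden,
children) representation, and the serialise function are all hypothetical,
with "hidden" standing in for a nonterminal that is suppressed in the output.

```python
def serialise(tree):
    """Serialise a hypothetical (name, hidden, children) tree as XML text."""
    name, hidden, children = tree
    inner = "".join(
        serialise(child) if isinstance(child, tuple) else child
        for child in children
    )
    if hidden:  # a hidden nonterminal contributes only its content
        return inner
    return f"<{name}>{inner}</{name}>" if inner else f"<{name}/>"

# Two hypothetical derivations of the empty string: S via a hidden A,
# and S via a hidden B. Neither A nor B has any content.
tree_via_a = ("S", False, [("A", True, [])])
tree_via_b = ("S", False, [("B", True, [])])
trees = [tree_via_a, tree_via_b]

distinct_trees = len({repr(tree) for tree in trees})         # counts derivations
distinct_outputs = len({serialise(tree) for tree in trees})  # counts XML forms
```

Here distinct_trees is 2 while distinct_outputs is 1, since both trees
serialise to <S/>. Whether such an input counts as ambiguous depends
entirely on which of the two numbers the spec means by “more than one”.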

I don’t see a compelling reason for us not to have a clear story
here, even though it may require that we think hard for a bit.

Michael

Received on Friday, 7 January 2022 00:03:23 UTC