Re: Ambiguity and the interchangeability of parses

My implementation also says how many upper-level ambiguities there are, and where they happen.


For instance:


time flies like an arrow.


<!-- AMBIGUOUS
     The input from line.pos 1.1 to 1.25 can be interpreted as 'sentence' in 3 different ways:
     1: sentence[1.1:]:  command[:1.25] 
     2: sentence[1.1:]:  subject[:1.5] s[:1.6] verb[:1.11] (s, comparison)[:1.25] 
     3: sentence[1.1:]:  subject[:1.11] s[:1.12] verb[:1.16] s[:1.17] object[:1.25] 
-->


<sentence ixml:state="ambiguous" xmlns:ixml="http://invisiblexml.org/NS">
   <command>
      <verb>time</verb>
      <object>
         <noun>flies</noun>
      </object>
      <comparison>like
         <noun>an arrow</noun>
      </comparison>
   </command>
</sentence>



On Friday 25 March 2022 11:29:43 (+01:00), Steven Pemberton wrote:


You're covered. See the last part of this:


If more than one parse tree describes the input, the processor must serialize one of them. It is not defined how this choice is made, but the resulting parse must by default be marked as ambiguous by including on the document element of the serialization the attribute ixml:state="ambiguous". Processors may provide a user option to suppress that attribute; they may also provide a user option to produce more than one parse tree.


The point is that the parses may not be equivalent, but there's no efficient way to tell, and there is no meaningful way to specify how one of the parses should be chosen. So we just require one to be produced, with an indication that it is ambiguous, so the user at least has a warning.
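

For anyone consuming the output downstream, the warning described above can be checked mechanically. The following is only a minimal sketch using Python's standard xml.etree.ElementTree; the function name is_marked_ambiguous is hypothetical, and splitting the attribute value on whitespace is a defensive assumption in case a processor puts more than one word in ixml:state.

```python
import xml.etree.ElementTree as ET

# Namespace URI as it appears in the serializations quoted in this thread.
IXML_NS = "http://invisiblexml.org/NS"


def is_marked_ambiguous(xml_text: str) -> bool:
    """Return True if the document element carries ixml:state with 'ambiguous'."""
    root = ET.fromstring(xml_text)
    # ElementTree exposes namespaced attributes as "{uri}localname".
    state = root.get(f"{{{IXML_NS}}}state", "")
    # Assumption: the value may hold several space-separated words.
    return "ambiguous" in state.split()


sample = (
    '<sentence ixml:state="ambiguous" '
    'xmlns:ixml="http://invisiblexml.org/NS"><command/></sentence>'
)
print(is_marked_ambiguous(sample))          # True
print(is_marked_ambiguous("<sentence/>"))   # False
```

Note that this only tells the user that some other parse exists; it says nothing about whether that other parse means the same thing.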


Steven




On Thursday 24 March 2022 17:39:52 (+01:00), Bethan Tovey-Walsh wrote:


Hello, all,


There was some discussion in Tuesday’s meeting of ambiguous parses and whether it is useful to report them. In particular, I think someone suggested that it doesn’t matter which parse the user receives when there is more than one valid parse. I don’t think I did a good job of explaining why I think that’s wrong, and I want to put my view on record.


Take the following string: 


"Father helps mother nurse shark bite patient."


And this grammar:


sentence: subject, verb_phrase, -'.' .

modifier: noun; adjective .

noun_phrase: modifier*, noun .

complement: non_finite_clause .

non_finite_clause: nf_verb, object .

object: noun_phrase .

subject: noun_phrase .

verb_phrase: f_verb, object, complement .

noun: ("Father"; "mother"; "shark"; "bite"; "patient"), -' '? .

adjective: "nurse", -' '? .

f_verb: "helps", -' '? .

nf_verb: ("nurse"; "bite"), -' '? .


If you parse the string with the grammar, the processor will find two parses:


<sentence xmlns:ixml="http://invisiblexml.org/NS" ixml:state="ambiguous">
  <subject>
     <noun_phrase>
        <noun>Father</noun>
     </noun_phrase>
  </subject>
  <verb_phrase>
     <f_verb>helps</f_verb>
     <object>
        <noun_phrase>
           <modifier>
              <noun>mother</noun>
           </modifier>
           <modifier>
              <adjective>nurse</adjective>
           </modifier>
           <noun>shark</noun>
        </noun_phrase>
     </object>
     <complement>
        <non_finite_clause>
           <nf_verb>bite</nf_verb>
           <object>
              <noun_phrase>
                 <noun>patient</noun>
              </noun_phrase>
           </object>
        </non_finite_clause>
     </complement>
  </verb_phrase>
</sentence>


and 


<sentence xmlns:ixml="http://invisiblexml.org/NS" ixml:state="ambiguous">
  <subject>
     <noun_phrase>
        <noun>Father</noun>
     </noun_phrase>
  </subject>
  <verb_phrase>
     <f_verb>helps</f_verb>
     <object>
        <noun_phrase>
           <noun>mother</noun>
        </noun_phrase>
     </object>
     <complement>
        <non_finite_clause>
           <nf_verb>nurse</nf_verb>
           <object>
              <noun_phrase>
                 <modifier>
                    <noun>shark</noun>
                 </modifier>
                 <modifier>
                    <noun>bite</noun>
                 </modifier>
                 <noun>patient</noun>
              </noun_phrase>
           </object>
        </non_finite_clause>
     </complement>
  </verb_phrase>
</sentence>



The first of these represents the structure of a sentence with three actors: a father, a mother nurse-shark, and a patient. The father helps the mother nurse-shark to bite the patient.


The second represents a sentence with three actors: a father, a mother, and a shark-bite patient. The father helps the mother to nurse the shark-bite patient.


Both of these parses are valid, and both give structure to the content of the string. The two different structures represent the semantics of the string differently, and are not equivalent or interchangeable representations of the structure of the string.
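

To make the count concrete, here is a rough Python sketch that enumerates every parse of the example sentence. It is not an ixml processor: it hard-codes the grammar above, works on whitespace-separated words rather than characters (an assumption that sidesteps the -' '? handling), and represents trees as nested tuples. All function names are hypothetical.

```python
NOUNS = {"Father", "mother", "shark", "bite", "patient"}
ADJECTIVES = {"nurse"}
F_VERBS = {"helps"}
NF_VERBS = {"nurse", "bite"}


def noun_phrases(words, i):
    """Yield (tree, next_index) for every noun_phrase starting at words[i]."""
    # noun_phrase: modifier*, noun .     modifier: noun; adjective .
    def extend(j, mods):
        if j >= len(words):
            return
        w = words[j]
        if w in NOUNS:
            # Treat this word as the head noun and stop here...
            yield ("noun_phrase", *mods, ("noun", w)), j + 1
            # ...or treat it as a modifier and keep going.
            yield from extend(j + 1, mods + [("modifier", ("noun", w))])
        if w in ADJECTIVES:
            yield from extend(j + 1, mods + [("modifier", ("adjective", w))])

    yield from extend(i, [])


def sentences(words):
    """Yield every tree that covers the whole word list as a 'sentence'."""
    # sentence: subject, verb_phrase .  verb_phrase: f_verb, object, complement .
    for subj, i in noun_phrases(words, 0):
        if i < len(words) and words[i] in F_VERBS:
            for obj, j in noun_phrases(words, i + 1):
                if j < len(words) and words[j] in NF_VERBS:
                    for obj2, k in noun_phrases(words, j + 1):
                        if k == len(words):  # all input consumed
                            clause = ("non_finite_clause",
                                      ("nf_verb", words[j]),
                                      ("object", obj2))
                            yield ("sentence",
                                   ("subject", subj),
                                   ("verb_phrase",
                                    ("f_verb", words[i]),
                                    ("object", obj),
                                    ("complement", clause)))


def nf_verb_of(tree):
    """Return the non-finite verb chosen in a sentence tree."""
    verb_phrase = tree[2]
    clause = verb_phrase[3][1]
    return clause[1][1]


text = "Father helps mother nurse shark bite patient."
words = text.rstrip(".").split()
parses = list(sentences(words))
print(len(parses))                               # 2
print(sorted(nf_verb_of(p) for p in parses))     # ['bite', 'nurse']
```

The two trees differ in which word is taken as the non-finite verb, which is exactly the difference between the two readings described above.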


Parsing linguistic examples which have structural ambiguities is one very good use-case in which I might want to see every possible parse, and might not consider the different parses to be interchangeable. Importantly, my grammar is ambiguous on purpose, because natural language is ambiguous in this way. The fact that my grammar is ambiguous therefore doesn’t represent a hygiene issue. The ambiguity is the focus, not a side-effect.


In any case, I just wanted to note this as some background to explain why I think that the ability to report ambiguity, and to return all possible parses, may be very important. It also, I hope, explains why I’m very opposed to the idea that the parses returned by an ambiguous grammar are always essentially equivalent to each other.


Very best,
Bethan
___________________________________________________ 
Dr. Bethan Tovey-Walsh 
Myfyrwraig PhD | PhD Student CorCenCC 
Prifysgol Abertawe | Swansea University 
You are welcome to write to me in Welsh.

Received on Friday, 25 March 2022 10:35:33 UTC