RE: Comment on SRGS PR 18 December 2003 w.r.t ambiguity from Guillaume Berche on 2004-04-02 (www-voice@w3.org from April to June 2004)

From: Guillaume Berche <guillaume.berche@eloquant.com>
Date: Fri, 2 Apr 2004 08:15:11 +0200
To: "Serge LE HUITOUZE" <slehuitouze@telisma.com>, <www-voice@w3.org>
Message-ID: <ELEGLIHGLLIBFPCIGAKGMEPADOAA.guillaume.berche@eloquant.com>
Serge,

Thanks for your answer and for pointing out my example was incorrect, sorry
about that. Please find a corrected version on my question below which I
hope better illustrates my question concerning use of greedy matching
approach.

Thanks and best regards,

Guillaume.


I understand the SRGS specification is not clear about how ambiguities in
points of disjunction are handled. While section "H.1 Terminology and
Notation" does include an "Ambiguity" paragraph with the content reproduced
below, it does not explicitely require that a grammar processor to select
**at least one** of the alternatives in case of ambiguity.

"An ambiguity occurs when for a specified sequence of input tokens matched
to a specified rule of a grammar there is more than one distinct logical
parse structure that can be produced.

An ambiguity can occur at points of disjunction (choice) in a grammar.
Disjunction exists with the use of alternatives and repeats.

A grammar processor may preserve any number of ambiguous logical parse
structures to create a set of alternative logical parse structures for the
input. It is legal for a grammar processor to maintain all possible logical
parse structures or to dispose of all but one of the alternatives. There is
no specified behavior for selection of ambiguities amongst possibilities by
a grammar processors. As a result grammars that contain ambiguity do not
guarantee portability of performance. Developers and grammar tools should be
ambiguity-aware."




In the following example with an ambiguous grammar, output 1 and output 2
are
valid outputs for the same input. Is a conformant grammar processor required
to
return at least one valid output or rejecting this input be also a
conformant
behavior, given that section "5.4 Conforming XML Form Grammar Processors"
states that "There is, however, no conformance
requirement with respect to performance characteristics of the XML Form
Grammar Processor. For instance, no statement is required regarding the
accuracy, speed or other characteristics of a speech recognizer or DTMF
detector."


Expansion:  t0 (t1 {tag1}) <0-2> (t1) <1-2> t2
Input: t0,t1,t1,t2
Output 1: ["t0","t1","t1","t2" ]
Output 2: ["t0","t1",{!{tag1}!},"t1","t2" ]
Sample output before REJECT when applying greedy matching:
["t0","t1",{!{tag1}!},"t1",{!{tag1}!}]


In this example, it an output is required to be returned by all conformant
grammar processors,
then this may prevent important optimizations in the grammar processor
implementations, without real benefits for grammar authors since in presence
of other ambiguous grammars the output is not predicatable anyway. Example
above is
quite simple, but much more complex cases may occur which requires the
grammar
processor to preserve a very large amount of logical parse output and may
severely impact performance.

I understand lots of grammar languages with real-life applications provide
some
indicators to the grammar processor as to limit the amount of considered
branches to avoid combinatorial explosion (see references below), or
disallow
ambiguity altogether. Some provide lookahead hints, other define local
optimization strategy such as greedy behavior at disjonction points.


I believe a reasonneable approach to solve this problem could be to modify
section "H.1 Terminology and Notation" paragraph "Ambiguity" to precise that
it is legal for a grammar processor to be "greedy" in matching each
expansion (i.e. select the alternative which matches most available tokens).
This behavior usually leads to expected global output sequence but may
sometimes reject valid input as illustrated in the example above (the greedy
approach would not return any of the two outputs and would reject the sample
output)



References:
JavaCC Lookahead https://javacc.dev.java.net/doc/lookahead.html
yacc default lookahead
http://www.cs.princeton.edu/~appel/modern/ml/ml-yacc/manual.html#section5
Perl5 regular expressions and greedy quantifiers
http://iis1.cps.unizar.es/Oreilly/perl/prog/ch02_04.htm#PERL2-CH-2-SECT-4.1.
2
ANTLR lookahead
http://www.doc.ic.ac.uk/lab/secondyear/Antlr/metalang.html#_bb21
XML languages and ambiguity
http://www.oasis-open.org/cover/topics.html#ambiguity






> -----Original Message-----
> From: www-voice-request@w3.org [mailto:www-voice-request@w3.org]On
> Behalf Of Serge LE HUITOUZE
> Sent: jeudi 1 avril 2004 10:52
> To: www-voice@w3.org
> Subject: RE: Comment on SRGS PR 18 December 2003 w.r.t ambiguity
>
>
>
> Guillaume Berche wrote:
> > In the following example with an ambiguous grammar, [...]
> >
> > Expansion:  t0 (t1 {tag1}) <0-2> t1 t2
> > Input: t0,t1,t1,t2
> > Output 1: ["t0",{!{tag1}!},"t1","t1","t2" ]
> > Output 2: [REJECT after evaluating:
> "t0",{!{tag1}!},"t1",{!{tag1}!},"t1"]
>
> First (cosmetic) remark: Output 1 is not the expected output, according
> to Appendix H of SRGS Proposed Recommendation from 18 December 2003.
> It should be: ["t0","t1",{!{tag1}!},"t1","t2" ]
>
> Second, more important, remark: This grammar is *not* ambiguous.
> The only three inputs that are recognized by this grammar are:
> 1. t0, t1, t2
> 2. t0, t1, t1, t2
> 3. t0, t1, t1, t1, t2
>
> Each of them are *unambiguously* parsed, giving respective unique outputs:
> 1. ["t0","t1","t2" ]
> 2. ["t0","t1",{!{tag1}!},"t1","t2" ]
> 3. ["t0","t1",{!{tag1}!},"t1",{!{tag1}!},"t1","t2" ]
>
>
> Best regards.
>
> --Serge Le Huitouze
>
>
>
Received on Friday, 2 April 2004 01:15:18 UTC