W3C home > Mailing lists > Public > www-voice@w3.org > April to June 2004

Comment on SRGS PR 18 December 2003 w.r.t ambiguity

From: Guillaume Berche <guillaume.berche@eloquant.com>
Date: Thu, 1 Apr 2004 09:32:57 +0200
To: <www-voice@w3.org>
Message-ID: <ELEGLIHGLLIBFPCIGAKGEEOODOAA.guillaume.berche@eloquant.com>

Hello,

I would like some precisions on the SRGS Proposed Recommendation from 18
December 2003 concerning the behavior of a conformant processor with respect
to grammar ambiguity. I could not find an answer in the mailing archive nor
in the IR test suite. Please point me to the appropriate section if I missed
it, or otherwise please register a change request on this point to clarify
this aspect of the specification.

Thanks for your help and best regards,

Guillaume.
----------------------------------------------------------
Guillaume Berche
Eloquant, ZA Le malvaisin
38240 Le Versoud, France
guillaume.berche@eloquant.com
----------------------------------------------------------



I understand the SRGS specification is not clear about how ambiguities in
points of disjunction are handled. While section "H.1 Terminology and
Notation" does include an "Ambiguity" paragraph with the content reproduced
below, it does not explicitely require that a grammar processor to select
**at least one** of the alternatives in case of ambiguity.

"An ambiguity occurs when for a specified sequence of input tokens matched
to a specified rule of a grammar there is more than one distinct logical
parse structure that can be produced.

An ambiguity can occur at points of disjunction (choice) in a grammar.
Disjunction exists with the use of alternatives and repeats.

A grammar processor may preserve any number of ambiguous logical parse
structures to create a set of alternative logical parse structures for the
input. It is legal for a grammar processor to maintain all possible logical
parse structures or to dispose of all but one of the alternatives. There is
no specified behavior for selection of ambiguities amongst possibilities by
a grammar processors. As a result grammars that contain ambiguity do not
guarantee portability of performance. Developers and grammar tools should be
ambiguity-aware."




In the following example with an ambiguous grammar, only output 1 would lead
to a match, the second incomplete output leading to a rejection (no match).
Is a conformant grammar processor required to return output1 or would
output2 be also a conformant behavior, given that section "5.4 Conforming
XML Form Grammar Processors" states that "There is, however, no conformance
requirement with respect to performance characteristics of the XML Form
Grammar Processor. For instance, no statement is required regarding the
accuracy, speed or other characteristics of a speech recognizer or DTMF
detector."


Expansion:  t0 (t1 {tag1}) <0-2> t1 t2
Input: t0,t1,t1,t2
Output 1: ["t0",{!{tag1}!},"t1","t1","t2" ]
Output 2: [REJECT after evaluating: "t0",{!{tag1}!},"t1",{!{tag1}!},"t1"]


If output1 is required to be returned by all conformant grammar processors,
then this may prevent important optimizations in the grammar processor
implementations, without real benefits for grammar authors since in presence
of other ambiguous grammars the output is not predicatable. Example above is
quite simple, much more complex cases may occur which requires the grammar
processor to preserve a very large amount of logical parse output and may
severely impact performance.

Lots of grammar languages with real-life applications provide some
indicators to the grammar processor as to limit the amount of considered
branch and avoid combinatorial explosion (see references below), or disallow
ambiguity altogether. Some provide lookahead hints, other define local
optimization strategy such as greedy behavior at disjonction points.


I believe a reasonneable approach to solve this problem could be to modify
section "H.1 Terminology and Notation" paragraph "Ambiguity" to precise that
it is legal for a grammar processor to be "greedy" in matching each
expansion (i.e. select the alternative which matches most available tokens).
This behavior usually leads to expected global output sequence but may
sometimes reject valid input as illustrated in the example above.



References:
JavaCC Lookahead https://javacc.dev.java.net/doc/lookahead.html
yacc default lookahead
http://www.cs.princeton.edu/~appel/modern/ml/ml-yacc/manual.html#section5
Perl5 regular expressions and greedy quantifiers
http://iis1.cps.unizar.es/Oreilly/perl/prog/ch02_04.htm#PERL2-CH-2-SECT-4.1.
2
ANTLR lookahead
http://www.doc.ic.ac.uk/lab/secondyear/Antlr/metalang.html#_bb21
XML languages and ambiguity
http://www.oasis-open.org/cover/topics.html#ambiguity
Received on Thursday, 1 April 2004 02:33:02 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Monday, 30 October 2006 12:48:59 GMT