- From: Guillaume Berche <guillaume.berche@eloquant.com>
- Date: Fri, 2 Apr 2004 08:15:11 +0200
- To: "Serge LE HUITOUZE" <slehuitouze@telisma.com>, <www-voice@w3.org>
Serge, Thanks for your answer and for pointing out my example was incorrect, sorry about that. Please find a corrected version on my question below which I hope better illustrates my question concerning use of greedy matching approach. Thanks and best regards, Guillaume. I understand the SRGS specification is not clear about how ambiguities in points of disjunction are handled. While section "H.1 Terminology and Notation" does include an "Ambiguity" paragraph with the content reproduced below, it does not explicitely require that a grammar processor to select **at least one** of the alternatives in case of ambiguity. "An ambiguity occurs when for a specified sequence of input tokens matched to a specified rule of a grammar there is more than one distinct logical parse structure that can be produced. An ambiguity can occur at points of disjunction (choice) in a grammar. Disjunction exists with the use of alternatives and repeats. A grammar processor may preserve any number of ambiguous logical parse structures to create a set of alternative logical parse structures for the input. It is legal for a grammar processor to maintain all possible logical parse structures or to dispose of all but one of the alternatives. There is no specified behavior for selection of ambiguities amongst possibilities by a grammar processors. As a result grammars that contain ambiguity do not guarantee portability of performance. Developers and grammar tools should be ambiguity-aware." In the following example with an ambiguous grammar, output 1 and output 2 are valid outputs for the same input. Is a conformant grammar processor required to return at least one valid output or rejecting this input be also a conformant behavior, given that section "5.4 Conforming XML Form Grammar Processors" states that "There is, however, no conformance requirement with respect to performance characteristics of the XML Form Grammar Processor. For instance, no statement is required regarding the accuracy, speed or other characteristics of a speech recognizer or DTMF detector." Expansion: t0 (t1 {tag1}) <0-2> (t1) <1-2> t2 Input: t0,t1,t1,t2 Output 1: ["t0","t1","t1","t2" ] Output 2: ["t0","t1",{!{tag1}!},"t1","t2" ] Sample output before REJECT when applying greedy matching: ["t0","t1",{!{tag1}!},"t1",{!{tag1}!}] In this example, it an output is required to be returned by all conformant grammar processors, then this may prevent important optimizations in the grammar processor implementations, without real benefits for grammar authors since in presence of other ambiguous grammars the output is not predicatable anyway. Example above is quite simple, but much more complex cases may occur which requires the grammar processor to preserve a very large amount of logical parse output and may severely impact performance. I understand lots of grammar languages with real-life applications provide some indicators to the grammar processor as to limit the amount of considered branches to avoid combinatorial explosion (see references below), or disallow ambiguity altogether. Some provide lookahead hints, other define local optimization strategy such as greedy behavior at disjonction points. I believe a reasonneable approach to solve this problem could be to modify section "H.1 Terminology and Notation" paragraph "Ambiguity" to precise that it is legal for a grammar processor to be "greedy" in matching each expansion (i.e. select the alternative which matches most available tokens). This behavior usually leads to expected global output sequence but may sometimes reject valid input as illustrated in the example above (the greedy approach would not return any of the two outputs and would reject the sample output) References: JavaCC Lookahead https://javacc.dev.java.net/doc/lookahead.html yacc default lookahead http://www.cs.princeton.edu/~appel/modern/ml/ml-yacc/manual.html#section5 Perl5 regular expressions and greedy quantifiers http://iis1.cps.unizar.es/Oreilly/perl/prog/ch02_04.htm#PERL2-CH-2-SECT-4.1. 2 ANTLR lookahead http://www.doc.ic.ac.uk/lab/secondyear/Antlr/metalang.html#_bb21 XML languages and ambiguity http://www.oasis-open.org/cover/topics.html#ambiguity > -----Original Message----- > From: www-voice-request@w3.org [mailto:www-voice-request@w3.org]On > Behalf Of Serge LE HUITOUZE > Sent: jeudi 1 avril 2004 10:52 > To: www-voice@w3.org > Subject: RE: Comment on SRGS PR 18 December 2003 w.r.t ambiguity > > > > Guillaume Berche wrote: > > In the following example with an ambiguous grammar, [...] > > > > Expansion: t0 (t1 {tag1}) <0-2> t1 t2 > > Input: t0,t1,t1,t2 > > Output 1: ["t0",{!{tag1}!},"t1","t1","t2" ] > > Output 2: [REJECT after evaluating: > "t0",{!{tag1}!},"t1",{!{tag1}!},"t1"] > > First (cosmetic) remark: Output 1 is not the expected output, according > to Appendix H of SRGS Proposed Recommendation from 18 December 2003. > It should be: ["t0","t1",{!{tag1}!},"t1","t2" ] > > Second, more important, remark: This grammar is *not* ambiguous. > The only three inputs that are recognized by this grammar are: > 1. t0, t1, t2 > 2. t0, t1, t1, t2 > 3. t0, t1, t1, t1, t2 > > Each of them are *unambiguously* parsed, giving respective unique outputs: > 1. ["t0","t1","t2" ] > 2. ["t0","t1",{!{tag1}!},"t1","t2" ] > 3. ["t0","t1",{!{tag1}!},"t1",{!{tag1}!},"t1","t2" ] > > > Best regards. > > --Serge Le Huitouze > > >
Received on Friday, 2 April 2004 01:15:18 UTC