- From: Guillaume Berche <guillaume.berche@eloquant.com>
- Date: Fri, 2 Apr 2004 09:59:54 +0200
- To: "Serge LE HUITOUZE" <slehuitouze@telisma.com>, <www-voice@w3.org>
Serge, Oups, I posted my last answer a bit too fast, please ignore it. You're right, my question is really related to applicability of lookahead and greedy local optimizations in grammar processors and is indeed independent of whether the grammar is ambiguous or not (i.e. would lead to different outputs for a same input on a given rule). Please find my rephrased question, hoping to get it right now, sorry about the extra traffic and the confusion :-) Thanks again for your answer and best regards, Guillaume. The SRGS grammar supports disjonction points (such as alternatives and repeats) which require to look ahead input tokens to determine whether an input matches a rule or should be rejected. Because such disjonction points may be nested, some grammars have the potential to require evaluating lots of branches leading to combinatorial explosion, and may impact grammar processors performance in severe ways. I understand lots of grammar languages with real-life applications provide some indicators to the grammar processors as to limit the amount of considered branches to avoid combinatorial explosion, or disallow ambiguity altogether (see references below). Some provide lookahead hints, other define local optimization strategy such as greedy behavior at disjonction points. I wonder the applicability of such mechanisms in the case SRGS as to support some optimizations in grammar processors. In particular, having the repeat and alternative expansions be greedy is an optimization that usually leads to the expected global output sequence but may sometimes reject valid input as illustrated in the example below (the greedy approach would not return the valid outputs and would reject the sample input) Expansion: t0 (t1 {tag1}) <0-2> t1 t2 Input: t0,t1,t1,t2 Output 1: ["t0","t1",{!{tag1}!},"t1","t2" ] REJECT when evaluating using greedy optimization: ["t0","t1",{!{tag1}!},"t1",{!{tag1}!}] My refined questions are therefore: 1- has the opportunity for such "hints" or local optimizations been considered for inclusion in SRGS specification? 2- would a grammar processor applying greedy behavior be conformant given that section "5.4 Conforming XML Form Grammar Processors" states the following: "There is, however, no conformance requirement with respect to performance characteristics of the XML Form Grammar Processor. For instance, no statement is required regarding the accuracy, speed or other characteristics of a speech recognizer or DTMF detector." References: JavaCC Lookahead https://javacc.dev.java.net/doc/lookahead.html yacc default lookahead http://www.cs.princeton.edu/~appel/modern/ml/ml-yacc/manual.html#section5 Perl5 regular expressions and greedy quantifiers http://iis1.cps.unizar.es/Oreilly/perl/prog/ch02_04.htm#PERL2-CH-2-SECT-4.1. 2 ANTLR lookahead http://www.doc.ic.ac.uk/lab/secondyear/Antlr/metalang.html#_bb21 The XML specifications disallow element structures which are non determinists in section "E Deterministic Content Models (Non-Normative)", for which an extract follows: http://www.w3.org/TR/2004/REC-xml-20040204/#determinism "As noted in 3.2.1 Element Content, it is required that content models in element type declarations be deterministic. This requirement is for compatibility with SGML (which calls deterministic content models "unambiguous"); XML processors built using SGML systems may flag non-deterministic content models as errors. For example, the content model ((b, c) | (b, d)) is non-deterministic, because given an initial b the XML processor cannot know which b in the model is being matched without looking ahead to see which element follows the b. "
Received on Friday, 2 April 2004 03:00:13 UTC