RE: Comment on SRGS PR 18 December 2003 w.r.t ambiguity from Guillaume Berche on 2004-04-02 (www-voice@w3.org from April to June 2004)

From: Guillaume Berche <guillaume.berche@eloquant.com>
Date: Fri, 2 Apr 2004 09:59:54 +0200
To: "Serge LE HUITOUZE" <slehuitouze@telisma.com>, <www-voice@w3.org>
Message-ID: <ELEGLIHGLLIBFPCIGAKGOEPFDOAA.guillaume.berche@eloquant.com>

Serge,

Oups, I posted my last answer a bit too fast, please ignore it.

You're right, my question is really related to applicability of lookahead
and greedy local optimizations in grammar processors and is indeed
independent of whether the grammar is ambiguous or not (i.e. would lead to
different outputs for a same input on a given rule).

Please find my rephrased question, hoping to get it right now, sorry about
the extra traffic and the confusion :-)

Thanks again for your answer and best regards,

Guillaume.



The SRGS grammar supports disjonction points (such as alternatives and
repeats) which require to look ahead input tokens to determine whether an
input matches a rule or should be rejected. Because such disjonction points
may be nested, some grammars have the potential to require evaluating lots
of branches leading to combinatorial explosion, and may impact grammar
processors performance in severe ways.


I understand lots of grammar languages with real-life applications provide
some
indicators to the grammar processors as to limit the amount of considered
branches to avoid combinatorial explosion, or disallow ambiguity altogether
(see references below). Some provide lookahead hints, other define local
optimization strategy such as greedy behavior at disjonction points.

I wonder the applicability of such mechanisms in the case SRGS as to support
some optimizations in grammar processors.

In particular, having the repeat and alternative expansions be greedy is an
optimization that
usually leads to the expected global output sequence but may
sometimes reject valid input as illustrated in the example below (the greedy
approach would not return the valid outputs and would reject the sample
input)

Expansion:  t0 (t1 {tag1}) <0-2> t1 t2
Input: t0,t1,t1,t2
Output 1: ["t0","t1",{!{tag1}!},"t1","t2" ]
REJECT when evaluating using greedy optimization:
["t0","t1",{!{tag1}!},"t1",{!{tag1}!}]


My refined questions are therefore:
1- has the opportunity for such "hints" or local optimizations been
considered for inclusion in SRGS specification?
2- would a grammar processor applying greedy behavior be conformant given
that section "5.4 Conforming
XML Form Grammar Processors" states the following:

"There is, however, no conformance
requirement with respect to performance characteristics of the XML Form
Grammar Processor. For instance, no statement is required regarding the
accuracy, speed or other characteristics of a speech recognizer or DTMF
detector."




References:
JavaCC Lookahead https://javacc.dev.java.net/doc/lookahead.html
yacc default lookahead
http://www.cs.princeton.edu/~appel/modern/ml/ml-yacc/manual.html#section5
Perl5 regular expressions and greedy quantifiers
http://iis1.cps.unizar.es/Oreilly/perl/prog/ch02_04.htm#PERL2-CH-2-SECT-4.1.
2
ANTLR lookahead
http://www.doc.ic.ac.uk/lab/secondyear/Antlr/metalang.html#_bb21

The XML specifications disallow element structures which are non
determinists in section "E Deterministic Content Models (Non-Normative)",
for which an extract follows:
http://www.w3.org/TR/2004/REC-xml-20040204/#determinism

"As noted in 3.2.1 Element Content, it is required that content models in
element type declarations be deterministic. This requirement is for
compatibility with SGML (which calls deterministic content models
"unambiguous"); XML processors built using SGML systems may flag
non-deterministic content models as errors.

For example, the content model ((b, c) | (b, d)) is non-deterministic,
because given an initial b the XML processor cannot know which b in the
model is being matched without looking ahead to see which element follows
the b. "

Received on Friday, 2 April 2004 03:00:13 UTC