W3C Last Call comments: Speech Grammar Specification

As requested, here are comments on the "Speech Recognition Grammar
Specification" Last Call Working Draft.

<request-list>
<request id="TM001">
<title>Clarify content of tags</title>
<description>
The proposal states that tag contents are "arbitrary", yet all examples use
the {action=XYZ;} format. Does this indicate that key/value pairs are
expected, or even required? If not, more varied examples should be given.
</description>
<submission-date>2001-01-28</submission-date>
<submitted-by>Tellme Networks</submitted-by>
<note></note>
<priority>medium</priority>
<status>unassigned</status>
</request>

<request id="TM002">
<title>Clarify which parts of the tag or grammar are returned to the application</title>
<description>
What is returned to the field: the raw text from the recognizer, the
contents of the tag, both, or neither? How can one format the returned
contents in a useful way, for instance by removing the spaces from a series
of digits?
</description>
<submission-date>2001-01-28</submission-date>
<submitted-by>Tellme Networks</submitted-by>
<note></note>
<priority>high</priority>
<status>unassigned</status>
</request>
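To make the TM002 question concrete, the Python sketch below shows the kind of formatting step an application must perform itself if only the raw recognizer text is returned: collapsing a spoken or spaced digit sequence into a compact string. The function name `format_digits` and the spoken-digit table are hypothetical and are not part of the draft.

```python
# Hypothetical post-processing step. The draft does not say whether the
# application sees the raw recognizer text, the tag contents, or both;
# this sketch assumes it receives the raw token string and normalizes it.

SPOKEN_DIGITS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7",
    "eight": "8", "nine": "9",
}


def format_digits(raw_text: str) -> str:
    """Collapse a recognized digit sequence such as '1 0 1' or
    'one oh one' into a compact string such as '101'.
    Unknown words raise KeyError."""
    digits = []
    for token in raw_text.lower().split():
        # Keep tokens that are already digits; map spoken digit words.
        digits.append(token if token.isdigit() else SPOKEN_DIGITS[token])
    # Joining with no separator removes the spaces between digits.
    return "".join(digits)


print(format_digits("1 0 1"))       # 101
print(format_digits("one oh one"))  # 101
```

If the grammar format let a tag express this transformation directly, applications would not each need to re-implement it.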


<request id="TM003">
<title>Clarify nesting of tags</title>
<description>
If a developer uses nested grammars, how, if at all, are the contents of
tags in subgrammars bubbled up? Does a tag implicitly concatenate the
contents of tags in the grammars it references, or can the developer
reference the contents of a subgrammar's tag inside the tag content? For
example, if the grammar is "i want to go $somewhere" and $somewhere is
defined as "home | to work | out on the town", how can one bubble up the
result of $somewhere? One possibility is "i want to go $somewhere
{location=$somewhere}", with $somewhere defined as "home {home} | to work
{work} | out on the town {party}". However, the draft does not make clear
whether the contents of a tag can be built up by referencing subgrammars.
</description>
<submission-date>2001-01-28</submission-date>
<submitted-by>Tellme Networks</submitted-by>
<note></note>
<priority>medium</priority>
<status>unassigned</status>
</request>
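The Python sketch below models one possible reading of the TM003 proposal. The outer rule's tag {location=$somewhere} picks up the tag value produced by whichever $somewhere alternative matched. This illustrates the behavior being requested under our own assumptions. The draft does not define it, and the names `SOMEWHERE`, `PREFIX` and `interpret` are hypothetical.

```python
# Hypothetical model of tag "bubbling up": each alternative of the
# subgrammar carries a tag value, and the tag in the referencing rule
# names the subgrammar ($somewhere) whose tag value is substituted.

# $somewhere = home {home} | to work {work} | out on the town {party}
SOMEWHERE = {
    "home": "home",
    "to work": "work",
    "out on the town": "party",
}

# Fixed words of the outer rule "i want to go $somewhere".
PREFIX = "i want to go "


def interpret(utterance: str) -> dict:
    """Match 'i want to go $somewhere' and evaluate {location=$somewhere}."""
    if not utterance.startswith(PREFIX):
        raise ValueError("no match for outer rule: " + utterance)
    phrase = utterance[len(PREFIX):]
    if phrase not in SOMEWHERE:
        raise ValueError("no match for $somewhere: " + phrase)
    sub_tag = SOMEWHERE[phrase]   # tag value produced by the subgrammar
    return {"location": sub_tag}  # outer tag references $somewhere's value


print(interpret("i want to go out on the town"))  # {'location': 'party'}
```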

<request id="TM004">
<title>Need tag mechanism for returning confidence scores</title>
<description>
Knowing the recognizer's confidence in its hypothesis is critical for making
user-experience decisions. The tag should support returning the confidence
score for that hypothesis.
</description>
<submission-date>2001-01-28</submission-date>
<submitted-by>Tellme Networks</submitted-by>
<note></note>
<priority>high</priority>
<status>unassigned</status>
</request>

<request id="TM005">
<title>Strong recommendation for phonetic representation in grammar format</title>
<description>
Section 6.5 states that allowing phonetic pronunciations in the grammar
spec is under consideration. We consider this capability absolutely
critical, especially given that pronunciation lookup appears to be
unspecified or under-specified.
</description>
<submission-date>2001-01-28</submission-date>
<submitted-by>Tellme Networks</submitted-by>
<note></note>
<priority>high</priority>
<status>unassigned</status>
</request>

<request id="TM006">
<title>Add specification for pronunciation lookup rules</title>
<description>
Proper modeling of word and phrase pronunciations is essential to
recognition performance. While the details of how pronunciations are
represented internally are of course left to the implementer, the
specification should state how various kinds of words are handled. Specific
cases to be addressed include:
1. case sensitivity;
2. digits (is '101' one-oh-one or a-hundred-'n-one?);
3. non-alphanumeric characters (are they treated as word boundaries?).
</description>
<submission-date>2001-01-28</submission-date>
<submitted-by>Tellme Networks</submitted-by>
<note></note>
<priority>medium</priority>
<status>unassigned</status>
</request>

<request id="TM007">
<title>Allow variables in weightings</title>
<description>
Weighting grammar items differently according to pragmatic context can
greatly improve recognition performance. The grammar format should allow
variables, not just constants, in the specification of weights.
</description>
<submission-date>2001-01-28</submission-date>
<submitted-by>Tellme Networks</submitted-by>
<note></note>
<priority>medium</priority>
<status>unassigned</status>
</request>

<request id="TM008">
<title>Allow combined speech and DTMF grammars, not just multiple flavors</title>
<description>
Comments on various specifications have suggested allowing a DTMF grammar
type as well as a speech grammar type. We agree that this is important, but
a specification is also needed that allows DTMF equivalents to be defined
"inline" within a speech grammar, rather than in a separate grammar. For
simple tasks, having to write two separate grammars complicates development
and forces the developer to keep the two in sync. For example, it is
convenient to be able to write a sports grammar as:

  (basketball | n b a | dtmf-1) {sport=basketball}
  | (football | n f l | dtmf-2) {sport=football}
  | ([ice] hockey | n h l | dtmf-3) {sport=hockey}
</description>
<submission-date>2001-01-28</submission-date>
<submitted-by>Tellme Networks</submitted-by>
<note></note>
<priority>medium</priority>
<status>unassigned</status>
</request>
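The Python sketch below illustrates why a single definition is convenient. Each spoken alternative and its DTMF equivalent share one table entry and one tag value, mirroring the sports grammar in TM008, so the speech and DTMF forms cannot drift apart. The table layout, the `match` function and its `mode` values are hypothetical. They are not syntax from the draft.

```python
# Hypothetical single-source grammar: spoken alternatives and the DTMF key
# for each choice live in one entry and return one shared tag value.

SPORTS = [
    # (spoken alternatives, DTMF key, tag)
    (["basketball", "n b a"], "1", {"sport": "basketball"}),
    (["football", "n f l"], "2", {"sport": "football"}),
    # "[ice] hockey" expands to the two phrases "hockey" and "ice hockey".
    (["hockey", "ice hockey", "n h l"], "3", {"sport": "hockey"}),
]


def match(input_text: str, mode: str) -> dict:
    """Return the tag for a spoken phrase (mode 'voice') or a key (mode 'dtmf')."""
    for phrases, key, tag in SPORTS:
        if (mode == "voice" and input_text in phrases) or (
            mode == "dtmf" and input_text == key
        ):
            return dict(tag)  # copy so callers cannot mutate the table
    raise ValueError("no match: " + input_text)


print(match("ice hockey", "voice"))  # {'sport': 'hockey'}
print(match("3", "dtmf"))            # {'sport': 'hockey'}
```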

<request id="TM009">
<title>Scope of positive-closure in ABNF may be confusing</title>
<description>
In the ABNF syntax, positive closure is expressed by following an element
with '+'. While the spec is unambiguous as written, developers may be
unsure whether the grammar "$digitonetype + $digitanothertype" repeats the
first element or the second. At least one other popular BNF syntax places
the operator on the opposite side from the ABNF spec here. Requiring
parentheses around the repeated element and its operator, while slightly
laborious, would remove the confusion, e.g.,
"($digitonetype +) $digitanothertype".
</description>
<submission-date>2001-01-28</submission-date>
<submitted-by>Tellme Networks</submitted-by>
<note></note>
<priority>very low</priority>
<status>unassigned</status>
</request>
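The binding rule described in TM009 can be restated as a small tokenizer sketch in Python: a '+' attaches to the element immediately before it, never to the one after. The sketch handles only flat, whitespace-separated sequences without parentheses or alternatives. `parse_sequence` is a hypothetical name and is not part of the draft.

```python
# Minimal sketch of postfix '+' binding in a flat rule body such as
# "$digitonetype + $digitanothertype": the operator modifies the element
# that precedes it.


def parse_sequence(rule_body: str) -> list:
    """Return (element, repeats) pairs for a flat sequence of tokens."""
    result = []
    for token in rule_body.split():
        if token == "+":
            if not result:
                raise ValueError("'+' must follow an element")
            element, _ = result[-1]
            result[-1] = (element, "one-or-more")  # modify the preceding element
        else:
            result.append((token, "once"))
    return result


print(parse_sequence("$digitonetype + $digitanothertype"))
# [('$digitonetype', 'one-or-more'), ('$digitanothertype', 'once')]
```

A developer who expects the operator to bind to the following element would misread this grammar, which is the confusion the request describes.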
</request-list>

Received on Wednesday, 31 January 2001 21:50:11 UTC