[emma] Comments on EMMA LCWD by VBWG

VBWG Comments on EMMA LCWD [1]

EMMA provides a good attempt to extend and complete the annotation of
input from different modalities. This is an interesting evolution to be
evaluated in the context of the current recommendation and also for the
advances of Voice Browser standards.

We acknowledge that EMMA LCWD [1] is improved and more complete, but we
suggest a few extensions that are important for us and some general
comments too.

A. EMMA and VoiceXML 2.0/2.1

In today's speech market, VoiceXML 2.0 [2] is the reference
specification, with a large presence in the industry. VoiceXML 2.0 has
been a W3C Recommendation since March 2004.
Moreover, VoiceXML 2.1 [3] extends VoiceXML 2.0 by adding a restricted
number of new features, none of which has a strict relationship with
EMMA. VoiceXML 2.1 is close to becoming a Proposed Recommendation.

In this context we see great value in an EMMA profile for
VoiceXML 2.0/2.1, whose goal is to enable a quick adoption of EMMA in
the current voice browser industry.
EMMA might also be conveyed through protocols such as IETF MRCPv2,
which offers the option of using EMMA as the format for speech results.

VoiceXML 2.0/2.1 specifies a limited number of annotations to be
collected for each speech or DTMF input (see Table 10 [4] of the VoiceXML
2.0 spec). Moreover, multiple ASR results may be returned on application
request; they have to be represented as an N-best list (see [5] of
VoiceXML 2.0 for details).

Here are some practical proposals for possible extensions to EMMA:

Proposal 1 - EMMA profile

Describe in the EMMA spec a VoiceXML 2.0/2.1 profile, either in an
Appendix or in a Section of the specification.
This profile should describe the mandatory annotations to allow a
complete integration in a VoiceXML 2.0/2.1 compliant browser.

VoiceXML 2.0/2.1 requires four annotations for each input. They
are described normatively in [4] as shadow variables of a form
input item.
The same values are also accessible from the application.lastresult$
variable, see [4].

The annotations are the following: 
- name$.utterance
  which might be conveyed by the "emma:tokens" attribute ([6])
- name$.confidence
  which might be conveyed by "emma:confidence" attribute ([7])
  The range of values seems to be fine: 0.0 - 1.0, but
  some checks could be made against the schemas of both specs.
- name$.inputmode
  which might be conveyed by "emma:mode" attribute ([8])
  See Proposal 1.1 for a discussion of its values
- name$.interpretation, an ECMAScript value containing
  the semantic result, which has to be derived from the
  content of "emma:interpretation"
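
As an informal illustration of the mapping listed above, the following
Python sketch reads one interpretation from an EMMA document and builds
the four VoiceXML annotations. It is a sketch under stated assumptions,
not part of either specification: the namespace URI, the sample
document, the attribute spelling "emma:tokens" and the helper names
emma_attr and shadow_variables are all assumptions made for the example.
The "emma:mode" value is copied unchanged; mapping its values is the
subject of Proposal 1.1.

```python
import xml.etree.ElementTree as ET

# Assumed EMMA namespace URI; check it against the EMMA draft in use.
EMMA_NS = "http://www.w3.org/2003/04/emma"

# Hypothetical recognition result for a single utterance.
SAMPLE = f"""
<emma:emma version="1.0" xmlns:emma="{EMMA_NS}">
  <emma:interpretation id="int1"
      emma:tokens="flights to boston"
      emma:confidence="0.75"
      emma:mode="speech">
    <destination>Boston</destination>
  </emma:interpretation>
</emma:emma>
""".strip()


def emma_attr(elem, name, default=None):
    """Return the value of an attribute in the EMMA namespace."""
    return elem.get(f"{{{EMMA_NS}}}{name}", default)


def shadow_variables(interp):
    """Map one emma:interpretation element to the four annotations."""
    return {
        "utterance": emma_attr(interp, "tokens"),
        "confidence": float(emma_attr(interp, "confidence", "0.0")),
        # Raw emma:mode value, not yet mapped to VoiceXML inputmode values.
        "inputmode": emma_attr(interp, "mode"),
        # Simplified semantic result: one entry per application element.
        "interpretation": {child.tag: child.text for child in interp},
    }


root = ET.fromstring(SAMPLE)
interp = root.find(f"{{{EMMA_NS}}}interpretation")
result = shadow_variables(interp)
```

A real profile would also have to say how nested application markup
becomes an ECMAScript object; the flat dictionary here only covers the
simplest case.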

As regards the N-best results (see [5] for details), the "emma:one-of"
element should be suitable to convey them to the voice browser.
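
The N-best case could be handled along these lines. This is a minimal
Python sketch, assuming the alternatives arrive as "emma:interpretation"
children of a single "emma:one-of" element; the sample document, the
namespace URI and the function name last_result are assumptions of the
example. The entries are sorted by confidence, highest first, which is
the order VoiceXML 2.0 uses for the application.lastresult$ array.

```python
import xml.etree.ElementTree as ET

# Assumed EMMA namespace URI; check it against the EMMA draft in use.
EMMA_NS = "http://www.w3.org/2003/04/emma"
E = f"{{{EMMA_NS}}}"

# Hypothetical N-best result with two alternatives.
NBEST = f"""
<emma:emma version="1.0" xmlns:emma="{EMMA_NS}">
  <emma:one-of id="nbest">
    <emma:interpretation id="int1" emma:tokens="boston"
        emma:confidence="0.6" emma:mode="speech">
      <city>Boston</city>
    </emma:interpretation>
    <emma:interpretation id="int2" emma:tokens="austin"
        emma:confidence="0.8" emma:mode="speech">
      <city>Austin</city>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>
""".strip()


def last_result(root):
    """Build a lastresult$-like list from the emma:one-of alternatives."""
    entries = []
    for interp in root.find(E + "one-of").findall(E + "interpretation"):
        entries.append({
            "utterance": interp.get(E + "tokens"),
            "confidence": float(interp.get(E + "confidence", "0.0")),
            "inputmode": interp.get(E + "mode"),
            "interpretation": {child.tag: child.text for child in interp},
        })
    # Highest confidence first.
    entries.sort(key=lambda entry: entry["confidence"], reverse=True)
    return entries


lastresult = last_result(ET.fromstring(NBEST))
```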

Proposal 1.1 - Values of emma:mode

Some clarification is needed to explain how to map the values of
"emma:mode" ([8]) to the expected values of the "inputmode" variable
(Table 10 in [4]). VoiceXML 2.0/2.1 prescribes two values: "voice"
and "dtmf".

Another option is to adopt in EMMA the exact values expected by VoiceXML
2.0/2.1 to simplify the mapping. Other related fine-grained EMMA
annotations cannot be represented in VoiceXML 2.0/2.1.
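
If a mapping is chosen, it could be as small as a lookup table. The
Python sketch below is only an illustration: the EMMA-side values in the
table are hypothetical placeholders, not values taken from the EMMA
draft, and the function name to_inputmode is an assumption. The
VoiceXML-side values are the two listed in Table 10 of VoiceXML 2.0,
"voice" and "dtmf".

```python
from typing import Optional

# Hypothetical emma:mode values (left) mapped to VoiceXML inputmode
# values (right). The left-hand side is illustrative only.
EMMA_MODE_TO_INPUTMODE = {
    "speech": "voice",
    "voice": "voice",
    "dtmf": "dtmf",
    "dtmf_keypad": "dtmf",
}


def to_inputmode(emma_mode: str) -> Optional[str]:
    """Return the VoiceXML inputmode for an emma:mode value, or None
    when the mode has no counterpart in VoiceXML 2.0/2.1."""
    return EMMA_MODE_TO_INPUTMODE.get(emma_mode.strip().lower())
```

Returning None for unmapped modes makes the gap explicit: a profile
would need to say whether such results are rejected or passed through.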

Proposal 1.2 - Optional/mandatory

The profile should clarify which annotations are mandatory and which are
optional. For instance, N-best results are an optional feature for
VoiceXML 2.0/2.1, while the other annotations are mandatory.

Proposal 2 - Consider 'noinput' and 'nomatch'

Besides user input from a successful recognition, there are several
other types of results that VoiceXML applications deal with that should
be part of a VoiceXML profile for EMMA as well as the ones suggested in
Proposal 1.

'noinput' and 'nomatch' situations are mandatory for VoiceXML 2.0/2.1.
Since EMMA can also represent these, the EMMA annotations for 'noinput'
and 'nomatch' should be part of the VoiceXML EMMA profile.

Note that in VoiceXML a 'nomatch' may still carry recognition results as
described in Proposal 1; these are inserted in the application.lastresult$
variable only.
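
A browser-side check for these cases might look like the Python sketch
below. It assumes that EMMA marks the two situations with the attributes
"emma:no-input" and "emma:uninterpreted" set to "true" on the
interpretation; those attribute names, the namespace URI, the sample
inputs and the helper names are assumptions of the example, and "filled"
is just a label chosen here for a successful recognition.

```python
import xml.etree.ElementTree as ET

# Assumed EMMA namespace URI; check it against the EMMA draft in use.
EMMA_NS = "http://www.w3.org/2003/04/emma"


def parse_interpretation(attrs: str) -> ET.Element:
    """Build a one-interpretation EMMA document and return that element."""
    doc = (
        f'<emma:emma version="1.0" xmlns:emma="{EMMA_NS}">'
        f'<emma:interpretation id="int1" {attrs}/>'
        f'</emma:emma>'
    )
    return ET.fromstring(doc).find(f"{{{EMMA_NS}}}interpretation")


def classify(interp: ET.Element) -> str:
    """Return 'noinput', 'nomatch' or 'filled' for one interpretation."""
    def flag(name: str) -> bool:
        return interp.get(f"{{{EMMA_NS}}}{name}") == "true"

    if flag("no-input"):
        return "noinput"
    if flag("uninterpreted"):
        return "nomatch"
    return "filled"


silence = parse_interpretation('emma:no-input="true"')
rejected = parse_interpretation(
    'emma:uninterpreted="true" emma:tokens="mumble" emma:confidence="0.2"'
)
accepted = parse_interpretation('emma:tokens="boston" emma:confidence="0.9"')
```

In the 'nomatch' case the tokens and confidence are still present on the
element, so they remain available for application.lastresult$.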

Proposal 3 - DTMF/speech

It is very important that EMMA be usable for both speech and DTMF
input results, because VoiceXML 2.0/2.1 allows both of these inputmode
values. We expect that the VoiceXML profile in EMMA will make this clear,
to ensure full use of EMMA in Voice Browser applications.

Proposal 4 - Record results

EMMA can represent the results of a record operation, see the
description of the record element of VoiceXML [9], so the EMMA
annotations for recordings should also be part of a VoiceXML profile.
This feature is optional in VoiceXML 2.0/2.1.
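
As a rough illustration, the Python sketch below maps a recording
result to some of the shadow variables of the record element. It assumes
the recording is annotated with "emma:signal", "emma:signal-size" and
"emma:duration"; those attribute names, the namespace URI, the sample
URL and the function name are assumptions of the example. The termchar
and maxtime shadow variables are left out because no obvious EMMA
counterpart is assumed here.

```python
import xml.etree.ElementTree as ET

# Assumed EMMA namespace URI; check it against the EMMA draft in use.
EMMA_NS = "http://www.w3.org/2003/04/emma"

# Hypothetical result of a record operation.
RECORDING = f"""
<emma:emma version="1.0" xmlns:emma="{EMMA_NS}">
  <emma:interpretation id="rec1"
      emma:signal="http://example.com/recordings/msg1.wav"
      emma:signal-size="48000"
      emma:duration="3000"/>
</emma:emma>
""".strip()


def record_shadow_variables(interp):
    """Map recording annotations to record-style shadow variables."""
    def attr(name):
        return interp.get(f"{{{EMMA_NS}}}{name}")

    return {
        "audio": attr("signal"),            # reference to the recording
        "size": int(attr("signal-size")),   # bytes
        "duration": int(attr("duration")),  # milliseconds
    }


root = ET.fromstring(RECORDING)
recording = record_shadow_variables(
    root.find(f"{{{EMMA_NS}}}interpretation")
)
```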

Proposal 5 - Add informative examples

We think that some informative examples will improve the description of
the profile. These might include a SISR grammar for DTMF/speech and the
expected EMMA results that comply with the profile.

The examples should include both a single returned result and N-best
results.

We think that an additional example using lattices would also be very
interesting. Even though lattices cannot be represented in VoiceXML
2.0/2.1, such an example will be useful for the evolution of
VoiceXML, see point B below.

B. EMMA and the evolution of VoiceXML

It is still too early to give precise feedback on the evolution of
VoiceXML, but the clear intention is to support an extended
usage of EMMA inside a future VoiceXML application.

This includes, but is not limited to:

- give access to the whole EMMA document inside the
  application.lastresult$ variable (both a raw EMMA document
  and a processed one, i.e. in ECMA-262 format)
- include proper media-types to allow a clear indication
  of whether the raw results are expressed in EMMA or other
  formats (e.g. NLSML). The same for the processed results.

Another possible evolution is a simple way to pass EMMA
results from VoiceXML to other modules to allow further processing.

A last point is that EMMA should also be used to return the results of
Speaker Identification and Verification (SIV). The Voice Browser SIV subgroup is
working to create a few examples to circulate them to you to get

We would greatly appreciate your comments on these ideas, to better address
this subject in the context of the evolution of Voice Browser standards.

[1] - http://www.w3.org/TR/2005/WD-emma-20050916/
[2] - http://www.w3.org/TR/2004/REC-voicexml20-20040316/
[3] - http://www.w3.org/TR/2005/CR-voicexml21-20050613/
[4] - http://www.w3.org/TR/2004/REC-voicexml20-20040316/#dml2.3.1
[5] - http://www.w3.org/TR/2004/REC-voicexml20-20040316/#dml5.1.5
[6] - http://www.w3.org/TR/emma/#s4.2.1
[7] - http://www.w3.org/TR/emma/#s4.2.8
[8] - http://www.w3.org/TR/emma/#s4.2.11
[9] - http://www.w3.org/TR/2004/REC-voicexml20-20040316/#dml2.3.6


Received on Monday, 3 April 2006 14:52:15 UTC