Re: Feedback to Last Call Working Draft of EMMA (Extensible Multimodal Annotation) specification from Michael Johnston on 2006-10-03 (www-multimodal@w3.org from October 2006)

From: Michael Johnston <johnston@research.att.com>
Date: Tue, 03 Oct 2006 13:49:37 -0400
To: www-multimodal@w3.org
Cc: <'johnston@research.att.com'>, <dahl@conversational-technologies.com>, <romanell@dfki.de>, <norbert.reithinger@dfki.de>, "Liu, Jin" <Jin.Liu@t-systems.com>
Message-Id: <5.2.0.9.0.20061003133452.01824400@unixmail.research.att.com>

Dear SmartWeb consortium,

Many thanks for your feedback on the EMMA
specification. The W3C Multimodal working group
have reviewed the comments in some detail
and this has resulted in changes to
the current draft of the EMMA specification
and will inform our work in future versions of
EMMA as well as the architecture and
authoring efforts ongoing within the group.
Our formal responses are detailed below.

Thanks again for your detailed feedback.

best
Michael Johnston (AT&T, Chair, EMMA subgroup)





RESPONSE TO FEEDBACK FROM SMARTWEB CONSORTIUM:
======================================================================

1. USING EMMA FOR OUTPUT ALSO:
======================================================================

Suggest use of EMMA to represent output using emma:result element.

RESPONSE:

The current scope of the EMMA specification is to provide
a framework for representing and annotating user inputs.
There are considerably more issues to address and work
needed to give an adequate representation of system output,
and so for the current specification document the Multimodal
working group have chosen to defer work on output. For
example, how would graphical output be handled, if the
system is going to draw ink, display a table, or zoom a map?
There has been interest in output representation both inside
and outside the working group. In a future version of EMMA we
may consider this topic, and would at that time return to
your contribution and others we have received.


2. USING EMMA FOR STATUS COMMUNICATION AMONG COMPONENTS
======================================================================

PROPOSAL TO ADD EMMA ANNOTATIONS FOR STATUS COMMUNICATION
AMONG COMPONENTS:

         emma:status
         emma:actual-answer-time
         emma:expected-answer-time
         emma:query-running

RESPONSE:

The scope of EMMA is to provide a representation and annotation
mechanism for user inputs to spoken and multimodal systems. As
such, status communication messages among processing components
fall outside the scope of EMMA and are better addressed as part of
the MMI architecture. We are forwarding this feedback to
the architecture and authoring subgroups within the W3C Multimodal
working group. This contribution is of particular interest to the
authoring effort.



3. OOV
=======================================================================

PROPOSAL TO ADD EMMA:OOV MARKUP FOR INDICATING PROPERTIES OF
OUT OF VOCABULARY ITEMS:

         emma:oov

         <emma:arc emma:from="6" emma:to="7"
                 emma:start="1113501463034"
                 emma:end="1113501463934"
                 emma:confidence="0.72">
                 <emma:one-of id="MMR-1-1-OOV"
                         emma:start="1113501463034" emma:end="1113501463934">
                         <emma:oov emma:class="OOV-Celestial-Body"
                                 emma:phoneme="stez"
                                 emma:grapheme="sters"
                                 emma:confidence="0.74"/>
                         <emma:oov emma:class="OOV-Celestial-Body"
                                 emma:phoneme="stO:z"
                                 emma:grapheme="staurs"
                                 emma:confidence="0.77"/>
                         <emma:oov emma:class="OOV-Celestial-Body"
                                 emma:phoneme="stA:z"
                                 emma:grapheme="stars"
                                 emma:confidence="0.81"/>
                 </emma:one-of>
         </emma:arc>


RESPONSE:

While the ability to specify, recognize, and annotate the
presence of out-of-vocabulary items appears extremely
valuable, the EMMA group are concerned as to how many
recognizers will in fact provide this capability. Furthermore,
developing this proposal fully would require significant
time. Therefore we believe that the proposed annotation
for OOV is best handled as a vendor-specific annotation.
EMMA provides an extensibility mechanism for such
annotations through the emma:info element. The
current markup from your feedback above does not meet the
EMMA XML schema, as it contains emma:one-of within
a lattice emma:arc. Also, the timestamps on the emma:one-of
may not be necessary, since they match those on emma:arc.
The OOV information could alternatively be encoded as a
vendor-specific or application-specific extension using
emma:info, as follows:

<emma:arc emma:from="6" emma:to="7"
               emma:start="1113501463034"
               emma:end="1113501463934"
               emma:confidence="0.72">
         <emma:info>
                 <example:oov class="OOV-Celestial-Body"
                                 phoneme="stez"
                                 grapheme="sters"
                                 confidence="0.74"/>
                 <example:oov class="OOV-Celestial-Body"
                                 phoneme="stO:z"
                                 grapheme="staurs"
                                 confidence="0.77"/>
                 <example:oov class="OOV-Celestial-Body"
                                 phoneme="stA:z"
                                 grapheme="stars"
                                 confidence="0.81"/>
         </emma:info>
</emma:arc>
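
As a rough illustration of how a consumer might read such an extension, the following Python sketch parses the arc above and picks the OOV candidate with the highest confidence. It is only a sketch under stated assumptions: the namespace bound to the "example" prefix and the helper name best_oov are hypothetical, and the xmlns declarations are added on emma:arc only so that the fragment parses on its own.

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"
# Assumption: the "example" prefix is bound to this illustrative namespace.
EXAMPLE_NS = "http://www.example.com/example"

# The arc from the response, with namespace declarations added so that the
# fragment can be parsed standalone.
ARC_XML = f"""
<emma:arc xmlns:emma="{EMMA_NS}" xmlns:example="{EXAMPLE_NS}"
          emma:from="6" emma:to="7"
          emma:start="1113501463034" emma:end="1113501463934"
          emma:confidence="0.72">
  <emma:info>
    <example:oov class="OOV-Celestial-Body" phoneme="stez"
                 grapheme="sters" confidence="0.74"/>
    <example:oov class="OOV-Celestial-Body" phoneme="stO:z"
                 grapheme="staurs" confidence="0.77"/>
    <example:oov class="OOV-Celestial-Body" phoneme="stA:z"
                 grapheme="stars" confidence="0.81"/>
  </emma:info>
</emma:arc>
"""


def best_oov(arc_xml):
    """Return the attributes of the highest-confidence example:oov, or None."""
    arc = ET.fromstring(arc_xml)
    # Look only at example:oov elements that are children of emma:info.
    candidates = arc.findall(f"{{{EMMA_NS}}}info/{{{EXAMPLE_NS}}}oov")
    if not candidates:
        return None
    best = max(candidates, key=lambda e: float(e.get("confidence", "0")))
    return dict(best.attrib)


best = best_oov(ARC_XML)
print(best["grapheme"], best["confidence"])
```

Because the OOV details live inside emma:info, an EMMA processor that does not know the example namespace can ignore them without affecting the rest of the lattice.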


4. TURN ID
=======================================================================

SUGGESTION FROM SMARTWEB:

In dialog applications it is important to distinguish between
each distinct turn. The xs:nonNegativeInteger annotation specifies
the turn ID associated with an element.

         <emma:emma version="1.0"
           xmlns:emma="http://www.w3.org/2003/04/emma"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://www.w3.org/2003/04/emma
           http://www.w3.org/TR/emma/emma10.xsd"
           xmlns="http://www.example.com/example">
           <emma:interpretation turn-id="42">
             ...
           </emma:interpretation>
         </emma:emma>

RESPONSE:

We agree that it is important to have an annotation indicating the turn ID
and have adopted your suggestion.

We have added a new section to the specification:

4.2.17 Dialog turns: emma:dialog-turn attribute

The emma:dialog-turn annotation associates the EMMA result in the container
element with a dialog turn. The syntax and semantics of dialog turns is left
open to suit the needs of individual applications. For example, some
applications may use an integer value, where successive turns are represented
by successive integers. Other applications may combine a name of a dialog
participant with an integer value representing the turn number for that
participant. Ordering semantics for comparison of emma:dialog-turn is
deliberately unspecified and left for applications to define.
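
As a rough illustration of how an application might consume this annotation, the Python sketch below groups interpretations by their emma:dialog-turn value. Everything specific in it is an assumption made for the example: the turn values ("user:1", "user:2"), the interpretation ids, the command payload, and the helper names make_doc and group_by_turn. Since ordering semantics are left to applications, the sketch treats turn values as opaque strings and only tests them for equality.

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"
# Fully qualified names as ElementTree represents them.
INTERPRETATION = f"{{{EMMA_NS}}}interpretation"
DIALOG_TURN = f"{{{EMMA_NS}}}dialog-turn"


def make_doc(interp_id, turn):
    """Build a minimal EMMA document (hypothetical payload and values)."""
    return (
        f'<emma:emma version="1.0" xmlns:emma="{EMMA_NS}">'
        f'<emma:interpretation id="{interp_id}" emma:dialog-turn="{turn}">'
        '<command xmlns="http://www.example.com/example">next</command>'
        "</emma:interpretation>"
        "</emma:emma>"
    )


def group_by_turn(docs):
    """Map each emma:dialog-turn value to the ids of its interpretations."""
    groups = {}
    for doc in docs:
        root = ET.fromstring(doc)
        for interp in root.iter(INTERPRETATION):
            turn = interp.get(DIALOG_TURN)
            if turn is None:
                continue  # the annotation is not present on this element
            # Turn values are compared as opaque strings; no ordering implied.
            groups.setdefault(turn, []).append(interp.get("id"))
    return groups


DOCS = [
    make_doc("int1", "user:1"),
    make_doc("int2", "user:1"),
    make_doc("int3", "user:2"),
]
print(group_by_turn(DOCS))
```

An application that uses plain integers for turns could add its own numeric comparison on top of this grouping; that choice is its own and is not implied by the annotation.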


At 10:17 AM 10/2/2006 +0200, Liu, Jin wrote:
>Dear MMI-WG,
>
>the SmartWeb consortium (http://www.smartweb-project.de/) read the last
>work draft of EMMA and gathered some suggestions for a possible
>completion and extension of the EMMA document (with examples). With the
>suggested extension, EMMA would be able to present e.g. not only the
>input information but also output information as well as to support a
>better interpretation of the speech input (e.g. OOV situation).
>All the suggested extension has been implemented and tested within the
>SmartWeb project. It has been proved, that EMMA is a powerful and
>efficient format for component communication of a multimodal system. We
>would be very delighted, if the one or other suggestions can be
>considered in the EMMA document.
>
>The document is attached.
>
>Best regards
>
>Jin Liu (T-Systems)
>Massimo Romanelli (DFKI)
>Nobert Reithinger (DFKI)
>
>
>__________________________________
>
>Dr. Jin Liu
>T-Systems International GmbH
>Systems Integration
>TZ, ENPS
>Advanced Voice Solutions
>Address: Goslarer Ufer 35, 10589 Berlin
>Phone: +49 30  3497-2330
>Fax: +49 30 3497-2331
>Mobil: +49 170 5813203
>Email: Jin.Liu@t-systems.com
>Internet: http://www.t-systems.com
>Intranet: http://tzwww.telekom.de
>
>
>
Received on Tuesday, 3 October 2006 17:49:51 UTC