Re: Feedback to Last Call Working Draft of EMMA (Extensible Multimodal Annotation) specification

Dear W3C Multimodal working group,  

We would like to thank you greatly for your feedback on our EMMA suggestions.
We accept it and agree with your further considerations.
We look forward to seeing the EMMA Recommendation as soon as possible
and appreciate your decision to adopt our suggestions on dialog turns.

Thanks again for the detailed remarks and comments.


Best regards,

Norbert Reithinger (DFKI)
Massimo Romanelli (DFKI)


Michael Johnston schrieb:
>
> Dear SmartWeb consortium,
>
> Many thanks for your feedback on the EMMA
> specification. The W3C Multimodal working group
> has reviewed your comments in detail.
> This review has resulted in changes to
> the current draft of the EMMA specification,
> and it will inform our work on future versions of
> EMMA as well as the architecture and
> authoring efforts ongoing within the group.
> Our formal responses are detailed below.
>
> Thanks again for your detailed feedback.
>
> best
> Michael Johnston (at&t, Chair, EMMA subgroup)
>
>
>
>
>
> RESPONSE TO FEEDBACK FROM SMARTWEB CONSORTIUM:
> ======================================================================
>
> 1. USING EMMA FOR OUTPUT ALSO:
> ======================================================================
>
> Suggestion to use EMMA to represent output as well, via an emma:result element.
>
> RESPONSE:
>
> The current scope of the EMMA specification is to provide
> a framework for representing and annotating user inputs.
> Considerably more issues would need to be addressed, and
> considerably more work done, to give an adequate representation
> of system output, so for the current specification document the
> multimodal working group has chosen to defer work on output. For
> example, how would graphical output be handled if the
> system is going to draw ink, display a table, or zoom a map?
> There has been interest in output representation both inside
> and outside the working group. In a future version of EMMA we
> may consider this topic, and would at that time return to
> your contribution and others we have received.
>
>
> 2. USING EMMA FOR STATUS COMMUNICATION AMONG COMPONENTS
> ======================================================================
>
> PROPOSAL TO ADD EMMA ANNOTATIONS FOR STATUS COMMUNICATION
> AMONG COMPONENTS:
>
>         emma:status
>         emma:actual-answer-time
>         emma:expected-answer-time
>         emma:query-running
>
> RESPONSE:
>
> The scope of EMMA is to provide a representation and annotation
> mechanism for user inputs to spoken and multimodal systems. As
> such, status communication messages among processing components
> fall outside the scope of EMMA and are better addressed as part of the
> MMI architecture. We are forwarding this feedback to
> the architecture and authoring subgroups within the W3C Multimodal
> working group. This contribution is of particular interest to the
> authoring effort.
>
>
>
> 3. OOV
> =======================================================================
>
> PROPOSAL TO ADD EMMA:OOV MARKUP FOR INDICATING PROPERTIES OF
> OUT OF VOCABULARY ITEMS:
>
>         emma:oov
>
>         <emma:arc emma:from="6" emma:to="7"
>                   emma:start="1113501463034"
>                   emma:end="1113501463934"
>                   emma:confidence="0.72">
>           <emma:one-of id="MMR-1-1-OOV"
>                        emma:start="1113501463034"
>                        emma:end="1113501463934">
>             <emma:oov emma:class="OOV-Celestial-Body"
>                       emma:phoneme="stez"
>                       emma:grapheme="sters"
>                       emma:confidence="0.74"/>
>             <emma:oov emma:class="OOV-Celestial-Body"
>                       emma:phoneme="stO:z"
>                       emma:grapheme="staurs"
>                       emma:confidence="0.77"/>
>             <emma:oov emma:class="OOV-Celestial-Body"
>                       emma:phoneme="stA:z"
>                       emma:grapheme="stars"
>                       emma:confidence="0.81"/>
>           </emma:one-of>
>         </emma:arc>
>
>
> RESPONSE:
>
> While the ability to recognize and annotate the
> presence of out-of-vocabulary items appears extremely
> valuable, the EMMA group is concerned about how many
> recognizers will in fact provide this capability. Furthermore,
> significant time would have to be assigned to develop this
> proposal fully. Therefore we believe that the proposed
> annotation for OOV items is best handled as a vendor-specific
> annotation. EMMA provides an extensibility mechanism for
> such annotations through the emma:info element. The
> markup from your feedback above does not conform to the
> EMMA XML schema, as it contains emma:one-of within
> a lattice emma:arc. Also, the timestamps on the emma:one-of
> may not be necessary, since they match those on the emma:arc.
> The OOV information could alternatively be encoded as a vendor-
> or application-specific extension
> using emma:info as follows:
>
> <emma:arc emma:from="6" emma:to="7"
>               emma:start="1113501463034"
>               emma:end="1113501463934"
>               emma:confidence="0.72">
>         <emma:info>
>                 <example:oov class="OOV-Celestial-Body"
>                                 phoneme="stez"
>                                 grapheme="sters"
>                                 confidence="0.74"/>
>                 <example:oov class="OOV-Celestial-Body"
>                                 phoneme="stO:z"
>                                 grapheme="staurs"
>                                 confidence="0.77"/>
>                 <example:oov class="OOV-Celestial-Body"
>                                 phoneme="stA:z"
>                                 grapheme="stars"
>                                 confidence="0.81"/>
>         </emma:info>
> </emma:arc>
>
>
> 4. TURN ID
> =======================================================================
>
> SUGGESTION FROM SMARTWEB:
>
> In dialog applications it is important to distinguish between
> distinct turns. The xs:nonNegativeInteger annotation specifies
> the turn ID associated with an element.
>
>         <emma:emma version="1.0"
>             xmlns:emma="http://www.w3.org/2003/04/emma"
>             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>             xsi:schemaLocation="http://www.w3.org/2003/04/emma
>                                 http://www.w3.org/TR/emma/emma10.xsd"
>             xmlns="http://www.example.com/example">
>           <emma:interpretation turn-id="42">
>             ...
>           </emma:interpretation>
>         </emma:emma>
>
> RESPONSE:
>
> We agree that it is important to have an annotation indicating the
> turn ID, and we adopt your suggestion.
>
> We have added a new section to the specification:
>
> 4.2.17 Dialog turns: emma:dialog-turn attribute
>
> The emma:dialog-turn annotation associates the EMMA result in the
> container element with a dialog turn. The syntax and semantics of
> dialog turns are left open to suit the needs of individual
> applications. For example, some applications may use an integer
> value, where successive turns are represented by successive integers.
> Other applications may combine the name of a dialog participant with
> an integer value representing the turn number for that participant.
> Ordering semantics for comparison of emma:dialog-turn values are
> deliberately unspecified and left for applications to define.
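>
> For illustration only (this example is not part of the specification
> text, and the "user1:4" value is a hypothetical participant-plus-counter
> scheme of the kind described above), an annotated turn might look like:
>
>         <emma:emma version="1.0"
>             xmlns:emma="http://www.w3.org/2003/04/emma">
>           <emma:interpretation id="int1"
>               emma:dialog-turn="user1:4">
>             ...
>           </emma:interpretation>
>         </emma:emma>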
>
>
> At 10:17 AM 10/2/2006 +0200, Liu, Jin wrote:
>> Dear MMI-WG,
>>
>> the SmartWeb consortium (http://www.smartweb-project.de/) has read the
>> last working draft of EMMA and gathered some suggestions for a possible
>> completion and extension of the EMMA document (with examples). With the
>> suggested extensions, EMMA would be able to represent not only
>> input information but also output information, as well as to support a
>> better interpretation of speech input (e.g. in OOV situations).
>> All the suggested extensions have been implemented and tested within the
>> SmartWeb project. This has shown that EMMA is a powerful and
>> efficient format for component communication in a multimodal system. We
>> would be delighted if one or another of the suggestions could be
>> considered in the EMMA document.
>>
>> The document is attached.
>>
>> Best regards
>>
>> Jin Liu (T-Systems)
>> Massimo Romanelli (DFKI)
>> Norbert Reithinger (DFKI)
>>
>>
>> __________________________________
>>
>> Dr. Jin Liu
>> T-Systems International GmbH
>> Systems Integration
>> TZ, ENPS
>> Advanced Voice Solutions
>> Address: Goslarer Ufer 35, 10589 Berlin
>> Phone: +49 30  3497-2330
>> Fax: +49 30 3497-2331
>> Mobil: +49 170 5813203
>> Email: Jin.Liu@t-systems.com
>> Internet: http://www.t-systems.com
>> Intranet: http://tzwww.telekom.de
>>
>>
>>
>

-- 
---------------------------------------------------------------
Massimo Romanelli                       
German Research Center for Artificial Intelligence -- DFKI GmbH   
Stuhlsatzenhausweg 3 D-66123 Saarbrücken Germany
web:   http://www.dfki.de/~romanell               
email:  romanell@dfki.de                   
phone: +49 (0681) 302 64819                   
fax:   +49 (0681) 302 5020                   
---------------------------------------------------------------

Received on Thursday, 16 November 2006 19:10:06 UTC