- From: <johnston@research.att.com>
- Date: Sun, 1 Apr 2007 16:59:55 -0400
- To: <www-multimodal@w3.org>
- Message-ID: <0C50B346CAD5214EA8B5A7C0914CF2A4290BDA@njfpsrvexg3.research.att.com>
The multimodal working group greatly appreciates the detailed feedback on the EMMA specification from the Voice Browser working group. As a result of this feedback, and of feedback from other groups, a number of substantive revisions have been made to the EMMA specification, along with a reorganization and numerous editorial changes. The multimodal working group will shortly release a second last call working draft incorporating these revisions. The points below provide formal responses to the feedback from the VB working group.

Michael Johnston
AT&T
Editor-in-Chief, EMMA Specification

Formal response to feedback from the Voice Browser working group on the EMMA last call:

===============================================

VB-A1: Clarification / Typo / Editorial
Request of EMMA profile for VoiceXML 2.0/2.1

Resolution: Rejected

The Multimodal working group sees significant benefit in the creation of an EMMA profile for VoiceXML 2.0/2.1. However, the group rejects the request to include this work within the EMMA specification itself. The request might best be resolved by a W3C Note on these issues, or perhaps more broadly on the whole chain that connects a VoiceXML page to SRGS+SISR grammars and then to EMMA for returning speech/DTMF results to VoiceXML. We suggest that this document be edited by the VBWG with some support from the MMIWG.

==================

VB.A1.1: Change to Existing Feature
Profile: Values of emma:mode

Resolution: Accepted

The MMIWG agrees that the values of emma:mode of specific relevance to VoiceXML should be revised in EMMA.
For the current editor's draft, and for the candidate recommendation, we will change the emma:mode values in Section 4.2.11 and throughout the document as follows:
- from "dtmf_keypad" to "dtmf"
- from "speech" to "voice"

==================

VB.A1.2: Clarification / Typo / Editorial
Profile: Optional/Mandatory

Resolution: Accepted with modifications

The specification of what is optional and mandatory for the profile should be part of the EMMA VoiceXML profile, which we propose be edited within the VBWG (see VB.A1). As regards the optional/mandatory status of EMMA features separate from any specific profile, we have reviewed them in detail for the whole EMMA specification, and this will be reflected in the next draft.

==================

VB.A3: Clarification / Typo / Editorial
Profile: DTMF/speech

Resolution: Rejected

See VB.A1 for the profile. The MMIWG agrees that this should be made clear in the EMMA VoiceXML profile document, but as specified in VB.A1 above, we propose that the profile be edited within the VBWG.

==================

VB.A4: Clarification / Typo / Editorial
Profile: Record results

Resolution: Accepted

The MMIWG agrees that, for VoiceXML and for more general use, it is important that EMMA can be used to annotate recorded inputs. The specification already contains an attribute that can be used to provide the URI of the recorded signal; this is the function of emma:signal. Additionally, the specification provides the annotation emma:function="recording" to indicate that an input is a recording rather than a command to an interactive system. One issue that arose in our review of this feedback is that in the case of recordings there would be no content within the emma:interpretation element. Currently this is only possible if emma:no-input="true" or emma:uninterpreted="true". The issue is whether a recording can be marked as emma:uninterpreted="true".
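For illustration, a bare recording result using the existing annotations described above might look like the following sketch (the signal URI and media type are hypothetical); note the empty emma:interpretation element, which is exactly the case at issue:

```xml
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
  <!-- A recorded input: no semantic content, only a pointer to the signal -->
  <emma:interpretation id="rec1"
      emma:function="recording"
      emma:signal="http://example.com/signals/rec1.wav"
      emma:media-type="audio/x-wav"/>
</emma:emma>
```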
To resolve this issue, the MMIWG proposes to revise the EMMA specification to make clear that emma:uninterpreted="true" means that no interpretation was produced, with no implication that an attempt was made to produce one. Recordings should then be marked as emma:uninterpreted="true".

For record results, the duration of the recording can be determined either from the emma:duration annotation or, using absolute timestamps, by subtracting the value of emma:start from emma:end. Potentially, the size could be determined using a combination of emma:duration and emma:media-type. However, to avoid burdening the EMMA consumer with this additional calculation, and to ensure that the exact size can be indicated, the MMIWG proposes the addition of a new annotation, emma:signal-size, which indicates the size of a recording (referred to with emma:signal) in 8-bit octets. This facilitates the integration of EMMA with VoiceXML.

==================

VB.A5: Change to Existing Feature
Profile: Informative Examples

Resolution: Rejected

See VB.A1 for the profile. Informative examples regarding the specific use of EMMA for VoiceXML 2.0/2.1 are best handled in a separate note or specification. As proposed in VB.A1, this work should be edited within the VBWG.

==================

VB.B: EMMA and evolutions of VoiceXML

Resolution: Deferred

These comments are extremely useful for future versions of EMMA but go beyond the goals and requirements of the current specification.

==================================================
VBWG Comments on EMMA LCWD [1]
==================================================

EMMA represents a good attempt to extend and complete the annotation of input from different modalities. This is an interesting evolution to be evaluated in the context of the current recommendation and also for the advancement of Voice Browser standards.
We acknowledge that the EMMA LCWD [1] is improved and more complete, but we suggest a few extensions that are important to us, along with some general comments.

==================================================
A. EMMA and VoiceXML 2.0/2.1

In today's speech market, VoiceXML 2.0 [2] is the reference specification, with a large presence in the industry. VoiceXML 2.0 has been a W3C Recommendation since March 2004. Moreover, VoiceXML 2.1 [3] extends VoiceXML 2.0 by adding a restricted number of new features, none of which has a direct relationship with the EMMA specification. VoiceXML 2.1 is close to becoming a Proposed Recommendation. In this context we see great value in having an EMMA profile for VoiceXML 2.0/2.1, whose goal is to enable quick adoption of EMMA in the current voice browser industry. EMMA might be conveyed through the adoption of protocols such as IETF MRCPv2, which offers the option of adopting EMMA as the format for speech results.

VoiceXML 2.0/2.1 specifies a limited number of annotations to be collected for each speech or DTMF input (see Table 10 [4] of the VoiceXML 2.0 specification). Moreover, multiple ASR results may be returned at the application's request; they are represented as an N-best list (see [5] of VoiceXML 2.0 for details). Here are some practical proposals for possible extensions to the EMMA specification.

----------

Proposal 1 - EMMA profile

Describe in the EMMA specification a VoiceXML 2.0/2.1 profile, either in an appendix or in a section of the specification. This profile should describe the mandatory annotations needed to allow complete integration in a VoiceXML 2.0/2.1 compliant browser.

VoiceXML 2.0/2.1 requires four annotations related to an input. They are described normatively in [4], as shadow variables related to a form input item. The same values are also accessible from the application.lastresult$ variable; see [4].
The annotations are the following:
- name$.utterance, which might be conveyed by the "emma:token" attribute ([6])
- name$.confidence, which might be conveyed by the "emma:confidence" attribute ([7]); the range of values (0.0 - 1.0) seems to be fine, but some checks could be made in the schemas of both specifications
- name$.inputmode, which might be conveyed by the "emma:mode" attribute ([8]); see Proposal 1.1 for a discussion of its values
- name$.interpretation, an ECMAScript value containing the semantic result, which has to be derived from the content of "emma:interpretation"

As regards the N-best results (see [5] for details), the emma:one-of element should be suitable for conveying them to the voice browser.

---

Proposal 1.1 - Values of emma:mode

Some clarification is needed to explain how to map the values of "emma:mode" ([8]) to the expected values of the "inputmode" variable (Table 10 in [4]). VoiceXML 2.0/2.1 prescribes two values: "speech" and "dtmf". Another option is to adopt in EMMA the exact values expected by VoiceXML 2.0/2.1, to simplify the mapping. Other, finer-grained EMMA mode annotations are not possible in VoiceXML 2.0/2.1.

---

Proposal 1.2 - Optional/mandatory

The profile should clarify which annotations are mandatory and which are optional. For instance, N-best results are an optional feature for VoiceXML 2.0/2.1, while the other annotations are mandatory.

----------

Proposal 2 - Consider 'noinput' and 'nomatch'

Besides user input from a successful recognition, there are several other types of results that VoiceXML applications deal with that should be part of a VoiceXML profile for EMMA, in addition to the ones suggested in Proposal 1. 'noinput' and 'nomatch' situations are mandatory for VoiceXML 2.0/2.1. Since EMMA can also represent these, the EMMA annotations for 'noinput' and 'nomatch' should be part of the VoiceXML EMMA profile. Note that in VoiceXML, 'nomatch' may carry recognition results as described in Proposal 1, to be inserted in the application.lastresult$ variable only.
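As a sketch of how the mapping in Proposal 1 might look in practice, an N-best result could be conveyed with emma:one-of as below. The utterance text, confidence values, and semantic content are hypothetical; emma:mode uses VoiceXML's inputmode value "speech" as suggested in Proposal 1.1, and the token annotation is spelled emma:tokens in recent EMMA drafts:

```xml
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:one-of id="nbest">
    <!-- maps to name$.utterance / .confidence / .inputmode / .interpretation -->
    <emma:interpretation id="int1" emma:tokens="flights to boston"
        emma:confidence="0.82" emma:mode="speech">
      <destination>Boston</destination>
    </emma:interpretation>
    <emma:interpretation id="int2" emma:tokens="flights to austin"
        emma:confidence="0.43" emma:mode="speech">
      <destination>Austin</destination>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>
```

Each emma:interpretation in the emma:one-of would then populate one element of the application.lastresult$ array.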
----------

Proposal 3 - DTMF/speech

It is very important that EMMA be usable for either speech or DTMF input results, because VoiceXML 2.0/2.1 allows both of these inputmode values. We expect that the VoiceXML profile in EMMA will make this clear, to enable full use of EMMA in voice browser applications.

----------

Proposal 4 - Record results

EMMA can represent the results of a record operation (see the description of the record element of VoiceXML [9]), so the EMMA annotations for recordings should also be part of a VoiceXML profile. This feature is optional in VoiceXML 2.0/2.1.

----------

Proposal 5 - Add informative examples

We think that some informative examples would improve the description of the profile. These might include an SISR grammar for DTMF/speech and the expected EMMA results compliant with the profile. The examples should include both a single returned result and N-best results. We think that an additional example of lattices would also be very interesting; even though lattices are not representable in VoiceXML 2.0/2.1, such an example would be useful for the evolution of VoiceXML, see point B below.

==================================================
B. EMMA and the evolution of VoiceXML

For the evolution of VoiceXML it is too early to give precise feedback, but the clear intention is to provide for extended usage of EMMA inside a future VoiceXML application. This includes, but is not limited to:
- providing access to the whole EMMA document inside the application.lastresult$ variable (both as a raw EMMA document and as a processed one, i.e. in ECMA-262 format)
- including proper media types to give a clear indication of whether the raw results are expressed in EMMA or in other formats (e.g. NLSML), and likewise for the processed results

Another possible evolution would be a simple way to pass EMMA results from VoiceXML to other modules to allow further processing.
A last point is that EMMA should also be usable to return the results of Speaker Identification and Verification (SIV). The Voice Browser SIV subgroup is working to create a few examples to circulate to you for feedback. We would greatly appreciate your comments on these ideas, to better address this subject in the context of the evolution of Voice Browser standards.

==================================================

[1] - http://www.w3.org/TR/2005/WD-emma-20050916/
[2] - http://www.w3.org/TR/2004/REC-voicexml20-20040316/
[3] - http://www.w3.org/TR/2005/CR-voicexml21-20050613/
[4] - http://www.w3.org/TR/2004/REC-voicexml20-20040316/#dml2.3.1
[5] - http://www.w3.org/TR/2004/REC-voicexml20-20040316/#dml5.1.5
[6] - http://www.w3.org/TR/emma/#s4.2.1
[7] - http://www.w3.org/TR/emma/#s4.2.8
[8] - http://www.w3.org/TR/emma/#s4.2.11
[9] - http://www.w3.org/TR/2004/REC-voicexml20-20040316/#dml2.3.6
Received on Sunday, 1 April 2007 21:01:12 UTC