
RE: EMMA in Speech API (was RE: Speech API: first editor's draft posted)

From: Young, Milan <Milan.Young@nuance.com>
Date: Tue, 12 Jun 2012 16:35:31 +0000
To: Hans Wennborg <hwennborg@google.com>
CC: Deborah Dahl <dahl@conversational-technologies.com>, Satish S <satish@google.com>, "olli@pettay.fi" <olli@pettay.fi>, Bjorn Bringert <bringert@google.com>, Glen Shires <gshires@google.com>, "public-speech-api@w3.org" <public-speech-api@w3.org>
Message-ID: <B236B24082A4094A85003E8FFB8DDC3C1A47280A@SOM-EXCH04.nuance.com>
I had suggested that we add a link to the use cases.  Do we want to capture those in the spec or another document?

Thanks

-----Original Message-----
From: Hans Wennborg [mailto:hwennborg@google.com] 
Sent: Tuesday, June 12, 2012 8:56 AM
To: Young, Milan
Cc: Deborah Dahl; Satish S; olli@pettay.fi; Bjorn Bringert; Glen Shires; public-speech-api@w3.org
Subject: Re: EMMA in Speech API (was RE: Speech API: first editor's draft posted)

Thanks all! I've updated the spec to add the emma attribute:
http://dvcs.w3.org/hg/speech-api/rev/ae432e2c84f7

 - Hans

On Tue, Jun 12, 2012 at 4:41 PM, Young, Milan <Milan.Young@nuance.com> wrote:
> I'm also happy with the new text.  I suggest we add a link to an 
> appendix or something for the use cases.  At present we have:
>
>
>
> Use case 1: I'm testing different speech recognition services. I would 
> like to know which service processed the speech associated with a 
> particular result, so that I can compare the services for accuracy. I 
> can use the emma:process parameter for that.
>
>
>
> Use case 2: I want the system to dynamically slow down its TTS for 
> users who speak more slowly. The EMMA timestamps, duration, and token 
> parameters can be used to determine the speech rate for a particular utterance.
>
>
>
> Use case 3: I'm testing several different grammars to compare their 
> accuracy. I use the emma:grammar parameter to record which grammar was 
> used for each result.
>
>
>
> Use case 4: My application server is based on an MMI architecture 
> (link http://www.w3.org/TR/mmi-arch/), and uses EMMA documents to 
> communicate results.  I POST the EMMA results to the server in order 
> to derive a next state in the dialog.
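A rough sketch of how use cases 1-3 might consume these annotations (in Python, against a hypothetical EMMA result string; the attribute names follow this thread's usage, and all values here are illustrative placeholders, not recognizer output):

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"

# A hypothetical EMMA 1.0 result; real output will vary by recognizer.
emma_xml = """<emma:emma version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:interpretation id="int1"
      emma:process="http://example.com/recognizer-a"
      emma:grammar="http://example.com/dates.grxml"
      emma:tokens="set an alarm for seven"
      emma:start="1339516531000"
      emma:end="1339516533500">
    <emma:literal>set an alarm for seven</emma:literal>
  </emma:interpretation>
</emma:emma>"""

root = ET.fromstring(emma_xml)
interp = root.find(f"{{{EMMA_NS}}}interpretation")

# Use cases 1 and 3: which service and which grammar produced this result.
service = interp.get(f"{{{EMMA_NS}}}process")
grammar = interp.get(f"{{{EMMA_NS}}}grammar")

# Use case 2: speech rate from the timestamps (milliseconds) and tokens.
tokens = interp.get(f"{{{EMMA_NS}}}tokens").split()
duration_s = (int(interp.get(f"{{{EMMA_NS}}}end"))
              - int(interp.get(f"{{{EMMA_NS}}}start"))) / 1000
words_per_second = len(tokens) / duration_s
```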
>
>
>
> Thanks
>
>
>
>
>
> From: Deborah Dahl [mailto:dahl@conversational-technologies.com]
>
> Sent: Tuesday, June 12, 2012 8:32 AM
> To: 'Satish S'; Young, Milan
> Cc: 'Hans Wennborg'; olli@pettay.fi; 'Bjorn Bringert'; 'Glen Shires'; 
> public-speech-api@w3.org
> Subject: RE: EMMA in Speech API (was RE: Speech API: first editor's 
> draft
> posted)
>
>
>
> That seems good to me. On the point of the UA modifying the EMMA, I 
> think it's ok if we prohibit the UA from modifying the EMMA, because 
> the application can certainly make its own modifications if it wants 
> extra information added, for example, for logging purposes, or to 
> attach the name of the application, or whatever.
>
> If we want to propose to the MMIWG that some EMMA attributes should be 
> obligatory to support specific use cases, we can do that. I think we 
> might want to wait before we put together a proposal, though, in case 
> we think of other use cases.
>
>
>
> From: Satish S [mailto:satish@google.com]
> Sent: Tuesday, June 12, 2012 11:09 AM
> To: Young, Milan
> Cc: Deborah Dahl; Hans Wennborg; olli@pettay.fi; Bjorn Bringert; Glen 
> Shires; public-speech-api@w3.org
> Subject: Re: EMMA in Speech API (was RE: Speech API: first editor's 
> draft
> posted)
>
>
>
> Thanks Milan and Deborah. Looks like we agree on the following language.
> Could you confirm?
>
>
> Section 5.1:
>   readonly attribute Document emma;
>
> Section 5.1.6 needs
>   emma - EMMA 1.0 (link to http://www.w3.org/TR/emma/) representation 
> of this result.  The contents of this result could vary across UAs and 
> recognition engines, but all implementations MUST expose a valid XML 
> document complete with EMMA namespace. UA implementations for 
> recognizers that supply EMMA MUST pass that EMMA structure directly.
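As a rough illustration of what the MUST clause buys an application (a Python sketch; in the browser the emma attribute would already be a parsed Document, so no parsing step is needed), a consumer can rely on receiving well-formed XML whose root is in the EMMA namespace:

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"

def is_valid_emma_result(xml_text: str) -> bool:
    """Check the minimum the proposed text requires: well-formed XML
    whose root element is emma:emma in the EMMA namespace."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return root.tag == f"{{{EMMA_NS}}}emma"

ok = is_valid_emma_result(
    '<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma"/>')
bad = is_valid_emma_result("<result>not emma</result>")
```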
>
>
>
> Cheers
> Satish
>
> On Tue, Jun 12, 2012 at 3:59 PM, Young, Milan <Milan.Young@nuance.com>
> wrote:
>
> I'm also fine with dropping the specific attributes and instead 
> attaching a set of EMMA use cases.
>
>
>
> But I'm wary of the UA modifying the EMMA document: 1) This is 
> starting to get into non-trivial domains with encodings and such, 2) 
> The application could easily attach UA information to the log.
>
>
>
>
>
>
>
> From: Deborah Dahl [mailto:dahl@conversational-technologies.com]
> Sent: Tuesday, June 12, 2012 7:51 AM
>
>
> To: 'Satish S'; Young, Milan
> Cc: 'Hans Wennborg'; olli@pettay.fi; 'Bjorn Bringert'; 'Glen Shires'; 
> public-speech-api@w3.org
> Subject: RE: EMMA in Speech API (was RE: Speech API: first editor's 
> draft
> posted)
>
>
>
> I'm not sure why a web developer would care whether the EMMA they get 
> from the UA is exactly what the speech recognizer supplied. On the 
> other hand, I can think of useful things that the UA could add to the 
> EMMA, for example, something in the <info> tag about the UA that the
> request originated from, that the recognizer wouldn't necessarily know 
> about. In that case you might actually want modified EMMA.
>
> I agree with Satish's point that we might think of other use cases 
> that require specific EMMA attributes, so I don't really see the need 
> to call out those specific attributes.
>
>
>
> From: Satish S [mailto:satish@google.com]
> Sent: Tuesday, June 12, 2012 5:22 AM
> To: Young, Milan
> Cc: Deborah Dahl; Hans Wennborg; olli@pettay.fi; Bjorn Bringert; Glen 
> Shires; public-speech-api@w3.org
> Subject: Re: EMMA in Speech API (was RE: Speech API: first editor's 
> draft
> posted)
>
>
>
> I believe it is more useful for web developers if the UA is required
> to pass through the EMMA structure from the recognizer as-is, so they
> can rest assured the UA doesn't modify what the recognizer sends. To
> that effect, here is a modified proposal (version 4) based on Milan's
> version 3:
>
> ---------------
> Section 5.1:
>   readonly attribute Document emma;
>
> Section 5.1.6 needs
>   emma - EMMA 1.0 (link to http://www.w3.org/TR/emma/) representation 
> of this result.  The contents of this result could vary across UAs and 
> recognition engines, but all implementations MUST expose a valid XML 
> document complete with EMMA namespace.
> - UA implementations for recognizers that supply EMMA MUST pass that 
> EMMA structure directly.
> - UA implementations for recognizers that do not supply EMMA SHOULD 
> expose the following:
>  * <emma:interpretation> tag(s) populated with the interpretation (e.g.
> emma:literal or slot values)
>  * The following attributes on the <emma:interpretation> tag: id, 
> emma:process, emma:tokens, emma:medium, emma:mode.
> ---------------
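For the second bullet (recognizers that do not supply EMMA), the document a UA might synthesize could look like the following sketch; all attribute values here are hypothetical placeholders chosen for illustration, not values the proposal mandates:

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"
ET.register_namespace("emma", EMMA_NS)

# Sketch of a minimal UA-synthesized document for a recognizer that
# returns only a transcript (values are illustrative).
root = ET.Element(f"{{{EMMA_NS}}}emma", {"version": "1.0"})
interp = ET.SubElement(root, f"{{{EMMA_NS}}}interpretation", {
    "id": "int1",
    f"{{{EMMA_NS}}}process": "http://example.com/ua-builtin-recognizer",
    f"{{{EMMA_NS}}}tokens": "hello world",
    f"{{{EMMA_NS}}}medium": "acoustic",
    f"{{{EMMA_NS}}}mode": "voice",
})
literal = ET.SubElement(interp, f"{{{EMMA_NS}}}literal")
literal.text = "hello world"

document = ET.tostring(root, encoding="unicode")
```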
>
> Milan, the list of attributes mentioned in the last bullet has been
> gathered from the use cases mentioned in this thread. The list can
> change if we think of more use cases going forward. So should we list
> them at all, or is the MUST clause in the first point sufficient?
>
> Cheers
>
> Satish
>
> On Mon, Jun 11, 2012 at 7:38 PM, Young, Milan <Milan.Young@nuance.com>
> wrote:
>
> Is there consensus on the following (version 3) proposal:
>
>
>
> Section 5.1:
>
>   readonly attribute Document emma;
>
>
>
> Section 5.1.6 needs
>
>   emma - EMMA 1.0 (link to http://www.w3.org/TR/emma/) representation 
> of this result.  The contents of this result could vary across UAs and 
> recognition engines, but all implementations MUST expose a valid XML 
> document complete with EMMA namespace.  Implementations SHOULD expose 
> the
> following:
>
>   * <emma:interpretation> tag(s) populated with the interpretation (e.g.
> emma:literal or slot values)
>
>   * The following attributes on the <emma:interpretation> tag: id, 
> emma:process, emma:tokens, emma:medium, emma:mode.
>
>
>
> Thanks
>
>
>
> From: Deborah Dahl [mailto:dahl@conversational-technologies.com]
> Sent: Monday, June 11, 2012 11:29 AM
>
>
> To: 'Satish S'; Young, Milan
> Cc: 'Hans Wennborg'; olli@pettay.fi; 'Bjorn Bringert'; 'Glen Shires'; 
> public-speech-api@w3.org
> Subject: RE: EMMA in Speech API (was RE: Speech API: first editor's 
> draft
> posted)
>
>
>
> Glenn pointed out to me offline that Satish was asking about whether 
> the attributes that are required for the use cases we've been 
> discussing are required in EMMA 1.0.  I have to admit that I've lost 
> track of what use cases we're talking about, but I think at least 3 of 
> them are listed in http://lists.w3.org/Archives/Public/public-speech-api/2012May/0037.html .
> Those use cases require "emma:process", the timestamps, and 
> "emma:grammar", which are not required in EMMA 1.0. The other use case 
> we might be talking about is described in 
> http://lists.w3.org/Archives/Public/public-speech-api/2012Apr/0014.html,
> where an existing dialog manager or logger expects to receive speech
> recognition results as an EMMA document, in which case no specific
> attributes are required.
>
>
>
> From: Deborah Dahl [mailto:dahl@conversational-technologies.com]
> Sent: Monday, June 11, 2012 1:40 PM
> To: 'Satish S'; 'Young, Milan'
> Cc: 'Hans Wennborg'; olli@pettay.fi; 'Bjorn Bringert'; 'Glen Shires'; 
> public-speech-api@w3.org
> Subject: RE: EMMA in Speech API (was RE: Speech API: first editor's 
> draft
> posted)
>
>
>
> Hi Satish,
>
> All of the EMMA attributes that have been proposed for the use cases 
> we've discussed are already part of the EMMA 1.0 standard. That said, 
> the Multimodal Interaction Working Group is always interested in 
> receiving comments and suggestions that relate to possible new EMMA 
> capabilities, which can be posted to www-multimodal@w3.org.
>
> Regards,
>
> Debbie
>
>
>
> From: Satish S [mailto:satish@google.com]
> Sent: Monday, June 11, 2012 12:18 PM
> To: Young, Milan
> Cc: Hans Wennborg; Deborah Dahl; olli@pettay.fi; Bjorn Bringert; Glen 
> Shires; public-speech-api@w3.org
> Subject: Re: EMMA in Speech API (was RE: Speech API: first editor's 
> draft
> posted)
>
>
>
> If there are EMMA attributes that are mandatory for specific use 
> cases, we should post to the MMI WG and get those changes into the 
> EMMA recommendation published at http://www.w3.org/TR/emma/. I'm sure 
> they will be interested in incorporating them and Deborah Dahl can 
> help as well since she is one of the authors.
>
>
>
> Cheers
> Satish
>
> On Mon, Jun 11, 2012 at 4:16 PM, Young, Milan <Milan.Young@nuance.com>
> wrote:
>
> Hello Hans,
>
> I did respond to this thread, but it got forked.  The upshot is that 
> we should go with my second (most recent) proposal, not my first 
> proposal (that Satish supported).  The reason is that the first 
> proposal did not allow us to achieve the interoperability use cases that Deborah put forward.
>
> To address Satish's most recent argument, the likelihood of an
> application failing because the EMMA result contains a couple of extra
> attributes is small.  This is because 1) most EMMA implementations
> support these attributes already, 2) we're dealing with XML, which
> abstracts low-level parsing, and 3) if an application did fail, the fix
> would be trivial.
>
> Thanks
>
>
>
> -----Original Message-----
> From: Hans Wennborg [mailto:hwennborg@google.com]
>
> Sent: Monday, June 11, 2012 2:56 AM
> To: Deborah Dahl
>
> Cc: Satish S; olli@pettay.fi; Young, Milan; Bjorn Bringert; Glen 
> Shires; public-speech-api@w3.org
> Subject: Re: EMMA in Speech API (was RE: Speech API: first editor's 
> draft
> posted)
>
> Do we have agreement on this? If there are no objections, I'll update 
> the spec with the text Satish posted on the 8th (with DOMString 
> substituted with
> Document):
>
> ----
> Addition to SpeechRecognitionResult (section 5.1)
>
>  readonly attribute Document emma;
>
> And the corresponding addition to 5.1.6:
>  emma - A string representation of the XML-based <link>EMMA 1.0</link> 
> result. (link points to http://www.w3.org/TR/emma/)
> ----
>
> Thanks,
> Hans
>
> On Fri, Jun 8, 2012 at 2:32 PM, Deborah Dahl 
> <dahl@conversational-technologies.com> wrote:
>> I agree that Document would be more useful.
>>
>>
>>
>> From: Satish S [mailto:satish@google.com]
>> Sent: Friday, June 08, 2012 5:18 AM
>> To: Hans Wennborg
>> Cc: olli@pettay.fi; Young, Milan; Deborah Dahl; Bjorn Bringert; Glen 
>> Shires; public-speech-api@w3.org
>>
>>
>> Subject: Re: EMMA in Speech API (was RE: Speech API: first editor's 
>> draft
>> posted)
>>
>>
>>
>> Yes that is correct, it should be
>>
>>   readonly attribute Document emma;
>>
>>
>> Cheers
>> Satish
>>
>> On Fri, Jun 8, 2012 at 10:04 AM, Hans Wennborg <hwennborg@google.com>
>> wrote:
>>
>> On Fri, Jun 8, 2012 at 12:31 AM, Satish S <satish@google.com> wrote:
>>> In any case, looks like there is enough interest both from speech & 
>>> browser vendors to have this attribute always non-null. So I'm fine 
>>> making it so. I like the first proposal from Milan:
>>> ----
>>> Addition to SpeechRecognitionResult (section 5.1)
>>>
>>>  readonly attribute DOMString emma;
>>>
>>> And the corresponding addition to 5.1.6:
>>>  emma - A string representation of the XML-based <link>EMMA 
>>> 1.0</link> result. (link points to http://www.w3.org/TR/emma/)
>>> ----
>>>
>>> This spec proposal shouldn't mandate specific fields any more than 
>>> what EMMA does already so that web apps can point to existing 
>>> recognizers and get EMMA data in the same format as they would get 
>>> otherwise.
>>
>> Earlier in the thread, I thought we decided that it was better to 
>> make the emma attribute be of type Document rather than DOMString?
>
>
>
>
>
>
Received on Tuesday, 12 June 2012 16:36:08 GMT
