RE: EMMA in Speech API (was RE: Speech API: first editor's draft posted)

From: Young, Milan <Milan.Young@nuance.com>
Date: Tue, 12 Jun 2012 15:41:45 +0000
To: Deborah Dahl <dahl@conversational-technologies.com>, 'Satish S' <satish@google.com>
CC: 'Hans Wennborg' <hwennborg@google.com>, "olli@pettay.fi" <olli@pettay.fi>, 'Bjorn Bringert' <bringert@google.com>, 'Glen Shires' <gshires@google.com>, "public-speech-api@w3.org" <public-speech-api@w3.org>
Message-ID: <B236B24082A4094A85003E8FFB8DDC3C1A4727D8@SOM-EXCH04.nuance.com>
I'm also happy with the new text. I suggest we add a link to an appendix or something for the use cases. At present we have:

Use case 1: I'm testing different speech recognition services. I would like to know which service processed the speech associated with a particular result, so that I can compare the services for accuracy. I can use the emma:process parameter for that.

Use case 2: I want the system to dynamically slow down its TTS for users who speak more slowly. The EMMA timestamps, duration, and token parameters can be used to determine the speech rate for a particular utterance.

Use case 3: I'm testing several different grammars to compare their accuracy. I use the emma:grammar parameter to record which grammar was used for each result.

Use case 4: My application server is based on an MMI architecture (link: http://www.w3.org/TR/mmi-arch/) and uses EMMA documents to communicate results. I POST the EMMA results to the server in order to derive the next state in the dialog.
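As a sketch of use case 2, the speech rate can be computed from the EMMA timing and token attributes. (Assumptions: the values below are invented; per EMMA 1.0, emma:start/emma:end are absolute timestamps in milliseconds and emma:tokens holds the recognized words.)

```javascript
// Compute words per minute from EMMA-style timing and token values.
function speechRateWpm(interp) {
  const durationMs = interp.end - interp.start;
  const words = interp.tokens.trim().split(/\s+/).length;
  return (words / durationMs) * 60000; // words per minute
}

// Values a recognizer might report for a slow speaker (invented):
const slowUtterance = {
  start: 1339517000000,
  end: 1339517004000, // a 4-second utterance
  tokens: "please read that more slowly",
};
const wpm = speechRateWpm(slowUtterance); // 5 words / 4 s = 75 wpm
// A dialog manager could lower its TTS rate when wpm drops below a threshold.
```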

Thanks


From: Deborah Dahl [mailto:dahl@conversational-technologies.com]
Sent: Tuesday, June 12, 2012 8:32 AM
To: 'Satish S'; Young, Milan
Cc: 'Hans Wennborg'; olli@pettay.fi; 'Bjorn Bringert'; 'Glen Shires'; public-speech-api@w3.org
Subject: RE: EMMA in Speech API (was RE: Speech API: first editor's draft posted)

That seems good to me. On the point of the UA modifying the EMMA, I think it's OK to prohibit the UA from modifying the EMMA, because the application can make its own modifications if it wants extra information added, for example for logging purposes or to attach the name of the application.
If we want to propose to the MMIWG that some EMMA attributes should be obligatory to support specific use cases, we can do that. I think we might want to wait before we put together a proposal, though, in case we think of other use cases.

From: Satish S [mailto:satish@google.com]
Sent: Tuesday, June 12, 2012 11:09 AM
To: Young, Milan
Cc: Deborah Dahl; Hans Wennborg; olli@pettay.fi; Bjorn Bringert; Glen Shires; public-speech-api@w3.org
Subject: Re: EMMA in Speech API (was RE: Speech API: first editor's draft posted)

Thanks Milan and Deborah. Looks like we agree on the following language. Could you confirm?

Section 5.1:
  readonly attribute Document emma;

Section 5.1.6 needs
  emma - EMMA 1.0 (link to http://www.w3.org/TR/emma/) representation of this result.  The contents of this result could vary across UAs and recognition engines, but all implementations MUST expose a valid XML document complete with EMMA namespace. UA implementations for recognizers that supply EMMA MUST pass that EMMA structure directly.
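For illustration, use case 1 from earlier in the thread could then be served like this. (A sketch: the EMMA snippet and service URI below are invented, and a real web app would query the result's emma Document with DOM methods rather than the regex used here to keep the example self-contained.)

```javascript
// Invented EMMA result; emma:process identifies the recognition service.
const emmaXml = `
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:interpretation id="int1"
      emma:process="http://example.com/recognizer-a"
      emma:tokens="flights to boston">
    <emma:literal>flights to boston</emma:literal>
  </emma:interpretation>
</emma:emma>`;

// Pull out emma:process so results from different services can be compared.
function emmaProcess(xml) {
  const m = xml.match(/emma:process="([^"]*)"/);
  return m ? m[1] : null;
}

const service = emmaProcess(emmaXml); // "http://example.com/recognizer-a"
```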

Cheers
Satish
On Tue, Jun 12, 2012 at 3:59 PM, Young, Milan <Milan.Young@nuance.com> wrote:
I'm also fine with dropping the specific attributes and instead attaching a set of EMMA use cases.

But I'm wary of the UA modifying the EMMA document: 1) this starts to get into non-trivial territory with encodings and such; 2) the application could easily attach UA information to the log itself.



From: Deborah Dahl [mailto:dahl@conversational-technologies.com]
Sent: Tuesday, June 12, 2012 7:51 AM

To: 'Satish S'; Young, Milan
Cc: 'Hans Wennborg'; olli@pettay.fi; 'Bjorn Bringert'; 'Glen Shires'; public-speech-api@w3.org
Subject: RE: EMMA in Speech API (was RE: Speech API: first editor's draft posted)

I'm not sure why a web developer would care whether the EMMA they get from the UA is exactly what the speech recognizer supplied. On the other hand, I can think of useful things that the UA could add to the EMMA, for example something in the <info> tag about the UA that the request originated from, which the recognizer wouldn't necessarily know about. In that case you might actually want modified EMMA.
I agree with Satish's point that we might think of other use cases that require specific EMMA attributes, so I don't really see the need to call out those specific attributes.

From: Satish S [mailto:satish@google.com]
Sent: Tuesday, June 12, 2012 5:22 AM
To: Young, Milan
Cc: Deborah Dahl; Hans Wennborg; olli@pettay.fi; Bjorn Bringert; Glen Shires; public-speech-api@w3.org
Subject: Re: EMMA in Speech API (was RE: Speech API: first editor's draft posted)

I believe it is more useful for web developers if the UA is required to pass the EMMA structure through from the recognizer as-is, so they can rest assured the UA doesn't modify what the recognizer sends. To that effect, here is a modified proposal (version 4) based on Milan's version 3:

---------------
Section 5.1:
  readonly attribute Document emma;

Section 5.1.6 needs
  emma - EMMA 1.0 (link to http://www.w3.org/TR/emma/) representation of this result.  The contents of this result could vary across UAs and recognition engines, but all implementations MUST expose a valid XML document complete with EMMA namespace.
- UA implementations for recognizers that supply EMMA MUST pass that EMMA structure directly.
- UA implementations for recognizers that do not supply EMMA SHOULD expose the following:
 * <emma:interpretation> tag(s) populated with the interpretation (e.g. emma:literal or slot values)
 * The following attributes on the <emma:interpretation> tag: id, emma:process, emma:tokens, emma:medium, emma:mode.
---------------
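For a recognizer that does not itself supply EMMA, a minimal document satisfying those SHOULD bullets might look like this (all attribute values invented; emma:medium="acoustic" and emma:mode="voice" are the EMMA 1.0 values for speech input):

```xml
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:interpretation id="int1"
      emma:process="http://example.com/ua-builtin-recognizer"
      emma:tokens="hello world"
      emma:medium="acoustic"
      emma:mode="voice">
    <emma:literal>hello world</emma:literal>
  </emma:interpretation>
</emma:emma>
```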

Milan, the list of attributes mentioned in the last bullet was gathered from the use cases mentioned in this thread, and it could change if we think of more use cases going forward. So should we list them at all, or is the MUST clause in the first point sufficient?
Cheers
Satish
On Mon, Jun 11, 2012 at 7:38 PM, Young, Milan <Milan.Young@nuance.com> wrote:
Is there consensus on the following (version 3) proposal:

Section 5.1:
  readonly attribute Document emma;

Section 5.1.6 needs
  emma - EMMA 1.0 (link to http://www.w3.org/TR/emma/) representation of this result.  The contents of this result could vary across UAs and recognition engines, but all implementations MUST expose a valid XML document complete with EMMA namespace.  Implementations SHOULD expose the following:
  * <emma:interpretation> tag(s) populated with the interpretation (e.g. emma:literal or slot values)
  * The following attributes on the <emma:interpretation> tag: id, emma:process, emma:tokens, emma:medium, emma:mode.

Thanks

From: Deborah Dahl [mailto:dahl@conversational-technologies.com]
Sent: Monday, June 11, 2012 11:29 AM

To: 'Satish S'; Young, Milan
Cc: 'Hans Wennborg'; olli@pettay.fi; 'Bjorn Bringert'; 'Glen Shires'; public-speech-api@w3.org
Subject: RE: EMMA in Speech API (was RE: Speech API: first editor's draft posted)

Glen pointed out to me offline that Satish was asking whether the attributes required for the use cases we've been discussing are required in EMMA 1.0. I have to admit that I've lost track of which use cases we're talking about, but I think at least three of them are listed in http://lists.w3.org/Archives/Public/public-speech-api/2012May/0037.html. Those use cases require emma:process, the timestamps, and emma:grammar, which are not required in EMMA 1.0. The other use case we might be talking about is described in http://lists.w3.org/Archives/Public/public-speech-api/2012Apr/0014.html, where an existing dialog manager or logger expects to receive speech recognition results as an EMMA document, in which case no specific attributes are required.

From: Deborah Dahl [mailto:dahl@conversational-technologies.com]
Sent: Monday, June 11, 2012 1:40 PM
To: 'Satish S'; 'Young, Milan'
Cc: 'Hans Wennborg'; olli@pettay.fi; 'Bjorn Bringert'; 'Glen Shires'; public-speech-api@w3.org
Subject: RE: EMMA in Speech API (was RE: Speech API: first editor's draft posted)

Hi Satish,
All of the EMMA attributes that have been proposed for the use cases we've discussed are already part of the EMMA 1.0 standard. That said, the Multimodal Interaction Working Group is always interested in receiving comments and suggestions that relate to possible new EMMA capabilities, which can be posted to www-multimodal@w3.org.
Regards,
Debbie

From: Satish S [mailto:satish@google.com]
Sent: Monday, June 11, 2012 12:18 PM
To: Young, Milan
Cc: Hans Wennborg; Deborah Dahl; olli@pettay.fi; Bjorn Bringert; Glen Shires; public-speech-api@w3.org
Subject: Re: EMMA in Speech API (was RE: Speech API: first editor's draft posted)

If there are EMMA attributes that are mandatory for specific use cases, we should post to the MMI WG and get those changes into the EMMA recommendation published at http://www.w3.org/TR/emma/. I'm sure they will be interested in incorporating them and Deborah Dahl can help as well since she is one of the authors.

Cheers
Satish
On Mon, Jun 11, 2012 at 4:16 PM, Young, Milan <Milan.Young@nuance.com> wrote:
Hello Hans,

I did respond to this thread, but it got forked.  The upshot is that we should go with my second (most recent) proposal, not my first proposal (that Satish supported).  The reason is that the first proposal did not allow us to achieve the interoperability use cases that Deborah put forward.

To address Satish's most recent argument, the likelihood of an application failing because the EMMA result contains a couple of extra attributes is small. This is because: 1) most EMMA implementations already support these attributes; 2) we're dealing with XML, which abstracts away low-level parsing; 3) if an application did fail, the fix would be trivial.

Thanks


-----Original Message-----
From: Hans Wennborg [mailto:hwennborg@google.com]
Sent: Monday, June 11, 2012 2:56 AM
To: Deborah Dahl
Cc: Satish S; olli@pettay.fi; Young, Milan; Bjorn Bringert; Glen Shires; public-speech-api@w3.org
Subject: Re: EMMA in Speech API (was RE: Speech API: first editor's draft posted)

Do we have agreement on this? If there are no objections, I'll update the spec with the text Satish posted on the 8th (with DOMString replaced by Document):

----
Addition to SpeechRecognitionResult (section 5.1)

 readonly attribute Document emma;

And the corresponding addition to 5.1.6:
 emma - A string representation of the XML-based <link>EMMA 1.0</link> result. (link points to http://www.w3.org/TR/emma/)
----

Thanks,
Hans

On Fri, Jun 8, 2012 at 2:32 PM, Deborah Dahl <dahl@conversational-technologies.com> wrote:
> I agree that Document would be more useful.
>
>
>
> From: Satish S [mailto:satish@google.com]
> Sent: Friday, June 08, 2012 5:18 AM
> To: Hans Wennborg
> Cc: olli@pettay.fi; Young, Milan; Deborah Dahl; Bjorn Bringert; Glen
> Shires; public-speech-api@w3.org
>
>
> Subject: Re: EMMA in Speech API (was RE: Speech API: first editor's
> draft
> posted)
>
>
>
> Yes that is correct, it should be
>
>   readonly attribute Document emma;
>
>
> Cheers
> Satish
>
> On Fri, Jun 8, 2012 at 10:04 AM, Hans Wennborg <hwennborg@google.com> wrote:
>
> On Fri, Jun 8, 2012 at 12:31 AM, Satish S <satish@google.com> wrote:
>> In any case, looks like there is enough interest both from speech &
>> browser vendors to have this attribute always non-null. So I'm fine
>> making it so. I like the first proposal from Milan:
>> ----
>> Addition to SpeechRecognitionResult (section 5.1)
>>
>>  readonly attribute DOMString emma;
>>
>> And the corresponding addition to 5.1.6:
>>  emma - A string representation of the XML-based <link>EMMA
>> 1.0</link> result. (link points to http://www.w3.org/TR/emma/)
>> ----
>>
>> This spec proposal shouldn't mandate specific fields any more than
>> what EMMA does already so that web apps can point to existing
>> recognizers and get EMMA data in the same format as they would get
>> otherwise.
>
> Earlier in the thread, I thought we decided that it was better to make
> the emma attribute be of type Document rather than DOMString?
Received on Tuesday, 12 June 2012 15:42:22 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Tuesday, 12 June 2012 15:42:23 GMT