
Re: EMMA in Speech API (was RE: Speech API: first editor's draft posted)

From: Satish S <satish@google.com>
Date: Fri, 8 Jun 2012 00:31:57 +0100
Message-ID: <CAHZf7RnAZc9Rp5ZKZJ+ebAxy-Ak5bSDcPaJaVDW_H_fUQPWcKA@mail.gmail.com>
To: olli@pettay.fi
Cc: "Young, Milan" <Milan.Young@nuance.com>, Hans Wennborg <hwennborg@google.com>, Deborah Dahl <dahl@conversational-technologies.com>, Bjorn Bringert <bringert@google.com>, Glen Shires <gshires@google.com>, "public-speech-api@w3.org" <public-speech-api@w3.org>
>
> Also, I think “null” only means that the API implementation doesn’t
> support EMMA, not necessarily that the recognizer doesn’t support EMMA


My proposal was to set the attribute to null only if the recognizer did not
return EMMA data. A UA would therefore still need to support EMMA in order to
work with recognizers that return it.

In any case, it looks like there is enough interest from both speech and
browser vendors to have this attribute always non-null, so I'm fine with making
it so. I like the first proposal from Milan:
----
Addition to SpeechRecognitionResult (section 5.1)

 readonly attribute DOMString emma;

And the corresponding addition to 5.1.6:
 emma - A string representation of the XML-based <link>EMMA 1.0</link>
result. (link points to http://www.w3.org/TR/emma/)
----

This spec proposal shouldn't mandate specific fields beyond what EMMA already
requires, so that web apps can point at existing recognizers and get EMMA data
in the same format they would get otherwise.
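To make the always-non-null behaviour concrete: when the recognizer returns no
EMMA, the UA would wrap the plain transcript in a minimal EMMA 1.0 document
itself. A rough sketch of such a wrapper is below; this is illustrative only,
not spec text. The `wrapEmma` helper, its parameters, and the `id` and
`emma:process` values are hypothetical - only the element/attribute names and
the namespace URI come from the EMMA 1.0 Recommendation.

```javascript
// Hypothetical helper: wrap a bare transcript in a minimal EMMA 1.0 document,
// as a UA might when the underlying recognizer returns no EMMA of its own.
function wrapEmma(utterance, engineName) {
  // Escape characters that are unsafe in XML text and attribute values.
  const esc = s => s
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
  return [
    '<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">',
    '  <emma:interpretation id="interp1"',
    '      emma:medium="acoustic" emma:mode="voice"',
    '      emma:process="' + esc(engineName) + '"',   // engine name: use case #1
    '      emma:tokens="' + esc(utterance) + '">',
    '    <emma:literal>' + esc(utterance) + '</emma:literal>',
    '  </emma:interpretation>',
    '</emma:emma>',
  ].join('\n');
}

console.log(wrapEmma('flights to boston', 'example-engine'));
```

Because the result is an ordinary EMMA document, a web app could augment it
(e.g. add emma:grammar, as Deborah describes below) before posting it to a
logging or MMI server.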

Cheers
Satish


On Thu, Jun 7, 2012 at 6:46 PM, Olli Pettay <Olli.Pettay@helsinki.fi> wrote:

> On 06/07/2012 07:12 PM, Young, Milan wrote:
>
>> Perhaps only a small percentage of *developers* are interested in this
>> feature, but I believe that a large percentage of *end-users* will be
>> impacted by this feature.  That's because enterprise-grade applications
>> are written by few but used by many.
>>
>> Every argument that I've heard for discarding this feature boils down to
>> implementation.  Given that implementation is trivial, this sounds like an
>> abuse of the community structure we are based on.  If we do not have a
>> resolution to add this feature by this weekend, I will escalate to the W3C
>> staff.
>>
>
> It is totally OK with me to require that, if the speech service doesn't
> provide EMMA, the UA wraps the result in some simple EMMA. That way the API
> stays consistent - some kind of EMMA document is always available.
>
>
>
>
> -Olli
>
>
>
>
>>
>>
>>
>>
>> -----Original Message----- From: Olli Pettay
>> [mailto:Olli.Pettay@helsinki.fi] Sent: Thursday,
>> June 07, 2012 8:27 AM To: Hans Wennborg Cc: Young,
>> Milan; Deborah Dahl; Satish S; Bjorn Bringert; Glen Shires;
>> public-speech-api@w3.org Subject: Re: EMMA in Speech API (was RE: Speech
>> API: first
>> editor's draft posted)
>>
>> On 06/07/2012 04:52 PM, Hans Wennborg wrote:
>>
>>> I still don't think UAs that use a speech engine that doesn't support
>>> EMMA should be required to provide a non-null emma attribute.
>>>
>>> I don't think the vast majority of web developers will care about this.
>>>
>>> For existing applications that rely on EMMA, there would already be
>>> significant work involved to port to the web and this API. For those cases,
>>> checking for the null-case, and wrapping the results into EMMA using
>>> JavaScript shouldn't be a big deal.
>>>
>>> If there turns out to be a large demand from real web apps for the
>>> attribute to always be non-null, it would be easy to change the spec to
>>> require that. Doing it the other way around, allowing web apps to rely
>>> on it now, and then change it to sometimes return null would be much
>>> harder.
>>>
>>> Thanks, Hans
>>>
>>
>> It makes no sense to have this kind of optional feature. Either EMMA
>> must be there or it must not (either one is OK with me).
>>
>>
>> -Olli
>>
>>
>>
>>
>>
>>>
>>> On Wed, Jun 6, 2012 at 9:14 PM, Young, Milan <Milan.Young@nuance.com>
>>> wrote:
>>>
>>>> Since there are no objections, I suggest the following be added to the
>>>> spec:
>>>>
>>>>
>>>>
>>>> Section 5.1:
>>>>
>>>> readonly attribute Document emma;
>>>>
>>>>
>>>>
>>>> Section 5.1.6 needs:
>>>>
>>>> emma - EMMA 1.0 (link to http://www.w3.org/TR/emma/) representation of
>>>> this result.  The contents of this result could vary across UAs and
>>>> recognition engines, but all implementations MUST at least expose the
>>>> following:
>>>>
>>>> *       A valid XML document complete with the EMMA namespace
>>>>
>>>> *       <emma:interpretation> tag(s) populated with the interpretation
>>>> (e.g. emma:literal or slot values) and the following attributes: id,
>>>> emma:process, emma:tokens, emma:medium, emma:mode.
>>>>
>>>>
>>>>
>>>> Thanks
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> From: Young, Milan Sent: Wednesday, May 30, 2012 10:44 AM To: 'Deborah
>>>> Dahl'; 'Satish S' Cc: 'Bjorn Bringert'; 'Glen Shires'; 'Hans Wennborg';
>>>> public-speech-api@w3.org
>>>>
>>>>
>>>> Subject: RE: EMMA in Speech API (was RE: Speech API: first editor's
>>>> draft posted)
>>>>
>>>>
>>>>
>>>> Thanks Deborah, that's clear.  The upshot is that we don't need to
>>>> consider #3 as a use case for this specification.  But #1 and #4 still
>>>> apply.
>>>>
>>>>
>>>>
>>>> Any disagreements, or can I start drafting this for the spec?
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> From: Deborah Dahl [mailto:dahl@conversational-technologies.com]
>>>>
>>>> Sent: Wednesday, May 30, 2012 10:10 AM To: Young, Milan; 'Satish S' Cc:
>>>> 'Bjorn Bringert'; 'Glen Shires'; 'Hans Wennborg';
>>>> public-speech-api@w3.org
>>>>
>>>> Subject: RE: EMMA in Speech API (was RE: Speech API: first editor's
>>>> draft posted)
>>>>
>>>>
>>>>
>>>> I agree that use case 3 (comparing grammars) would be most easily
>>>> achieved if the recognizer returned the emma:grammar information. However,
>>>> if I were implementing use case 3 without getting emma:grammar from the
>>>> recognizer, I think I would manually add the "emma:grammar" attribute
>>>> to the minimal EMMA provided by the UA (because I know the grammar that
>>>> I set for the recognizer). Then I would send the augmented EMMA off to
>>>> the logging/tuning server for later analysis. Even though there's a
>>>> manual step involved, it would be convenient to be able to add to existing
>>>> EMMA rather than to construct the whole EMMA manually.
>>>>
>>>>
>>>>
>>>> From: Young, Milan [mailto:Milan.Young@nuance.com] Sent: Wednesday,
>>>> May 30, 2012 11:37 AM To: Satish S Cc: Bjorn Bringert; Deborah Dahl; Glen
>>>> Shires; Hans Wennborg; public-speech-api@w3.org Subject: RE: EMMA in
>>>> Speech API (was RE: Speech API: first editor's draft posted)
>>>>
>>>>
>>>>
>>>> I'm suggesting that if the UA doesn't integrate with a speech engine
>>>> that supports EMMA, it must provide a wrapper so that basic
>>>> interoperability can be achieved.  In use case #1 (comparing speech
>>>> engines), that means injecting an <emma:process> tag that contains the name
>>>> of the underlying speech engine.
>>>>
>>>>
>>>>
>>>> I agree that use case #3 could not be achieved without a tight coupling
>>>> with the engine.  If Deborah is OK with dropping this, so am I.
>>>>
>>>>
>>>>
>>>> I don't understand your point about use case #4.  Earlier you were
>>>> arguing for a null/undefined value if the speech engine didn't natively
>>>> support EMMA.  Obviously this would prevent the suggested use case.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> From: Satish S [mailto:satish@google.com] Sent: Wednesday, May 30,
>>>> 2012 8:19 AM To: Young, Milan Cc: Bjorn Bringert; Deborah Dahl; Glen Shires;
>>>> Hans Wennborg; public-speech-api@w3.org Subject: Re: EMMA in Speech
>>>> API (was RE: Speech API: first editor's draft posted)
>>>>
>>>>
>>>>
>>>> Satish, please take a look at the use cases below.  Items #1 and #3
>>>> cannot be achieved unless EMMA is always present.
>>>>
>>>>
>>>>
>>>> To clarify, are you suggesting that speech recognizers must always
>>>> return EMMA to the UA, or are you suggesting if they don't the UA should
>>>> create a wrapper EMMA object with just the utterance(s) and give that
>>>> to the web page? If it is the latter then #1 and #3 can't be achieved
>>>> anyway because the UA doesn't have enough information to create an EMMA
>>>> wrapper with all possible data that the web app may want (specifically
>>>> it wouldn't know about what to put in the emma:process and emma:fields
>>>> given in those use cases). And if it is the former that seems out of
>>>> scope of this CG.
>>>>
>>>>
>>>>
>>>> I'd like to add another use case #4.  Application needs to post the
>>>> recognition result to server before proceeding in the dialog.  The server
>>>> might be a traditional application server or it could be the controller
>>>> in an MMI architecture.  EMMA is a standard serialized representation.
>>>>
>>>>
>>>>
>>>> If the server supports EMMA then my proposal should work because the
>>>> web app would be receiving the EMMA Document as is.
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Cheers
>>>>
>>>> Satish
>>>>
>>>
>>>
>>
>
Received on Thursday, 7 June 2012 23:32:28 GMT
