Re: EMMA in Speech API (was RE: Speech API: first editor's draft posted)

I would prefer an easy solution for the majority of apps, which just
want the interpretation: either a plain string, or a JS object when
using SISR. Boilerplate code sucks. Having EMMA available sounds ok
too, but that seems like a minority feature to me.
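
For the simple case, I have in mind something like this (a sketch
against the current draft; the exact event/result shape here is my
assumption and may differ from what the spec ends up with):

    var recognition = new SpeechRecognition();
    recognition.onresult = function(event) {
      // Top alternative of the top result.
      var alt = event.results[0][0];
      // With SISR this is a JS object, otherwise just a string.
      console.log(alt.interpretation);
    };
    recognition.start();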

On Mon, May 21, 2012 at 4:28 PM, Satish S <satish@google.com> wrote:
> There exists a SpeechRecognitionAlternative.interpretation for the purpose
> of returning semantic interpretation. I propose we change it to be of type
> 'Document' (i.e. a DOM document which will contain the XML DOM) and mention
> that it contains the interpretation in EMMA format.
>
> Since EMMA is an XML format, I think we should have just the above attribute
> and not add a text variant. If a web app needs the text representation, it is
> trivial to get it from the DOM representation with many JavaScript
> libraries.
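>
> For example, a sketch (assuming 'interpretation' is a DOM Document holding
> the EMMA XML per this proposal, and 'alternative' stands for a
> SpeechRecognitionAlternative):
>
>     var doc = alternative.interpretation;
>     // Full XML text of the EMMA result:
>     var xmlText = new XMLSerializer().serializeToString(doc);
>     // Or just the spoken tokens from emma:literal, if present:
>     var literals = doc.getElementsByTagNameNS(
>         'http://www.w3.org/2003/04/emma', 'literal');
>     var tokens = literals.length ? literals[0].textContent.trim() : null;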
>
> Cheers
> Satish
>
> On Fri, May 18, 2012 at 1:02 AM, Young, Milan <Milan.Young@nuance.com>
> wrote:
>>
>> I second Deborah’s line of reasoning.  Our goal should be to minimize
>> browser-specific dependencies unless there is a clear reason to do
>> otherwise.  A few dozen lines of code in the browser is not a good reason.
>>
>> From: Deborah Dahl [mailto:dahl@conversational-technologies.com]
>> Sent: Thursday, May 17, 2012 1:59 PM
>> To: 'Glen Shires'; Young, Milan
>> Cc: 'Hans Wennborg'; public-speech-api@w3.org; 'Satish S'
>> Subject: RE: EMMA in Speech API (was RE: Speech API: first editor's draft
>> posted)
>>
>> Yes, the developer could certainly do that, but the NULL check and
>> subsequent hand-building of EMMA introduce a service dependency into the
>> developer’s JavaScript, which I think we want to minimize.  I think it would
>> be better if speech recognizers that don’t support EMMA natively just
>> put a minimal EMMA wrapper around the token result for EMMAXML and EMMAText.
>>
>> From: Glen Shires [mailto:gshires@google.com]
>> Sent: Thursday, May 17, 2012 3:08 PM
>> To: Young, Milan
>> Cc: Deborah Dahl; Hans Wennborg; public-speech-api@w3.org; Satish S
>> Subject: Re: EMMA in Speech API (was RE: Speech API: first editor's draft
>> posted)
>>
>> Deborah,
>>
>> Since the JavaScript Speech API we're defining is new, I presume there
>> must be a new JavaScript "glue" layer between the Speech API and the
>> existing applications, dialog managers and log analysis tools that you
>> mention. [1]  Since the minimal EMMA wrapper you've defined is so simple, it
>> could easily be generated in that JavaScript "glue" layer.
>>
>> I propose that we define EMMAXML and EMMAText attributes, that recognizers
>> which support EMMA return them, and that it be acceptable for user agents
>> to return NULL for these attributes when the recognizer does not support
>> EMMA.
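>>
>> To illustrate, that glue could be roughly the following (a sketch only: it
>> assumes the EMMAXML attribute holds a DOM Document when the recognizer
>> supplies one, and a real version would also XML-escape the transcript):
>>
>>     function getEmma(result, transcript) {
>>       if (result.EMMAXML) return result.EMMAXML;  // recognizer-supplied EMMA
>>       // No native EMMA: wrap the raw tokens in the minimal wrapper.
>>       var xml =
>>         '<emma:emma version="1.0"' +
>>         ' xmlns:emma="http://www.w3.org/2003/04/emma">' +
>>         '<emma:interpretation id="int1"' +
>>         ' emma:medium="acoustic" emma:mode="voice">' +
>>         '<emma:literal>' + transcript + '</emma:literal>' +
>>         '</emma:interpretation></emma:emma>';
>>       return new DOMParser().parseFromString(xml, 'application/xml');
>>     }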
>>
>> [1] http://lists.w3.org/Archives/Public/public-speech-api/2012Apr/0014.html
>>
>> Thanks,
>>
>> Glen Shires
>>
>> On Tue, Apr 24, 2012 at 9:32 AM, Young, Milan <Milan.Young@nuance.com>
>> wrote:
>>
>> Thanks for the example Deborah.
>>
>> Given that it would be so simple for a recognition engine (or even a UA)
>> to add EMMA, are there any objections to exposing it?  We would of course
>> also provide plain text to support the simple use cases.
>>
>> From: Deborah Dahl [mailto:dahl@conversational-technologies.com]
>> Sent: Tuesday, April 24, 2012 8:52 AM
>> To: Young, Milan; 'Glen Shires'
>> Cc: 'Hans Wennborg'; public-speech-api@w3.org; 'Satish S'
>> Subject: EMMA in Speech API (was RE: Speech API: first editor's draft
>> posted)
>>
>> Hi Milan,
>>
>> Yes, I think it wouldn’t be too difficult to wrap a token result in a
>> minimal EMMA wrapper.
>>
>> I think the following would be the minimal EMMA required to represent just
>> the spoken tokens.
>>
>> Utterance: “flights from Boston to Denver”
>>
>> <emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
>>   <emma:interpretation id="int1"
>>       emma:medium="acoustic"
>>       emma:mode="voice">
>>     <emma:literal>
>>       flights from boston to denver
>>     </emma:literal>
>>   </emma:interpretation>
>> </emma:emma>
>>
>> If you wanted to add confidence and a semantic interpretation, the EMMA
>> would be like this:
>>
>> <emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
>>   <emma:interpretation id="int1"
>>       emma:confidence="0.75"
>>       emma:tokens="flights from boston to denver"
>>       emma:medium="acoustic"
>>       emma:mode="voice">
>>     <origin>Boston</origin>
>>     <destination>Denver</destination>
>>   </emma:interpretation>
>> </emma:emma>
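>>
>> A consumer could then pull the semantics back out with standard DOM calls,
>> e.g. (a sketch, assuming the result has been parsed into a Document 'doc'):
>>
>>     var emmaNS = 'http://www.w3.org/2003/04/emma';
>>     var interp = doc.getElementsByTagNameNS(emmaNS, 'interpretation')[0];
>>     var confidence = parseFloat(interp.getAttributeNS(emmaNS, 'confidence'));
>>     var origin = interp.getElementsByTagName('origin')[0].textContent;
>>     var destination = interp.getElementsByTagName('destination')[0].textContent;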
>>
>> Debbie
>>
>> From: Young, Milan [mailto:Milan.Young@nuance.com]
>> Sent: Monday, April 23, 2012 6:18 PM
>> To: Deborah Dahl; 'Glen Shires'
>> Cc: 'Hans Wennborg'; public-speech-api@w3.org; 'Satish S'
>> Subject: RE: Speech API: first editor's draft posted
>>
>> Deborah, is there a form of an EMMA result that encodes only the spoken
>> phrase (omitting the interpretation, etc.)?  If so, perhaps vendors that do
>> not currently support EMMA could use this trivial wrapper to achieve
>> compliance.
>>
>> Either way, I agree that EMMA must be a part of the spec.  I didn’t notice
>> that it had been dropped in this most recent proposal, or I would have
>> mentioned it myself.
>>
>> Thanks
>>
>> From: Deborah Dahl [mailto:dahl@conversational-technologies.com]
>> Sent: Monday, April 23, 2012 1:21 PM
>> To: 'Glen Shires'
>> Cc: 'Hans Wennborg'; public-speech-api@w3.org; 'Satish S'
>> Subject: RE: Speech API: first editor's draft posted
>>
>> I think standardization will actually be accelerated by making EMMA part
>> of the specification. EMMA (and its predecessor NLSML) was in fact
>> originally motivated in part by the non-interoperable, proprietary ways
>> that different speech recognizers represented semantic interpretations. This
>> made it very difficult for an application to be used with different speech
>> recognizers.
>>
>> I don’t know what proportion of existing speech services do or don’t
>> support EMMA, but there are definitely speech services, as well as
>> multimodal application platforms, that do. I know that there will be
>> applications and, more generally, development platforms that won’t be able
>> to use this spec unless they can get EMMA results.
>>
>> From: Glen Shires [mailto:gshires@google.com]
>> Sent: Monday, April 23, 2012 3:37 PM
>> To: Deborah Dahl
>> Cc: Hans Wennborg; public-speech-api@w3.org; Satish S
>> Subject: Re: Speech API: first editor's draft posted
>>
>> For this initial specification, we believe that a simplified API will
>> accelerate implementation, interoperability testing, standardization and
>> ultimately developer adoption.  Getting rapid adoption amongst many user
>> agents and many speech recognition services is a primary goal.
>>
>> Many speech recognition services currently do not support EMMA, and EMMA
>> is not required for the majority of use cases; therefore, I believe EMMA is
>> something we should consider adding in a future iteration of this
>> specification.
>>
>> /Glen Shires
>>
>> On Mon, Apr 23, 2012 at 11:44 AM, Deborah Dahl
>> <dahl@conversational-technologies.com> wrote:
>>
>> Thanks for preparing this draft.
>> I'd like to advocate including the EMMAText and EMMAXML attributes in
>> SpeechRecognitionResult. One argument is that at least some existing
>> consumers of speech recognition results (for example, dialog managers and
>> log analysis tools) currently expect EMMA as input. It would be very
>> desirable not to have to modify them to process multiple different
>> recognizer result formats. A web developer who's new to speech recognition
>> can ignore the EMMA if they want, because if all they want is tokens,
>> confidence, or semantics, those are available from the
>> SpeechRecognitionAlternative objects.
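>>
>> For instance, given a SpeechRecognitionAlternative 'alt' (a sketch;
>> attribute names per the editor's draft):
>>
>>     var tokens = alt.transcript;        // the recognized text
>>     var score = alt.confidence;         // e.g. 0.75
>>     var semantics = alt.interpretation; // when a grammar provides them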
>>
>> > -----Original Message-----
>> > From: Hans Wennborg [mailto:hwennborg@google.com]
>> > Sent: Thursday, April 12, 2012 10:36 AM
>> > To: public-speech-api@w3.org
>> > Cc: Satish S; Glen Shires
>> > Subject: Speech API: first editor's draft posted
>> >
>> > In December, Google proposed [1] to public-webapps a Speech JavaScript
>> > API; this subset supports the majority of the use cases in the Speech
>> > Incubator Group's Final Report. The proposal provides a programmatic
>> > API that enables web pages to synthesize speech output and to use
>> > speech recognition as an input for forms, continuous dictation and
>> > control.
>> >
>> > We have now posted a slightly updated proposal [2] in the Speech-API
>> > Community Group's repository; the differences include:
>> >
>> >  - Document is now self-contained, rather than having multiple
>> > references to the XG Final Report.
>> >  - Renamed SpeechReco interface to SpeechRecognition
>> >  - Renamed interfaces and attributes beginning SpeechInput* to
>> > SpeechRecognition*
>> >  - Moved EventTarget to constructor of SpeechRecognition
>> >  - Clarified that grammars and lang are attributes of SpeechRecognition
>> >  - Clarified that if index is greater than or equal to length, returns null
>> >
>> > We welcome discussion and feedback on this editor's draft. Please send
>> > your comments to the public-speech-api@w3.org mailing list.
>> >
>> > Glen Shires
>> > Hans Wennborg
>> >
>> > [1] http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/1696.html
>> > [2] http://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html
>>
>> --
>> Thanks!
>>
>> Glen Shires



-- 
Bjorn Bringert
Google UK Limited, Registered Office: Belgrave House, 76 Buckingham
Palace Road, London, SW1W 9TQ
Registered in England Number: 3977902
