Re: EMMA in Speech API (was RE: Speech API: first editor's draft posted) from Glen Shires on 2012-05-17 (public-speech-api@w3.org from May 2012)

From: Glen Shires <gshires@google.com>
Date: Thu, 17 May 2012 12:08:22 -0700
To: "Young, Milan" <Milan.Young@nuance.com>
Cc: Deborah Dahl <dahl@conversational-technologies.com>, Hans Wennborg <hwennborg@google.com>, "public-speech-api@w3.org" <public-speech-api@w3.org>, Satish S <satish@google.com>
Message-ID: <CAEE5bcgzgDz1wuKnJ92iv9pMqxsAROqT6A_qmSXE+An5W_7irg@mail.gmail.com>
Deborah,
Since the JavaScript Speech API we're defining is new, I presume there must
be a new JavaScript "glue" layer between the Speech API and the existing
applications, dialog managers and log analysis tools that you mention [1].
Since the minimal EMMA wrapper you've defined is so simple, it could
easily be generated in that JavaScript "glue" layer.

I propose that we define EMMAXML and EMMAText attributes, so that
recognizers that support EMMA return them, and that we make it acceptable
for user agents to return null for these attributes when the recognizer
doesn't support EMMA.
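
A minimal sketch of what such a "glue" function might look like, assuming
the recognized text is available as a plain string. The function names and
the `emmaText` property are hypothetical, standing in for the proposed
EMMAText attribute; they are not taken from any draft of the spec.

```javascript
// Escape characters that are special in XML text content.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Wrap a plain transcript in the minimal EMMA document from Deborah's example.
function wrapInMinimalEmma(transcript) {
  return [
    '<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">',
    '  <emma:interpretation id="int1" emma:medium="acoustic" emma:mode="voice">',
    "    <emma:literal>" + escapeXml(transcript) + "</emma:literal>",
    "  </emma:interpretation>",
    "</emma:emma>",
  ].join("\n");
}

// Hypothetical glue: prefer EMMA supplied by the recognizer, and generate
// the minimal wrapper only when the (assumed) emmaText property is null.
function emmaFor(result, transcript) {
  return result.emmaText != null ? result.emmaText : wrapInMinimalEmma(transcript);
}
```

With this in place, an application that expects EMMA could call
`emmaFor(result, transcript)` and receive an EMMA string whether or not the
recognizer supports EMMA.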

[1] http://lists.w3.org/Archives/Public/public-speech-api/2012Apr/0014.html

Thanks,
Glen Shires

On Tue, Apr 24, 2012 at 9:32 AM, Young, Milan <Milan.Young@nuance.com> wrote:

> Thanks for the example Deborah.
>
> Being that it would be so simple for a recognition engine (or even a UA)
> to add EMMA, are there any objections to exposing it?  We would of course
> also provide plain text to support the simple use cases.
>
> *From:* Deborah Dahl [mailto:dahl@conversational-technologies.com]
> *Sent:* Tuesday, April 24, 2012 8:52 AM
> *To:* Young, Milan; 'Glen Shires'
> *Cc:* 'Hans Wennborg'; public-speech-api@w3.org; 'Satish S'
> *Subject:* EMMA in Speech API (was RE: Speech API: first editor's draft
> posted)
>
> Hi Milan,
>
> Yes, I think that it wouldn’t be too difficult to wrap a token result with
> a minimal EMMA wrapper.
>
> I think the following would be the minimal EMMA required to represent just
> the spoken tokens.
>
> Utterance: “flights from Boston to Denver”
>
> <emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
>     <emma:interpretation id="int1"
>       emma:medium="acoustic"
>       emma:mode="voice">
>          <emma:literal>
>               flights from boston to denver
>          </emma:literal>
>     </emma:interpretation>
> </emma:emma>
>
> If you wanted to add confidence and a semantic interpretation, the EMMA
> would be like this:
>
> <emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
>     <emma:interpretation id="int1"
>       emma:confidence="0.75"
>       emma:tokens="flights from boston to denver"
>       emma:medium="acoustic"
>       emma:mode="voice">
>         <origin>Boston</origin>
>         <destination>Denver</destination>
>     </emma:interpretation>
> </emma:emma>
>
> Debbie
>
> *From:* Young, Milan [mailto:Milan.Young@nuance.com]
> *Sent:* Monday, April 23, 2012 6:18 PM
> *To:* Deborah Dahl; 'Glen Shires'
> *Cc:* 'Hans Wennborg'; public-speech-api@w3.org; 'Satish S'
> *Subject:* RE: Speech API: first editor's draft posted
>
> Deborah, is there a form of an EMMA result that only encodes the spoken
> phrase (omits interpretation etc)?  If so, perhaps vendors that do not
> currently support EMMA could use this trivial wrapper to achieve compliance.
>
> Either way I agree that EMMA must be a part of the spec.  I didn’t notice
> that this had been pulled in this most recent proposal or I would have
> mentioned it myself.
>
> Thanks
>
> *From:* Deborah Dahl [mailto:dahl@conversational-technologies.com]
> *Sent:* Monday, April 23, 2012 1:21 PM
> *To:* 'Glen Shires'
> *Cc:* 'Hans Wennborg'; public-speech-api@w3.org; 'Satish S'
> *Subject:* RE: Speech API: first editor's draft posted
>
> I think standardization will actually be accelerated by making EMMA part
> of the specification. EMMA (and its predecessor NLSML) were in fact
> originally partly motivated by the non-interoperable and proprietary ways
> that different speech recognizers represented semantic interpretations.
> This made it very difficult for an application to be used with different
> speech recognizers.
>
> I don’t know what proportion of existing speech services do or don’t
> support EMMA, but there are definitely speech services, as well as
> multimodal application platforms, that do. I know that there will be
> applications and, more generally, development platforms that won’t be able
> to use this spec unless they can get EMMA results.
>
> *From:* Glen Shires [mailto:gshires@google.com]
> *Sent:* Monday, April 23, 2012 3:37 PM
> *To:* Deborah Dahl
> *Cc:* Hans Wennborg; public-speech-api@w3.org; Satish S
> *Subject:* Re: Speech API: first editor's draft posted
>
> For this initial specification, we believe that a simplified API will
> accelerate implementation, interoperability testing, standardization and
> ultimately developer adoption.  Getting rapid adoption amongst many user
> agents and many speech recognition services is a primary goal.
>
> Many speech recognition services currently do not support EMMA, and EMMA
> is not required for the majority of use cases, therefore I believe EMMA is
> something we should consider adding in a future iteration of this
> specification.
>
> /Glen Shires
>
> On Mon, Apr 23, 2012 at 11:44 AM, Deborah Dahl <
> dahl@conversational-technologies.com> wrote:
>
> Thanks for preparing this draft.
> I'd like to advocate including the EMMAText and EMMAXML attributes in
> SpeechRecognitionResult. One argument is that at least some existing
> consumers of speech recognition results (for example, dialog managers and
> log analysis tools) currently expect EMMA as input. It would be very
> desirable not to have to modify them to process multiple different
> recognizer result formats. A web developer who's new to speech recognition
> can ignore the EMMA if they want, because if all they want is tokens,
> confidence, or semantics, those are available from the
> SpeechRecognitionAlternative objects.
>
> > -----Original Message-----
> > From: Hans Wennborg [mailto:hwennborg@google.com]
> > Sent: Thursday, April 12, 2012 10:36 AM
> > To: public-speech-api@w3.org
> > Cc: Satish S; Glen Shires
> > Subject: Speech API: first editor's draft posted
> >
> > In December, Google proposed [1] to public-webapps a Speech JavaScript
> > API subset that supports the majority of the use-cases in the Speech
> > Incubator Group's Final Report. This proposal provides a programmatic
> > API that enables web-pages to synthesize speech output and to use
> > speech recognition as an input for forms, continuous dictation and
> > control.
> >
> > We have now posted a slightly updated proposal [2] in the Speech-API
> > Community Group's repository. The differences include:
> >
> >  - Document is now self-contained, rather than having multiple
> > references to the XG Final Report.
> >  - Renamed SpeechReco interface to SpeechRecognition
> >  - Renamed interfaces and attributes beginning SpeechInput* to
> > SpeechRecognition*
> >  - Moved EventTarget to constructor of SpeechRecognition
> >  - Clarified that grammars and lang are attributes of SpeechRecognition
> >  - Clarified that if index is greater than or equal to length, returns
> > null
> >
> > We welcome discussion and feedback on this editor's draft. Please send
> > your comments to the public-speech-api@w3.org mailing list.
> >
> > Glen Shires
> > Hans Wennborg
> >
> > [1] http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/1696.html
> > [2] http://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html
>
> --
> Thanks!
>
> Glen Shires
>
Received on Thursday, 17 May 2012 19:09:34 UTC