- From: Glen Shires <gshires@google.com>
- Date: Thu, 17 May 2012 12:08:22 -0700
- To: "Young, Milan" <Milan.Young@nuance.com>
- Cc: Deborah Dahl <dahl@conversational-technologies.com>, Hans Wennborg <hwennborg@google.com>, "public-speech-api@w3.org" <public-speech-api@w3.org>, Satish S <satish@google.com>
- Message-ID: <CAEE5bcgzgDz1wuKnJ92iv9pMqxsAROqT6A_qmSXE+An5W_7irg@mail.gmail.com>
Deborah,

Since the JavaScript Speech API we're defining is new, I presume there must
be a new JavaScript "glue" layer between the Speech API and the existing
applications, dialog managers and log analysis tools that you mention. [1]
Since the minimal EMMA wrapper you've defined is so simple, it could easily
be generated in that JavaScript "glue" layer.

I propose that we define attributes for EMMAXML and EMMAText so that
recognizers that support EMMA return them, and that we make it acceptable
for user agents to return null for these attributes when the recognizer
does not support EMMA.

[1] http://lists.w3.org/Archives/Public/public-speech-api/2012Apr/0014.html

Thanks,
Glen Shires
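To make the "glue" idea concrete, here is a minimal sketch of such a layer,
written against the attributes proposed above. The names emmaXML, item() and
transcript are assumptions based on this thread, not definitive API, and a
recognizer's own EMMA is passed through untouched:

    // Minimal sketch of a "glue" function: return the recognizer's own
    // EMMA when available, otherwise wrap the top transcript in the
    // minimal EMMA wrapper shown later in this thread. The attribute
    // names (emmaXML, item, transcript) are assumptions.
    function escapeXml(s) {
      return s.replace(/&/g, '&amp;')
              .replace(/</g, '&lt;')
              .replace(/>/g, '&gt;');
    }

    function getEmma(result) {
      // Recognizers that support EMMA return it; others return null.
      if (result.emmaXML !== null) {
        return result.emmaXML;
      }
      // item() is assumed to return null when the index is out of range.
      var best = result.item(0);
      var tokens = best !== null ? escapeXml(best.transcript) : '';
      return '<emma:emma version="1.0"' +
             ' xmlns:emma="http://www.w3.org/2003/04/emma">' +
             '<emma:interpretation id="int1"' +
             ' emma:medium="acoustic" emma:mode="voice">' +
             '<emma:literal>' + tokens + '</emma:literal>' +
             '</emma:interpretation>' +
             '</emma:emma>';
    }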
On Tue, Apr 24, 2012 at 9:32 AM, Young, Milan <Milan.Young@nuance.com> wrote:

> Thanks for the example Deborah.
>
> Being that it would be so simple for a recognition engine (or even a UA)
> to add EMMA, are there any objections to exposing it? We would of course
> also provide plain text to support the simple use cases.
>
> From: Deborah Dahl [mailto:dahl@conversational-technologies.com]
> Sent: Tuesday, April 24, 2012 8:52 AM
> To: Young, Milan; 'Glen Shires'
> Cc: 'Hans Wennborg'; public-speech-api@w3.org; 'Satish S'
> Subject: EMMA in Speech API (was RE: Speech API: first editor's draft posted)
>
> Hi Milan,
>
> Yes, I think that it wouldn’t be too difficult to wrap a token result with
> a minimal EMMA wrapper.
>
> I think the following would be the minimal EMMA required to represent just
> the spoken tokens.
>
> Utterance: “flights from Boston to Denver”
>
> <emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
>   <emma:interpretation id="int1"
>       emma:medium="acoustic"
>       emma:mode="voice">
>     <emma:literal>
>       flights from boston to denver
>     </emma:literal>
>   </emma:interpretation>
> </emma:emma>
>
> If you wanted to add confidence and a semantic interpretation, the EMMA
> would be like this:
>
> <emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
>   <emma:interpretation id="int1"
>       emma:confidence="0.75"
>       emma:tokens="flights from boston to denver"
>       emma:medium="acoustic"
>       emma:mode="voice">
>     <origin>Boston</origin>
>     <destination>Denver</destination>
>   </emma:interpretation>
> </emma:emma>
>
> Debbie
>
> From: Young, Milan [mailto:Milan.Young@nuance.com]
> Sent: Monday, April 23, 2012 6:18 PM
> To: Deborah Dahl; 'Glen Shires'
> Cc: 'Hans Wennborg'; public-speech-api@w3.org; 'Satish S'
> Subject: RE: Speech API: first editor's draft posted
>
> Deborah, is there a form of an EMMA result that only encodes the spoken
> phrase (omits interpretation, etc.)? If so, perhaps vendors that do not
> currently support EMMA could use this trivial wrapper to achieve
> compliance.
>
> Either way, I agree that EMMA must be a part of the spec. I didn’t notice
> that it had been pulled from this most recent proposal or I would have
> mentioned it myself.
>
> Thanks
>
> From: Deborah Dahl [mailto:dahl@conversational-technologies.com]
> Sent: Monday, April 23, 2012 1:21 PM
> To: 'Glen Shires'
> Cc: 'Hans Wennborg'; public-speech-api@w3.org; 'Satish S'
> Subject: RE: Speech API: first editor's draft posted
>
> I think standardization will actually be accelerated by making EMMA part
> of the specification. EMMA (and its predecessor NLSML) were in fact
> originally partly motivated by the non-interoperable and proprietary ways
> that different speech recognizers represented semantic interpretations.
> This made it very difficult for an application to be used with different
> speech recognizers.
>
> I don’t know what proportion of existing speech services do or don’t
> support EMMA, but there are definitely speech services, as well as
> multimodal application platforms, that do. I know that there will be
> applications and, more generally, development platforms that won’t be
> able to use this spec unless they can get EMMA results.
>
> From: Glen Shires [mailto:gshires@google.com]
> Sent: Monday, April 23, 2012 3:37 PM
> To: Deborah Dahl
> Cc: Hans Wennborg; public-speech-api@w3.org; Satish S
> Subject: Re: Speech API: first editor's draft posted
>
> For this initial specification, we believe that a simplified API will
> accelerate implementation, interoperability testing, standardization and,
> ultimately, developer adoption. Getting rapid adoption amongst many user
> agents and many speech recognition services is a primary goal.
>
> Many speech recognition services currently do not support EMMA, and EMMA
> is not required for the majority of use cases; therefore I believe EMMA is
> something we should consider adding in a future iteration of this
> specification.
>
> /Glen Shires
>
> On Mon, Apr 23, 2012 at 11:44 AM, Deborah Dahl
> <dahl@conversational-technologies.com> wrote:
>
> Thanks for preparing this draft.
> I'd like to advocate including the EMMAText and EMMAXML attributes in
> SpeechRecognitionResult. One argument is that at least some existing
> consumers of speech recognition results (for example, dialog managers and
> log analysis tools) currently expect EMMA as input. It would be very
> desirable not to have to modify them to process multiple different
> recognizer result formats. A web developer who's new to speech recognition
> can ignore the EMMA if they want, because if all they want is tokens,
> confidence, or semantics, those are available from the
> SpeechRecognitionAlternative objects.
>
> > -----Original Message-----
> > From: Hans Wennborg [mailto:hwennborg@google.com]
> > Sent: Thursday, April 12, 2012 10:36 AM
> > To: public-speech-api@w3.org
> > Cc: Satish S; Glen Shires
> > Subject: Speech API: first editor's draft posted
> >
> > In December, Google proposed [1] to public-webapps a Speech JavaScript
> > API subset that supports the majority of the use-cases in the Speech
> > Incubator Group's Final Report. This proposal provides a programmatic
> > API that enables web pages to synthesize speech output and to use
> > speech recognition as an input for forms, continuous dictation and
> > control.
> >
> > We have now posted a slightly updated proposal [2] in the Speech-API
> > Community Group's repository; the differences include:
> >
> > - Document is now self-contained, rather than having multiple
> >   references to the XG Final Report.
> > - Renamed SpeechReco interface to SpeechRecognition
> > - Renamed interfaces and attributes beginning SpeechInput* to
> >   SpeechRecognition*
> > - Moved EventTarget to constructor of SpeechRecognition
> > - Clarified that grammars and lang are attributes of SpeechRecognition
> > - Clarified that if index is greater than or equal to length, returns
> >   null
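To make the renames in the list above concrete, here is a rough sketch of
how a page might drive the updated API. The 'result' event name, the shape
of its event object, and the start() call are assumptions based on this
thread and the XG Final Report, not text quoted from the editor's draft:

    // Rough sketch only; event and attribute names below are assumptions.
    var recognition = new SpeechRecognition();  // formerly SpeechReco

    // lang (and grammars) are attributes of SpeechRecognition.
    recognition.lang = 'en-US';

    // EventTarget moved to the SpeechRecognition constructor, so
    // handlers attach directly to the recognition object.
    recognition.addEventListener('result', function (event) {
      var result = event.result;   // hypothetical event shape
      var best = result.item(0);   // null if 0 >= result.length
      if (best !== null) {
        // tokens and confidence come from SpeechRecognitionAlternative;
        // the transcript attribute name is an assumption.
        console.log(best.transcript, best.confidence);
      }
    });

    recognition.start();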
> > We welcome discussion and feedback on this editor's draft. Please send
> > your comments to the public-speech-api@w3.org mailing list.
> >
> > Glen Shires
> > Hans Wennborg
> >
> > [1] http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/1696.html
> > [2] http://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html
>
> --
> Thanks!
> Glen Shires
Received on Thursday, 17 May 2012 19:09:34 UTC