- From: Bjorn Bringert <bringert@google.com>
- Date: Mon, 21 May 2012 16:50:46 +0100
- To: Satish S <satish@google.com>
- Cc: "Young, Milan" <Milan.Young@nuance.com>, Deborah Dahl <dahl@conversational-technologies.com>, Glen Shires <gshires@google.com>, Hans Wennborg <hwennborg@google.com>, "public-speech-api@w3.org" <public-speech-api@w3.org>
I would prefer having an easy solution for the majority of apps which just
want the interpretation, which is either just a string or a JS object (when
using SISR). Boilerplate code sucks. Having EMMA available sounds ok too,
but that seems like a minority feature to me.

On Mon, May 21, 2012 at 4:28 PM, Satish S <satish@google.com> wrote:
> There exists a SpeechRecognitionAlternative.interpretation for the purpose
> of returning semantic interpretation. I propose we change it to be of type
> 'Document' (i.e. a DOM document which will contain the XML DOM) and
> mention that it contains the interpretation in EMMA format.
>
> Since EMMA is an XML format I think we should have just the above
> attribute and not add a text variant. If a web app needs the text
> representation it is trivial to get it from the DOM representation with
> many javascript libraries.
>
> Cheers
> Satish
>
> On Fri, May 18, 2012 at 1:02 AM, Young, Milan <Milan.Young@nuance.com> wrote:
>>
>> I second Deborah's line of reasoning. Our goal should be to minimize
>> browser-specific dependencies unless there is a clear reason to do
>> otherwise. A few dozen lines of code in the browser is not a good reason.
>>
>> From: Deborah Dahl [mailto:dahl@conversational-technologies.com]
>> Sent: Thursday, May 17, 2012 1:59 PM
>> To: 'Glen Shires'; Young, Milan
>> Cc: 'Hans Wennborg'; public-speech-api@w3.org; 'Satish S'
>> Subject: RE: EMMA in Speech API (was RE: Speech API: first editor's draft posted)
>>
>> Yes, the developer could certainly do that, but the NULL check and
>> subsequent hand-building of EMMA introduces a service dependency into the
>> developer's Javascript, which I think we want to minimize. I think it
>> would be better if the speech recognizers that don't support EMMA
>> natively just put a minimal EMMA wrapper around the token result for
>> EMMAXML and EMMAText.
>>
>> From: Glen Shires [mailto:gshires@google.com]
>> Sent: Thursday, May 17, 2012 3:08 PM
>> To: Young, Milan
>> Cc: Deborah Dahl; Hans Wennborg; public-speech-api@w3.org; Satish S
>> Subject: Re: EMMA in Speech API (was RE: Speech API: first editor's draft posted)
>>
>> Deborah,
>>
>> Since the JavaScript Speech API we're defining is new, I presume there
>> must be a new JavaScript "glue" layer between the Speech API and the
>> existing applications, dialog managers and log analysis tools that you
>> mention. [1] Since the minimal EMMA wrapper you've defined is so simple,
>> it could easily be generated in that JavaScript "glue" layer.
>>
>> I propose that we define attributes for EMMAXML and EMMAText so that
>> recognizers that do support these do return them, and we make it
>> acceptable for user-agents to return NULL for these attributes for
>> recognizers that don't support EMMA.
>>
>> [1] http://lists.w3.org/Archives/Public/public-speech-api/2012Apr/0014.html
>>
>> Thanks,
>>
>> Glen Shires
>>
>> On Tue, Apr 24, 2012 at 9:32 AM, Young, Milan <Milan.Young@nuance.com>
>> wrote:
>>
>> Thanks for the example Deborah.
>>
>> Being that it would be so simple for a recognition engine (or even a UA)
>> to add EMMA, are there any objections to exposing it? We would of course
>> also provide plain text to support the simple use cases.
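This is roughly the boilerplate I'd like most apps to avoid. A sketch of
the "glue" layer under discussion, assuming the proposed EMMAXML attribute
and Glen's NULL convention (neither is in the current draft, and XML
escaping of the transcript is omitted for brevity):

// Sketch only: "EMMAXML" is the proposed attribute name, and null
// means the recognizer returned no EMMA natively.
function getEmmaDocument(result) {
  if (result.EMMAXML)
    return result.EMMAXML;  // recognizer supplied EMMA directly

  // Otherwise hand-build Deborah's minimal wrapper (quoted below)
  // around the top alternative's tokens.
  var xml =
      '<emma:emma version="1.0"' +
      ' xmlns:emma="http://www.w3.org/2003/04/emma">' +
      '<emma:interpretation id="int1" emma:medium="acoustic"' +
      ' emma:mode="voice">' +
      '<emma:literal>' + result[0].transcript + '</emma:literal>' +
      '</emma:interpretation>' +
      '</emma:emma>';
  return new DOMParser().parseFromString(xml, 'application/xml');
}

// Satish's point about the text variant: it falls out of the DOM
// form with one standard call.
function getEmmaText(result) {
  return new XMLSerializer().serializeToString(getEmmaDocument(result));
}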
>> From: Deborah Dahl [mailto:dahl@conversational-technologies.com]
>> Sent: Tuesday, April 24, 2012 8:52 AM
>> To: Young, Milan; 'Glen Shires'
>> Cc: 'Hans Wennborg'; public-speech-api@w3.org; 'Satish S'
>> Subject: EMMA in Speech API (was RE: Speech API: first editor's draft posted)
>>
>> Hi Milan,
>>
>> Yes, I think that it wouldn't be too difficult to wrap a token result
>> with a minimal EMMA wrapper.
>>
>> I think the following would be the minimal EMMA required to represent
>> just the spoken tokens.
>>
>> Utterance: "flights from Boston to Denver"
>>
>> <emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
>>   <emma:interpretation id="int1"
>>                        emma:medium="acoustic"
>>                        emma:mode="voice">
>>     <emma:literal>
>>       flights from boston to denver
>>     </emma:literal>
>>   </emma:interpretation>
>> </emma:emma>
>>
>> If you wanted to add confidence and a semantic interpretation, the EMMA
>> would be like this:
>>
>> <emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
>>   <emma:interpretation id="int1"
>>                        emma:confidence="0.75"
>>                        emma:tokens="flights from boston to denver"
>>                        emma:medium="acoustic"
>>                        emma:mode="voice">
>>     <origin>Boston</origin>
>>     <destination>Denver</destination>
>>   </emma:interpretation>
>> </emma:emma>
>>
>> Debbie
>>
>> From: Young, Milan [mailto:Milan.Young@nuance.com]
>> Sent: Monday, April 23, 2012 6:18 PM
>> To: Deborah Dahl; 'Glen Shires'
>> Cc: 'Hans Wennborg'; public-speech-api@w3.org; 'Satish S'
>> Subject: RE: Speech API: first editor's draft posted
>>
>> Deborah, is there a form of an EMMA result that only encodes the spoken
>> phrase (omits interpretation etc)? If so, perhaps vendors that do not
>> currently support EMMA could use this trivial wrapper to achieve
>> compliance.
>>
>> Either way I agree that EMMA must be a part of the spec. I didn't notice
>> that this had been pulled in this most recent proposal or I would have
>> mentioned it myself.
>>
>> Thanks
>>
>> From: Deborah Dahl [mailto:dahl@conversational-technologies.com]
>> Sent: Monday, April 23, 2012 1:21 PM
>> To: 'Glen Shires'
>> Cc: 'Hans Wennborg'; public-speech-api@w3.org; 'Satish S'
>> Subject: RE: Speech API: first editor's draft posted
>>
>> I think standardization will actually be accelerated by making EMMA part
>> of the specification. EMMA (and its predecessor NLSML) were in fact
>> originally partly motivated by the non-interoperable and proprietary
>> ways that different speech recognizers represented semantic
>> interpretations. This made it very difficult for an application to be
>> used with different speech recognizers.
>>
>> I don't know what proportion of existing speech services do or don't
>> support EMMA, but there are definitely speech services, as well as
>> multimodal application platforms, that do. I know that there will be
>> applications and, more generally, development platforms, that won't be
>> able to use this spec unless they can get EMMA results.
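For contrast, a sketch of what an EMMA consumer such as the dialog
managers mentioned above might do with Deborah's second example, given
it as a DOM Document (the origin/destination element names come from
the example itself, not from any spec):

var EMMA_NS = 'http://www.w3.org/2003/04/emma';

function readFlightQuery(emmaDoc) {
  var interp =
      emmaDoc.getElementsByTagNameNS(EMMA_NS, 'interpretation')[0];
  return {
    tokens: interp.getAttributeNS(EMMA_NS, 'tokens'),
    confidence: parseFloat(interp.getAttributeNS(EMMA_NS, 'confidence')),
    origin: interp.getElementsByTagName('origin')[0].textContent,
    destination: interp.getElementsByTagName('destination')[0].textContent
  };
}

// readFlightQuery(doc) -> { tokens: "flights from boston to denver",
//                           confidence: 0.75, origin: "Boston",
//                           destination: "Denver" }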
>> From: Glen Shires [mailto:gshires@google.com]
>> Sent: Monday, April 23, 2012 3:37 PM
>> To: Deborah Dahl
>> Cc: Hans Wennborg; public-speech-api@w3.org; Satish S
>> Subject: Re: Speech API: first editor's draft posted
>>
>> For this initial specification, we believe that a simplified API will
>> accelerate implementation, interoperability testing, standardization and
>> ultimately developer adoption. Getting rapid adoption amongst many user
>> agents and many speech recognition services is a primary goal.
>>
>> Many speech recognition services currently do not support EMMA, and EMMA
>> is not required for the majority of use cases, therefore I believe EMMA
>> is something we should consider adding in a future iteration of this
>> specification.
>>
>> /Glen Shires
>>
>> On Mon, Apr 23, 2012 at 11:44 AM, Deborah Dahl
>> <dahl@conversational-technologies.com> wrote:
>>
>> Thanks for preparing this draft.
>> I'd like to advocate including the EMMAText and EMMAXML attributes in
>> SpeechRecognitionResult. One argument is that at least some existing
>> consumers of speech recognition results (for example, dialog managers
>> and log analysis tools) currently expect EMMA as input. It would be very
>> desirable not to have to modify them to process multiple different
>> recognizer result formats. A web developer who's new to speech
>> recognition can ignore the EMMA if they want, because if all they want
>> is tokens, confidence, or semantics, those are available from the
>> SpeechRecognitionAlternative objects.
>>
>> > -----Original Message-----
>> > From: Hans Wennborg [mailto:hwennborg@google.com]
>> > Sent: Thursday, April 12, 2012 10:36 AM
>> > To: public-speech-api@w3.org
>> > Cc: Satish S; Glen Shires
>> > Subject: Speech API: first editor's draft posted
>> >
>> > In December, Google proposed [1] to public-webapps a Speech JavaScript
>> > API subset that supports the majority of the use-cases in the Speech
>> > Incubator Group's Final Report. This proposal provides a programmatic
>> > API that enables web-pages to synthesize speech output and to use
>> > speech recognition as an input for forms, continuous dictation and
>> > control.
>> >
>> > We have now posted in the Speech-API Community Group's repository a
>> > slightly updated proposal [2]; the differences include:
>> >
>> > - Document is now self-contained, rather than having multiple
>> >   references to the XG Final Report.
>> > - Renamed SpeechReco interface to SpeechRecognition
>> > - Renamed interfaces and attributes beginning SpeechInput* to
>> >   SpeechRecognition*
>> > - Moved EventTarget to constructor of SpeechRecognition
>> > - Clarified that grammars and lang are attributes of SpeechRecognition
>> > - Clarified that if index is greater than or equal to length, returns
>> >   null
>> >
>> > We welcome discussion and feedback on this editor's draft. Please send
>> > your comments to the public-speech-api@w3.org mailing list.
>> >
>> > Glen Shires
>> > Hans Wennborg
>> >
>> > [1] http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/1696.html
>> > [2] http://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html
>>
>> --
>> Thanks!
>>
>> Glen Shires
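And the simple case I'd like to keep easy looks roughly like this, a
sketch against the editor's draft [2] (the exact event and result shapes
here are illustrative, not normative):

var recognition = new SpeechRecognition();
recognition.lang = 'en-US';
recognition.onresult = function(event) {
  var best = event.result[0];  // top SpeechRecognitionAlternative
  console.log(best.transcript);      // "flights from boston to denver"
  console.log(best.confidence);      // e.g. 0.75
  console.log(best.interpretation);  // a string, or a JS object with SISR
};
recognition.start();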
--
Bjorn Bringert
Google UK Limited, Registered Office: Belgrave House,
76 Buckingham Palace Road, London, SW1W 9TQ
Registered in England Number: 3977902

Received on Monday, 21 May 2012 15:56:26 UTC