- From: Satish S <satish@google.com>
- Date: Mon, 21 May 2012 16:28:16 +0100
- To: "Young, Milan" <Milan.Young@nuance.com>
- Cc: Deborah Dahl <dahl@conversational-technologies.com>, Glen Shires <gshires@google.com>, Hans Wennborg <hwennborg@google.com>, "public-speech-api@w3.org" <public-speech-api@w3.org>
- Message-ID: <CAHZf7Rnaq8nC5GOiAgACfK41hoHhNeFWmyEp6YASg-kqtEpTgg@mail.gmail.com>
There exists a SpeechRecognitionAlternative.interpretation<http://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html#dfn-interpretation>
attribute for the purpose of returning the semantic interpretation. I
propose we change it to be of type 'Document' (i.e. a DOM document which
will contain the XML DOM) and mention that it contains the interpretation
in EMMA format.

Since EMMA is an XML format, I think we should have just the above
attribute and not add a text variant. If a web app needs the text
representation, it is trivial to get it from the DOM representation with
many JavaScript libraries.
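For example, a minimal sketch of that last point, assuming the attribute
becomes the proposed DOM Document and that 'alt' is a
SpeechRecognitionAlternative from a result event:

    // Sketch only: assumes 'interpretation' becomes the proposed DOM
    // Document holding the EMMA result.
    var emmaDoc = alt.interpretation;
    if (emmaDoc) {
      // The text form needs nothing beyond the standard DOM:
      var emmaText = new XMLSerializer().serializeToString(emmaDoc);
      // ... hand emmaText to any consumer expecting serialized EMMA ...
    }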
Cheers
Satish

On Fri, May 18, 2012 at 1:02 AM, Young, Milan <Milan.Young@nuance.com> wrote:

> I second Deborah's line of reasoning. Our goal should be to minimize
> browser-specific dependencies unless there is a clear reason to do
> otherwise. A few dozen lines of code in the browser is not a good reason.
>
> From: Deborah Dahl [mailto:dahl@conversational-technologies.com]
> Sent: Thursday, May 17, 2012 1:59 PM
> To: 'Glen Shires'; Young, Milan
> Cc: 'Hans Wennborg'; public-speech-api@w3.org; 'Satish S'
> Subject: RE: EMMA in Speech API (was RE: Speech API: first editor's draft posted)
>
> Yes, the developer could certainly do that, but the NULL check and
> subsequent hand-building of EMMA introduces a service dependency into the
> developer's JavaScript, which I think we want to minimize. I think it
> would be better if the speech recognizers that don't support EMMA natively
> just put a minimal EMMA wrapper around the token result for EMMAXML and
> EMMAText.
>
> From: Glen Shires [mailto:gshires@google.com]
> Sent: Thursday, May 17, 2012 3:08 PM
> To: Young, Milan
> Cc: Deborah Dahl; Hans Wennborg; public-speech-api@w3.org; Satish S
> Subject: Re: EMMA in Speech API (was RE: Speech API: first editor's draft posted)
>
> Deborah,
> Since the JavaScript Speech API we're defining is new, I presume there
> must be a new JavaScript "glue" layer between the Speech API and the
> existing applications, dialog managers and log analysis tools that you
> mention. [1] Since the minimal EMMA wrapper you've defined is so simple,
> it could easily be generated in that JavaScript "glue" layer.
>
> I propose that we define attributes for EMMAXML and EMMAText so that
> recognizers that do support these return them, and that we make it
> acceptable for user agents to return NULL for these attributes for
> recognizers that don't support EMMA.
>
> [1] http://lists.w3.org/Archives/Public/public-speech-api/2012Apr/0014.html
>
> Thanks,
> Glen Shires
>
> On Tue, Apr 24, 2012 at 9:32 AM, Young, Milan <Milan.Young@nuance.com> wrote:
>
> Thanks for the example, Deborah.
>
> Given that it would be so simple for a recognition engine (or even a UA)
> to add EMMA, are there any objections to exposing it? We would of course
> also provide plain text to support the simple use cases.
>
> From: Deborah Dahl [mailto:dahl@conversational-technologies.com]
> Sent: Tuesday, April 24, 2012 8:52 AM
> To: Young, Milan; 'Glen Shires'
> Cc: 'Hans Wennborg'; public-speech-api@w3.org; 'Satish S'
> Subject: EMMA in Speech API (was RE: Speech API: first editor's draft posted)
>
> Hi Milan,
> Yes, I think that it wouldn't be too difficult to wrap a token result
> with a minimal EMMA wrapper.
>
> I think the following would be the minimal EMMA required to represent
> just the spoken tokens.
> Utterance: "flights from Boston to Denver"
>
> <emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
>   <emma:interpretation id="int1"
>                        emma:medium="acoustic"
>                        emma:mode="voice">
>     <emma:literal>
>       flights from boston to denver
>     </emma:literal>
>   </emma:interpretation>
> </emma:emma>
>
> If you wanted to add confidence and a semantic interpretation, the EMMA
> would be like this:
>
> <emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
>   <emma:interpretation id="int1"
>                        emma:confidence="0.75"
>                        emma:tokens="flights from boston to denver"
>                        emma:medium="acoustic"
>                        emma:mode="voice">
>     <origin>Boston</origin>
>     <destination>Denver</destination>
>   </emma:interpretation>
> </emma:emma>
>
> Debbie
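For illustration, a rough sketch of the JavaScript "glue" layer Glen
describes, falling back to Deborah's minimal wrapper when the recognizer
returns NULL ('EMMAXML' and 'transcript' are assumed attribute names, not
part of the current draft):

    // Sketch only: 'result.EMMAXML' and 'alt.transcript' are assumed
    // names for the recognizer's EMMA document and the raw tokens.
    function getEmma(result, alt) {
      if (result.EMMAXML) {
        return result.EMMAXML;  // recognizer supplied EMMA natively
      }
      // Hand-build the minimal wrapper around the spoken tokens.
      // (Real code would XML-escape the transcript first.)
      var xml =
          '<emma:emma version="1.0"' +
          ' xmlns:emma="http://www.w3.org/2003/04/emma">' +
          '<emma:interpretation id="int1"' +
          ' emma:medium="acoustic" emma:mode="voice">' +
          '<emma:literal>' + alt.transcript + '</emma:literal>' +
          '</emma:interpretation>' +
          '</emma:emma>';
      return new DOMParser().parseFromString(xml, 'application/xml');
    }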
> From: Young, Milan [mailto:Milan.Young@nuance.com]
> Sent: Monday, April 23, 2012 6:18 PM
> To: Deborah Dahl; 'Glen Shires'
> Cc: 'Hans Wennborg'; public-speech-api@w3.org; 'Satish S'
> Subject: RE: Speech API: first editor's draft posted
>
> Deborah, is there a form of an EMMA result that only encodes the spoken
> phrase (omitting the interpretation, etc.)? If so, perhaps vendors that
> do not currently support EMMA could use this trivial wrapper to achieve
> compliance.
>
> Either way, I agree that EMMA must be a part of the spec. I didn't notice
> that this had been pulled from the most recent proposal, or I would have
> mentioned it myself.
>
> Thanks
>
> From: Deborah Dahl [mailto:dahl@conversational-technologies.com]
> Sent: Monday, April 23, 2012 1:21 PM
> To: 'Glen Shires'
> Cc: 'Hans Wennborg'; public-speech-api@w3.org; 'Satish S'
> Subject: RE: Speech API: first editor's draft posted
>
> I think standardization will actually be accelerated by making EMMA part
> of the specification. EMMA (and its predecessor NLSML) was in fact
> originally motivated in part by the non-interoperable and proprietary
> ways that different speech recognizers represented semantic
> interpretations. This made it very difficult for an application to be
> used with different speech recognizers.
> I don't know what proportion of existing speech services do or don't
> support EMMA, but there are definitely speech services, as well as
> multimodal application platforms, that do. I know that there will be
> applications and, more generally, development platforms that won't be
> able to use this spec unless they can get EMMA results.
>
> From: Glen Shires [mailto:gshires@google.com]
> Sent: Monday, April 23, 2012 3:37 PM
> To: Deborah Dahl
> Cc: Hans Wennborg; public-speech-api@w3.org; Satish S
> Subject: Re: Speech API: first editor's draft posted
>
> For this initial specification, we believe that a simplified API will
> accelerate implementation, interoperability testing, standardization and,
> ultimately, developer adoption. Getting rapid adoption amongst many user
> agents and many speech recognition services is a primary goal.
>
> Many speech recognition services currently do not support EMMA, and EMMA
> is not required for the majority of use cases; therefore I believe EMMA
> is something we should consider adding in a future iteration of this
> specification.
>
> /Glen Shires
>
> On Mon, Apr 23, 2012 at 11:44 AM, Deborah Dahl
> <dahl@conversational-technologies.com> wrote:
>
> Thanks for preparing this draft.
> I'd like to advocate including the EMMAText and EMMAXML attributes in
> SpeechRecognitionResult. One argument is that at least some existing
> consumers of speech recognition results (for example, dialog managers and
> log analysis tools) currently expect EMMA as input. It would be very
> desirable not to have to modify them to process multiple different
> recognizer result formats. A web developer who's new to speech
> recognition can ignore the EMMA if they want, because if all they want is
> tokens, confidence, or semantics, those are available from the
> SpeechRecognitionAlternative objects.
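Both consumption paths might look like the following sketch; 'EMMAXML',
'transcript' and 'confidence' are assumed names for illustration, and the
semantics mirror Deborah's second example above:

    // Simple path: tokens and confidence straight off the alternative.
    var tokens = alt.transcript;
    var score  = alt.confidence;

    // EMMA path: read the semantic interpretation out of the EMMA DOM
    // (e.g. the Document from an assumed EMMAXML attribute).
    var EMMA_NS = 'http://www.w3.org/2003/04/emma';
    var interp = emmaDoc.getElementsByTagNameNS(EMMA_NS, 'interpretation')[0];
    var origin = interp.getElementsByTagName('origin')[0].textContent;
    var dest   = interp.getElementsByTagName('destination')[0].textContent;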
> > -----Original Message-----
> > From: Hans Wennborg [mailto:hwennborg@google.com]
> > Sent: Thursday, April 12, 2012 10:36 AM
> > To: public-speech-api@w3.org
> > Cc: Satish S; Glen Shires
> > Subject: Speech API: first editor's draft posted
> >
> > In December, Google proposed [1] to public-webapps a Speech JavaScript
> > API subset that supports the majority of the use-cases in the Speech
> > Incubator Group's Final Report. This proposal provides a programmatic
> > API that enables web pages to synthesize speech output and to use
> > speech recognition as an input for forms, continuous dictation and
> > control.
> >
> > We have now posted in the Speech-API Community Group's repository a
> > slightly updated proposal [2]; the differences include:
> >
> > - Document is now self-contained, rather than having multiple
> >   references to the XG Final Report.
> > - Renamed SpeechReco interface to SpeechRecognition
> > - Renamed interfaces and attributes beginning SpeechInput* to
> >   SpeechRecognition*
> > - Moved EventTarget to constructor of SpeechRecognition
> > - Clarified that grammars and lang are attributes of SpeechRecognition
> > - Clarified that if index is greater than or equal to length, item()
> >   returns null
> >
> > We welcome discussion and feedback on this editor's draft. Please send
> > your comments to the public-speech-api@w3.org mailing list.
> >
> > Glen Shires
> > Hans Wennborg
> >
> > [1] http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/1696.html
> > [2] http://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html
>
> --
> Thanks!
> Glen Shires

Received on Monday, 21 May 2012 15:29:08 UTC