- From: Satish S <satish@google.com>
- Date: Mon, 21 May 2012 16:28:16 +0100
- To: "Young, Milan" <Milan.Young@nuance.com>
- Cc: Deborah Dahl <dahl@conversational-technologies.com>, Glen Shires <gshires@google.com>, Hans Wennborg <hwennborg@google.com>, "public-speech-api@w3.org" <public-speech-api@w3.org>
- Message-ID: <CAHZf7Rnaq8nC5GOiAgACfK41hoHhNeFWmyEp6YASg-kqtEpTgg@mail.gmail.com>
There exists a SpeechRecognitionAlternative.interpretation<http://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html#dfn-interpretation>
attribute for the purpose of returning the semantic interpretation. I
propose we change it to be of type 'Document' (i.e. a DOM document which
will contain the XML DOM) and mention that it contains the interpretation
in EMMA format.

Since EMMA is an XML format, I think we should have just the above
attribute and not add a text variant. If a web app needs the text
representation, it is trivial to get it from the DOM representation with
many JavaScript libraries.
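For example, a minimal sketch of that last point, assuming the attribute
becomes the proposed DOM Document and that 'alt' is a
SpeechRecognitionAlternative from a result event:

    // Sketch only: assumes 'interpretation' becomes the proposed DOM
    // Document holding the EMMA result.
    var emmaDoc = alt.interpretation;
    if (emmaDoc) {
      // The text form needs nothing beyond the standard DOM:
      var emmaText = new XMLSerializer().serializeToString(emmaDoc);
      // ... hand emmaText to any consumer expecting serialized EMMA ...
    }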
Cheers
Satish

On Fri, May 18, 2012 at 1:02 AM, Young, Milan <Milan.Young@nuance.com> wrote:

> I second Deborah's line of reasoning. Our goal should be to minimize
> browser-specific dependencies unless there is a clear reason to do
> otherwise. A few dozen lines of code in the browser is not a good reason.
>
> From: Deborah Dahl [mailto:dahl@conversational-technologies.com]
> Sent: Thursday, May 17, 2012 1:59 PM
> To: 'Glen Shires'; Young, Milan
> Cc: 'Hans Wennborg'; public-speech-api@w3.org; 'Satish S'
> Subject: RE: EMMA in Speech API (was RE: Speech API: first editor's draft posted)
>
> Yes, the developer could certainly do that, but the NULL check and
> subsequent hand-building of EMMA introduces a service dependency into the
> developer's JavaScript, which I think we want to minimize. I think it
> would be better if the speech recognizers that don't support EMMA natively
> just put a minimal EMMA wrapper around the token result for EMMAXML and
> EMMAText.
>
> From: Glen Shires [mailto:gshires@google.com]
> Sent: Thursday, May 17, 2012 3:08 PM
> To: Young, Milan
> Cc: Deborah Dahl; Hans Wennborg; public-speech-api@w3.org; Satish S
> Subject: Re: EMMA in Speech API (was RE: Speech API: first editor's draft posted)
>
> Deborah,
> Since the JavaScript Speech API we're defining is new, I presume there
> must be a new JavaScript "glue" layer between the Speech API and the
> existing applications, dialog managers and log analysis tools that you
> mention. [1] Since the minimal EMMA wrapper you've defined is so simple,
> it could easily be generated in that JavaScript "glue" layer.
>
> I propose that we define attributes for EMMAXML and EMMAText so that
> recognizers that do support these return them, and that we make it
> acceptable for user agents to return NULL for these attributes for
> recognizers that don't support EMMA.
>
> [1] http://lists.w3.org/Archives/Public/public-speech-api/2012Apr/0014.html
>
> Thanks,
> Glen Shires
>
> On Tue, Apr 24, 2012 at 9:32 AM, Young, Milan <Milan.Young@nuance.com> wrote:
>
> Thanks for the example, Deborah.
>
> Given that it would be so simple for a recognition engine (or even a UA)
> to add EMMA, are there any objections to exposing it? We would of course
> also provide plain text to support the simple use cases.
>
> From: Deborah Dahl [mailto:dahl@conversational-technologies.com]
> Sent: Tuesday, April 24, 2012 8:52 AM
> To: Young, Milan; 'Glen Shires'
> Cc: 'Hans Wennborg'; public-speech-api@w3.org; 'Satish S'
> Subject: EMMA in Speech API (was RE: Speech API: first editor's draft posted)
>
> Hi Milan,
> Yes, I think that it wouldn't be too difficult to wrap a token result
> with a minimal EMMA wrapper.
>
> I think the following would be the minimal EMMA required to represent
> just the spoken tokens.
> Utterance: "flights from Boston to Denver"
>
> <emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
>   <emma:interpretation id="int1"
>                        emma:medium="acoustic"
>                        emma:mode="voice">
>     <emma:literal>
>       flights from boston to denver
>     </emma:literal>
>   </emma:interpretation>
> </emma:emma>
>
> If you wanted to add confidence and a semantic interpretation, the EMMA
> would be like this:
>
> <emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
>   <emma:interpretation id="int1"
>                        emma:confidence="0.75"
>                        emma:tokens="flights from boston to denver"
>                        emma:medium="acoustic"
>                        emma:mode="voice">
>     <origin>Boston</origin>
>     <destination>Denver</destination>
>   </emma:interpretation>
> </emma:emma>
>
> Debbie
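For illustration, a rough sketch of the JavaScript "glue" layer Glen
describes, falling back to Deborah's minimal wrapper when the recognizer
returns NULL ('EMMAXML' and 'transcript' are assumed attribute names, not
part of the current draft):

    // Sketch only: 'result.EMMAXML' and 'alt.transcript' are assumed
    // names for the recognizer's EMMA document and the raw tokens.
    function getEmma(result, alt) {
      if (result.EMMAXML) {
        return result.EMMAXML;  // recognizer supplied EMMA natively
      }
      // Hand-build the minimal wrapper around the spoken tokens.
      // (Real code would XML-escape the transcript first.)
      var xml =
          '<emma:emma version="1.0"' +
          ' xmlns:emma="http://www.w3.org/2003/04/emma">' +
          '<emma:interpretation id="int1"' +
          ' emma:medium="acoustic" emma:mode="voice">' +
          '<emma:literal>' + alt.transcript + '</emma:literal>' +
          '</emma:interpretation>' +
          '</emma:emma>';
      return new DOMParser().parseFromString(xml, 'application/xml');
    }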
> From: Young, Milan [mailto:Milan.Young@nuance.com]
> Sent: Monday, April 23, 2012 6:18 PM
> To: Deborah Dahl; 'Glen Shires'
> Cc: 'Hans Wennborg'; public-speech-api@w3.org; 'Satish S'
> Subject: RE: Speech API: first editor's draft posted
>
> Deborah, is there a form of an EMMA result that only encodes the spoken
> phrase (omitting the interpretation, etc.)? If so, perhaps vendors that
> do not currently support EMMA could use this trivial wrapper to achieve
> compliance.
>
> Either way, I agree that EMMA must be a part of the spec. I didn't notice
> that this had been pulled from the most recent proposal, or I would have
> mentioned it myself.
>
> Thanks
>
> From: Deborah Dahl [mailto:dahl@conversational-technologies.com]
> Sent: Monday, April 23, 2012 1:21 PM
> To: 'Glen Shires'
> Cc: 'Hans Wennborg'; public-speech-api@w3.org; 'Satish S'
> Subject: RE: Speech API: first editor's draft posted
>
> I think standardization will actually be accelerated by making EMMA part
> of the specification. EMMA (and its predecessor NLSML) was in fact
> originally motivated in part by the non-interoperable and proprietary
> ways that different speech recognizers represented semantic
> interpretations. This made it very difficult for an application to be
> used with different speech recognizers.
> I don't know what proportion of existing speech services do or don't
> support EMMA, but there are definitely speech services, as well as
> multimodal application platforms, that do. I know that there will be
> applications and, more generally, development platforms that won't be
> able to use this spec unless they can get EMMA results.
>
> From: Glen Shires [mailto:gshires@google.com]
> Sent: Monday, April 23, 2012 3:37 PM
> To: Deborah Dahl
> Cc: Hans Wennborg; public-speech-api@w3.org; Satish S
> Subject: Re: Speech API: first editor's draft posted
>
> For this initial specification, we believe that a simplified API will
> accelerate implementation, interoperability testing, standardization and,
> ultimately, developer adoption. Getting rapid adoption amongst many user
> agents and many speech recognition services is a primary goal.
>
> Many speech recognition services currently do not support EMMA, and EMMA
> is not required for the majority of use cases; therefore I believe EMMA
> is something we should consider adding in a future iteration of this
> specification.
>
> /Glen Shires
>
> On Mon, Apr 23, 2012 at 11:44 AM, Deborah Dahl
> <dahl@conversational-technologies.com> wrote:
>
> Thanks for preparing this draft.
> I'd like to advocate including the EMMAText and EMMAXML attributes in
> SpeechRecognitionResult. One argument is that at least some existing
> consumers of speech recognition results (for example, dialog managers and
> log analysis tools) currently expect EMMA as input. It would be very
> desirable not to have to modify them to process multiple different
> recognizer result formats. A web developer who's new to speech
> recognition can ignore the EMMA if they want, because if all they want is
> tokens, confidence, or semantics, those are available from the
> SpeechRecognitionAlternative objects.
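Both consumption paths might look like the following sketch; 'EMMAXML',
'transcript' and 'confidence' are assumed names for illustration, and the
semantics mirror Deborah's second example above:

    // Simple path: tokens and confidence straight off the alternative.
    var tokens = alt.transcript;
    var score  = alt.confidence;

    // EMMA path: read the semantic interpretation out of the EMMA DOM
    // (e.g. the Document from an assumed EMMAXML attribute).
    var EMMA_NS = 'http://www.w3.org/2003/04/emma';
    var interp = emmaDoc.getElementsByTagNameNS(EMMA_NS, 'interpretation')[0];
    var origin = interp.getElementsByTagName('origin')[0].textContent;
    var dest   = interp.getElementsByTagName('destination')[0].textContent;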
> > -----Original Message-----
> > From: Hans Wennborg [mailto:hwennborg@google.com]
> > Sent: Thursday, April 12, 2012 10:36 AM
> > To: public-speech-api@w3.org
> > Cc: Satish S; Glen Shires
> > Subject: Speech API: first editor's draft posted
> >
> > In December, Google proposed [1] to public-webapps a Speech JavaScript
> > API subset that supports the majority of the use-cases in the Speech
> > Incubator Group's Final Report. This proposal provides a programmatic
> > API that enables web pages to synthesize speech output and to use
> > speech recognition as an input for forms, continuous dictation and
> > control.
> >
> > We have now posted in the Speech-API Community Group's repository a
> > slightly updated proposal [2]; the differences include:
> >
> > - Document is now self-contained, rather than having multiple
> >   references to the XG Final Report.
> > - Renamed SpeechReco interface to SpeechRecognition
> > - Renamed interfaces and attributes beginning SpeechInput* to
> >   SpeechRecognition*
> > - Moved EventTarget to constructor of SpeechRecognition
> > - Clarified that grammars and lang are attributes of SpeechRecognition
> > - Clarified that if index is greater than or equal to length, item()
> >   returns null
> >
> > We welcome discussion and feedback on this editor's draft. Please send
> > your comments to the public-speech-api@w3.org mailing list.
> >
> > Glen Shires
> > Hans Wennborg
> >
> > [1] http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/1696.html
> > [2] http://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html
>
> --
> Thanks!
> Glen Shires

Received on Monday, 21 May 2012 15:29:08 UTC