
RE: EMMA in Speech API (was RE: Speech API: first editor's draft posted)

From: Young, Milan <Milan.Young@nuance.com>
Date: Fri, 18 May 2012 00:02:07 +0000
To: Deborah Dahl <dahl@conversational-technologies.com>, 'Glen Shires' <gshires@google.com>
CC: 'Hans Wennborg' <hwennborg@google.com>, "public-speech-api@w3.org" <public-speech-api@w3.org>, 'Satish S' <satish@google.com>
Message-ID: <B236B24082A4094A85003E8FFB8DDC3C1A45B692@SOM-EXCH04.nuance.com>
I second Deborah's line of reasoning.  Our goal should be to minimize browser-specific dependencies unless there is a clear reason to do otherwise.  A few dozen lines of code in the browser is not a good reason.


From: Deborah Dahl [mailto:dahl@conversational-technologies.com]
Sent: Thursday, May 17, 2012 1:59 PM
To: 'Glen Shires'; Young, Milan
Cc: 'Hans Wennborg'; public-speech-api@w3.org; 'Satish S'
Subject: RE: EMMA in Speech API (was RE: Speech API: first editor's draft posted)

Yes, the developer could certainly do that, but the NULL check and subsequent hand-building of EMMA introduces a service dependency into the developer's JavaScript, which I think we want to minimize.  I think it would be better if the speech recognizers that don't support EMMA natively just put a minimal EMMA wrapper around the token result for EMMAXML and EMMAText.
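For concreteness, the service-dependent fallback in question might look something like the following sketch. The EMMAXML attribute is the one proposed below in this thread; the event shape and the dialogManager consumer are hypothetical, not part of the draft.

// Sketch only: EMMAXML is the proposed attribute and may be NULL when the
// recognizer does not supply EMMA; the event shape is illustrative.
recognition.onresult = function(event) {
  var result = event.result;
  var emma = result.EMMAXML;
  if (emma == null) {
    // Hand-build the minimal EMMA wrapper from the top alternative's
    // transcript -- the per-service branch described above.
    var tokens = result.item(0).transcript
        .replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
    emma = '<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">' +
           '<emma:interpretation id="int1" emma:medium="acoustic" emma:mode="voice">' +
           '<emma:literal>' + tokens + '</emma:literal>' +
           '</emma:interpretation></emma:emma>';
  }
  dialogManager.process(emma);  // hypothetical EMMA-expecting consumer
};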

From: Glen Shires [mailto:gshires@google.com]
Sent: Thursday, May 17, 2012 3:08 PM
To: Young, Milan
Cc: Deborah Dahl; Hans Wennborg; public-speech-api@w3.org; Satish S
Subject: Re: EMMA in Speech API (was RE: Speech API: first editor's draft posted)

Deborah,
Since the JavaScript Speech API we're defining is new, I presume there must be a new JavaScript "glue" layer between the Speech API and the existing applications, dialog managers and log analysis tools that you mention. [1]  Since the minimal EMMA wrapper you've defined is so simple, it could easily be generated in that JavaScript "glue" layer.

I propose that we define EMMAXML and EMMAText attributes so that recognizers that support EMMA return them, and that we make it acceptable for user agents to return NULL for these attributes when the recognizer doesn't support EMMA.

[1] http://lists.w3.org/Archives/Public/public-speech-api/2012Apr/0014.html

Thanks,
Glen Shires

On Tue, Apr 24, 2012 at 9:32 AM, Young, Milan <Milan.Young@nuance.com> wrote:
Thanks for the example Deborah.

Given that it would be so simple for a recognition engine (or even a UA) to add EMMA, are there any objections to exposing it?  We would of course also provide plain text to support the simple use cases.



From: Deborah Dahl [mailto:dahl@conversational-technologies.com]
Sent: Tuesday, April 24, 2012 8:52 AM
To: Young, Milan; 'Glen Shires'

Cc: 'Hans Wennborg'; public-speech-api@w3.org; 'Satish S'
Subject: EMMA in Speech API (was RE: Speech API: first editor's draft posted)

Hi Milan,
Yes, I think it wouldn't be too difficult to wrap a token result with a minimal EMMA wrapper.

I think the following would be the minimal EMMA required to represent just the spoken tokens.
Utterance: "flights from Boston to Denver"

<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:interpretation id="int1"
    emma:medium="acoustic"
    emma:mode="voice">
    <emma:literal>
      flights from boston to denver
    </emma:literal>
  </emma:interpretation>
</emma:emma>
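A recognizer without native EMMA support (or the user agent on its behalf) could generate this wrapper mechanically. A sketch in JavaScript, with a hypothetical helper name:

// Hypothetical helper: wraps plain spoken tokens in the minimal EMMA
// document shown above (tokens only; no interpretation or confidence).
function wrapTokensInEmma(tokens) {
  // Escape XML-significant characters so the result stays well-formed.
  var escaped = tokens.replace(/&/g, '&amp;')
                      .replace(/</g, '&lt;')
                      .replace(/>/g, '&gt;');
  return '<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">\n' +
         '  <emma:interpretation id="int1"\n' +
         '    emma:medium="acoustic"\n' +
         '    emma:mode="voice">\n' +
         '    <emma:literal>' + escaped + '</emma:literal>\n' +
         '  </emma:interpretation>\n' +
         '</emma:emma>';
}

For example, wrapTokensInEmma("flights from boston to denver") produces the document above.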

If you wanted to add confidence and a semantic interpretation, the EMMA would be like this:
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:interpretation id="int1"
    emma:confidence="0.75"
    emma:tokens="flights from boston to denver"
    emma:medium="acoustic"
    emma:mode="voice">
    <origin>Boston</origin>
    <destination>Denver</destination>
  </emma:interpretation>
</emma:emma>

Debbie

From: Young, Milan [mailto:Milan.Young@nuance.com]
Sent: Monday, April 23, 2012 6:18 PM
To: Deborah Dahl; 'Glen Shires'
Cc: 'Hans Wennborg'; public-speech-api@w3.org; 'Satish S'
Subject: RE: Speech API: first editor's draft posted

Deborah, is there a form of an EMMA result that only encodes the spoken phrase (omitting interpretation, etc.)?  If so, perhaps vendors that do not currently support EMMA could use this trivial wrapper to achieve compliance.

Either way, I agree that EMMA must be a part of the spec.  I hadn't noticed that it had been dropped from this most recent proposal, or I would have mentioned it myself.

Thanks

From: Deborah Dahl [mailto:dahl@conversational-technologies.com]
Sent: Monday, April 23, 2012 1:21 PM
To: 'Glen Shires'
Cc: 'Hans Wennborg'; public-speech-api@w3.org; 'Satish S'
Subject: RE: Speech API: first editor's draft posted

I think standardization will actually be accelerated by making EMMA part of the specification. EMMA and its predecessor NLSML were in fact originally motivated in part by the non-interoperable, proprietary ways that different speech recognizers represented semantic interpretations, which made it very difficult for an application to be used with different speech recognizers.
I don't know what proportion of existing speech services do or don't support EMMA, but there are definitely speech services, as well as multimodal application platforms, that do. I know that there will be applications and, more generally, development platforms that won't be able to use this spec unless they can get EMMA results.
From: Glen Shires [mailto:gshires@google.com]
Sent: Monday, April 23, 2012 3:37 PM
To: Deborah Dahl
Cc: Hans Wennborg; public-speech-api@w3.org; Satish S
Subject: Re: Speech API: first editor's draft posted

For this initial specification, we believe that a simplified API will accelerate implementation, interoperability testing, standardization and ultimately developer adoption.  Getting rapid adoption amongst many user agents and many speech recognition services is a primary goal.

Many speech recognition services currently do not support EMMA, and EMMA is not required for the majority of use cases; therefore, I believe EMMA is something we should consider adding in a future iteration of this specification.

/Glen Shires


On Mon, Apr 23, 2012 at 11:44 AM, Deborah Dahl <dahl@conversational-technologies.com> wrote:
Thanks for preparing this draft.
I'd like to advocate including the EMMAText and EMMAXML attributes in
SpeechRecognitionResult. One argument is that at least some existing
consumers of speech recognition results (for example, dialog managers and
log analysis tools) currently expect EMMA as input. It would be very
desirable not to have to modify them to process multiple different
recognizer result formats. A web developer who's new to speech recognition
can ignore the EMMA if they want, because if all they want is tokens,
confidence, or semantics, those are available from the
SpeechRecognitionAlternative objects.
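On the consumer side, a tool that expects EMMA could recover the tokens from either of the EMMA forms shown above with standard DOM APIs. A sketch, assuming the proposed EMMAXML attribute carries the serialized document (the attribute name is the one under discussion, not final):

var EMMA_NS = 'http://www.w3.org/2003/04/emma';

// Sketch: pull the spoken tokens out of a serialized EMMA result. Handles
// both forms above: an emma:literal child or an emma:tokens attribute.
function tokensFromEmma(emmaXml) {
  var doc = new DOMParser().parseFromString(emmaXml, 'application/xml');
  var interp = doc.getElementsByTagNameNS(EMMA_NS, 'interpretation')[0];
  var literal = interp.getElementsByTagNameNS(EMMA_NS, 'literal')[0];
  return literal ? literal.textContent.trim()
                 : interp.getAttributeNS(EMMA_NS, 'tokens');
}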

> -----Original Message-----
> From: Hans Wennborg [mailto:hwennborg@google.com]
> Sent: Thursday, April 12, 2012 10:36 AM
> To: public-speech-api@w3.org
> Cc: Satish S; Glen Shires
> Subject: Speech API: first editor's draft posted
>
> In December, Google proposed [1] to public-webapps a Speech JavaScript
> API subset that supports the majority of the use cases in the Speech
> Incubator Group's Final Report. This proposal provides a programmatic
> API that enables web-pages to synthesize speech output and to use
> speech recognition as an input for forms, continuous dictation and
> control.
>
> We have now posted a slightly updated proposal [2] in the Speech-API
> Community Group's repository; the differences include:
>
>  - Document is now self-contained, rather than having multiple
> references to the XG Final Report.
>  - Renamed SpeechReco interface to SpeechRecognition
>  - Renamed interfaces and attributes beginning SpeechInput* to
> SpeechRecognition*
>  - Moved EventTarget to constructor of SpeechRecognition
>  - Clarified that grammars and lang are attributes of SpeechRecognition
>  - Clarified that if index is greater than or equal to length, returns null
>
> We welcome discussion and feedback on this editor's draft. Please send
> your comments to the public-speech-api@w3.org mailing list.
>
> Glen Shires
> Hans Wennborg
>
> [1] http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/1696.html
> [2] http://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html



--
Thanks!
Glen Shires
Received on Friday, 18 May 2012 00:02:41 GMT
