Re: EMMA in Speech API (was RE: Speech API: first editor's draft posted) from Satish S on 2012-05-21 (public-speech-api@w3.org from May 2012)

From: Satish S <satish@google.com>
Date: Mon, 21 May 2012 22:34:40 +0100
To: Deborah Dahl <dahl@conversational-technologies.com>
Cc: Bjorn Bringert <bringert@google.com>, "Young, Milan" <Milan.Young@nuance.com>, Glen Shires <gshires@google.com>, Hans Wennborg <hwennborg@google.com>, public-speech-api@w3.org
Message-ID: <CAHZf7RmEtVaVOduVy+991johXJG+Jhi3JHrSSoEUi_EmW8Nk9Q@mail.gmail.com>

I agree that having a uniform representation of results and semantic
interpretation is necessary. The only question I have is why XML formatted
according to EMMA is preferred over native JS objects. To clarify, I'm
suggesting that semantic information, if received as EMMA from the
recognizer, be converted by the UA to native JS objects so accessing them
is far simpler.

With EMMA XML:
  var doc = alternative.emmaXML;
  var interpretation = doc.getElementsByTagName("emma:interpretation")[0];
  var origin =
      interpretation.getElementsByTagName("origin")[0].childNodes[0].nodeValue;
  var destination =
      interpretation.getElementsByTagName("destination")[0].childNodes[0].nodeValue;

Instead, with native JS object:
  var origin = alternative.interpretation.origin;
  var destination = alternative.interpretation.destination;

I prefer the latter as it does away with the boilerplate that every single
web app has to go through.
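
A minimal sketch of the kind of conversion the UA could do on the app's
behalf, assuming the EMMA payload is a simple tree of elements with text
leaves. The names elementToObject and emmaToInterpretation are hypothetical
and appear in neither the draft nor EMMA; the sketch ignores attributes and
keeps only the last of any repeated child elements.

```javascript
// Hypothetical helper, not part of the Speech API draft or EMMA.
// Turns an element into a plain value: a string for a leaf element,
// or an object keyed by child element name for a nested one.
function elementToObject(element) {
  var children = Array.prototype.slice.call(element.children);
  if (children.length === 0) {
    return element.textContent;
  }
  var result = {};
  children.forEach(function (child) {
    // A repeated child name overwrites the earlier value in this sketch.
    result[child.localName] = elementToObject(child);
  });
  return result;
}

// Assumes the first emma:interpretation element carries the app payload.
// Returns null when the document has no such element.
function emmaToInterpretation(doc) {
  var nodes = doc.getElementsByTagName("emma:interpretation");
  return nodes.length > 0 ? elementToObject(nodes[0]) : null;
}
```

With something like this inside the UA, the web app would only ever see the
resulting object, as in the two-line example above.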

> Yes, SISR is a standard for representing the semantic result, but it
> doesn’t provide a way to represent any metadata.


Could you explain, with a use case, what you mean by metadata in this
context? It should be possible to fit that into the above proposal as well.

Cheers
Satish


On Mon, May 21, 2012 at 6:36 PM, Deborah Dahl <dahl@conversational-technologies.com> wrote:

> Many applications will have a dialog manager that uses the speech
> recognition result to conduct a spoken dialog with the user. In that case
> it is extremely useful for the dialog manager to have a uniform
> representation for speech recognition results, so that the dialog manager
> can be somewhat independent of the recognizer. In fact, there are existing
> applications that I know of that do expect EMMA-formatted results. It would
> be very inconvenient for these dialog managers to have to be modified to
> accommodate different formats depending on the recognition service.
> Similarly, another type of consumer of speech recognition results is likely
> to be logging and analysis applications, which again could benefit from
> uniform EMMA results. I believe it’s also undesirable for the application
> developer to have to look at the result and then manually create an EMMA
> wrapper for it.
>
> Yes, SISR is a standard for representing the semantic result, but it
> doesn’t provide a way to represent any metadata. In addition, it won’t help
> if the language model is an SLM rather than a grammar.
>
> Also, just a general comment about APIs and novice developers. I think
> developers in general are very good at ignoring aspects of an API that they
> don’t plan to use, as long as they have a simple way to get started. I
> think developer problems mainly arise with APIs where there’s a huge
> learning curve just to do hello world.
>
> *From:* Satish S [mailto:satish@google.com]
> *Sent:* Monday, May 21, 2012 12:17 PM
> *To:* Bjorn Bringert
> *Cc:* Young, Milan; Deborah Dahl; Glen Shires; Hans Wennborg;
> public-speech-api@w3.org
>
> *Subject:* Re: EMMA in Speech API (was RE: Speech API: first editor's
> draft posted)
>
> I would prefer having an easy solution for the majority of apps which
> just want the interpretation, which is either just a string or a JS
> object (when using SISR). Boilerplate code sucks. Having EMMA
> available sounds ok too, but that seems like a minority feature to me.
>
> Seems like the current type "any" is suited for that. Since SISR
> represents the results of semantic interpretation as ECMAScript that is
> interoperable and non-proprietary, the goal of a cross-browser semantic
> interpretation format seems satisfied. Are there other reasons to add EMMA
> support?
>

Received on Monday, 21 May 2012 21:35:30 UTC