RE: EMMA in Speech API (was RE: Speech API: first editor's draft posted)

Hello Satish,

I agree on Document vs DOMString.

I fail to understand your motivation for allowing null on emma:
  - If the issue is efficiency, the request can be late bound.  That is, the browser can avoid producing the EMMA object until it is requested (see the sketch after this list).
  - If the issue is time to implementation, we're going to need to escalate this conversation.  Debbie and I demonstrated a simple wrapper that can be implemented with a few lines of code.  Avoiding a few lines of browser code is not sufficient motivation for breaking compatibility across implementations.
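
To make the late-binding idea concrete, here is a rough JavaScript sketch; the _buildEmmaDocument() helper is hypothetical, standing in for whatever internal routine converts the recognizer's raw result into EMMA XML:

  // Hypothetical sketch: the UA defers constructing the EMMA Document
  // until the page first reads the attribute.
  Object.defineProperty(SpeechRecognitionResult.prototype, 'emma', {
    get: function () {
      if (this._emmaDoc === undefined) {
        // _buildEmmaDocument() is a stand-in for the browser-internal
        // conversion of the raw recognition result into EMMA XML.
        this._emmaDoc = this._buildEmmaDocument();
      }
      return this._emmaDoc;
    }
  });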

Traversing a document to extract information is a pain, and slots are a fundamental/simple concept in speech.  So your argument doesn't line up well with the "make simple things simple" paradigm that your company and others have been promoting.  We have a proposal that will give developers the best of both worlds, without an overly complicated spec.  Implementation might exceed the "few lines of code" threshold, but not significantly.
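
To make that concrete, here is a rough sketch of the kind of wrapper I mean (an illustration only, not the exact code Debbie and I showed): it copies the child elements of <emma:interpretation> onto a plain JS object so slots read like properties.

  // Illustration: expose EMMA slots as plain JS properties.
  var EMMA_NS = 'http://www.w3.org/2003/04/emma';
  function slotsFromEmma(emmaDoc) {
    var interp = emmaDoc.getElementsByTagNameNS(EMMA_NS, 'interpretation')[0];
    var slots = {};
    if (interp) {
      for (var el = interp.firstElementChild; el; el = el.nextElementSibling) {
        slots[el.localName] = el.textContent;
      }
    }
    return slots;
  }
  // e.g. slotsFromEmma(result.emma).time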

Thanks

From: Satish S [mailto:satish@google.com]
Sent: Wednesday, May 23, 2012 5:42 AM
To: Young, Milan
Cc: Bjorn Bringert; Deborah Dahl; Glen Shires; Hans Wennborg; public-speech-api@w3.org
Subject: Re: EMMA in Speech API (was RE: Speech API: first editor's draft posted)

--EMMA--
Can we agree on the following addition to SpeechRecognitionResult (section 5.1)?
 readonly attribute DOMString emma;

Given the use cases I'm ok with adding the emma attribute. I think it is better if it is of type Document (similar to XMLHttpRequest.responseXML), as there would be far more use cases for parsing the EMMA result in JS than using it as a string.
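
As a quick sketch of why Document beats DOMString here (illustrative only): with a string every consumer would have to re-parse, whereas a Document is directly queryable.

  // With DOMString the app must re-parse first:
  //   var doc = new DOMParser().parseFromString(emmaString, 'application/xml');
  // With Document it can query immediately:
  var literals = result.emma.getElementsByTagNameNS(
      'http://www.w3.org/2003/04/emma', 'literal');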

I also think this should be an optional attribute, and the UA can set it to null if the service didn't return EMMA data. The reasoning is that accessing the raw utterance value is far simpler with the JS object (alternative.utterance) than navigating through the EMMA XML DOM, so the emma attribute will be useful only for those web apps which require additional data from the recognizer and know that the service they are using will be sending that data.
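
A usage sketch of that behaviour, assuming the item() accessor on SpeechRecognitionResult from the draft; processEmma() is a hypothetical app function:

  function handleResult(result) {
    if (result.emma) {
      processEmma(result.emma);  // service supplied EMMA; mine it for extra data
    } else {
      console.log(result.item(0).utterance);  // raw text is still available directly
    }
  }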

--Interpretation--

I like Satish's suggestion of rendering the interpretation structure as a native JS object.  The only dissenting opinion I heard was that the value of the interpretation should be readily accessible for developers who don't want to traverse slots.

I think both of these needs can be met using the <emma:literal> tag, which sits under <emma:interpretation>.  For simple interpretation scenarios (like basic dictation), the recognizer will populate <emma:literal> with the text meaning of the utterance.  This will be reflected as a simple string accessible from the "interpretation" property of the SpeechRecognitionAlternative.  If the EMMA contains slots, these will be accessible using JS property syntax (e.g. interpretation.time).  Note that I don't think EMMA prevents <emma:literal> in parallel with slots, but I don't think we should change the proposed design either way.
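
For example, given a SpeechRecognitionAlternative (call it alternative), and with a hypothetical "time" slot:

  var interp = alternative.interpretation;
  if (typeof interp === 'string') {
    // Simple dictation: <emma:literal> surfaces as a plain string.
    console.log('Heard: ' + interp);
  } else {
    // Slot-filled result: slots appear as JS properties.
    console.log('Requested time: ' + interp.time);  // "time" slot is hypothetical
  }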

If emma is exposed as a separate attribute, I think that alone is sufficient and there is no need to represent the same data in another format (i.e. as a JS object). My earlier proposal was to expose only JS objects, but since there are valid cases where that wouldn't be sufficient, exposing the emma document is ok. I'd prefer we don't inject the interpretation received via SISR into the emma XML, as again it is simpler to access it via JS.

Received on Wednesday, 23 May 2012 17:40:57 UTC