
RE: EMMA in Speech API (was RE: Speech API: first editor's draft posted)

From: Deborah Dahl <dahl@conversational-technologies.com>
Date: Mon, 21 May 2012 13:36:14 -0400
To: "'Satish S'" <satish@google.com>, "'Bjorn Bringert'" <bringert@google.com>
Cc: "'Young, Milan'" <Milan.Young@nuance.com>, "'Glen Shires'" <gshires@google.com>, "'Hans Wennborg'" <hwennborg@google.com>, <public-speech-api@w3.org>
Message-ID: <012c01cd3778$38e04f00$aaa0ed00$@conversational-technologies.com>
Many applications will have a dialog manager that uses the speech
recognition result to conduct a spoken dialog with the user. In that case it
is extremely useful for the dialog manager to have a uniform representation
for speech recognition results, so that the dialog manager can be somewhat
independent of the recognizer. In fact, there are existing applications that
I know of that do expect EMMA-formatted results. It would be very
inconvenient for these dialog managers to have to be modified to accommodate
different formats depending on the recognition service. Similarly, another
type of consumer of speech recognition results is likely to be logging and
analysis applications, which again could benefit from uniform EMMA results.
I believe it's also undesirable for the application developer to have to
look at the result and then manually create an EMMA wrapper for it. 
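As a rough illustration of the dialog manager case, here is a minimal sketch of code that reads an EMMA result without knowing which recognizer produced it. The sample document, the <destination> payload element, and the read_emma helper are hypothetical, made up for this example; only the EMMA namespace and the emma:tokens and emma:confidence annotations come from the EMMA 1.0 specification.

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"

# Hypothetical EMMA result; the <destination> payload is application-specific
# and invented for this sketch.
SAMPLE_EMMA = """\
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:interpretation id="int1"
      emma:medium="acoustic" emma:mode="voice"
      emma:confidence="0.82" emma:tokens="flights to boston">
    <destination>Boston</destination>
  </emma:interpretation>
</emma:emma>
"""


def read_emma(xml_text):
    """Collect tokens, confidence, and payload for each emma:interpretation."""
    root = ET.fromstring(xml_text)
    results = []
    for interp in root.iter(f"{{{EMMA_NS}}}interpretation"):
        confidence = interp.get(f"{{{EMMA_NS}}}confidence")
        results.append({
            "tokens": interp.get(f"{{{EMMA_NS}}}tokens"),
            "confidence": float(confidence) if confidence is not None else None,
            # Direct children carry the application-level semantics.
            "payload": {child.tag: child.text for child in interp},
        })
    return results


if __name__ == "__main__":
    for result in read_emma(SAMPLE_EMMA):
        print(result)
```

The same reader would work for a logging or analysis tool, since it depends only on the EMMA annotations and not on a recognizer-specific result format.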

Yes, SISR is a standard for representing the semantic result, but it doesn't
provide a way to represent any metadata. In addition, it won't help if the
language model is an SLM rather than a grammar. 
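To make the metadata point concrete, here is a sketch of the kind of wrapper an application would have to build by hand if the recognizer returned only a bare string, as it might with an SLM where there is no SISR output. The wrap_literal helper and the sample values are hypothetical; emma:literal, emma:tokens and emma:confidence are taken from EMMA 1.0. Ideally the recognition service would produce this, not the application developer.

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"
ET.register_namespace("emma", EMMA_NS)  # serialize with the usual emma: prefix


def wrap_literal(transcript, confidence):
    """Wrap a plain transcript and its confidence in a minimal EMMA document."""
    root = ET.Element(f"{{{EMMA_NS}}}emma", {"version": "1.0"})
    interp = ET.SubElement(root, f"{{{EMMA_NS}}}interpretation", {
        "id": "int1",
        f"{{{EMMA_NS}}}confidence": f"{confidence:.2f}",
        f"{{{EMMA_NS}}}tokens": transcript,
    })
    # With no grammar there is no structured semantics, so the result
    # is carried as a string literal.
    literal = ET.SubElement(interp, f"{{{EMMA_NS}}}literal")
    literal.text = transcript
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(wrap_literal("call my mother", 0.7))
```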

Also, just a general comment about APIs and novice developers. I think
developers in general are very good at ignoring aspects of an API that they
don't plan to use, as long as they have a simple way to get started. I think
developer problems mainly arise with APIs where there's a huge learning
curve just to do "Hello World".

 

From: Satish S [mailto:satish@google.com] 
Sent: Monday, May 21, 2012 12:17 PM
To: Bjorn Bringert
Cc: Young, Milan; Deborah Dahl; Glen Shires; Hans Wennborg;
public-speech-api@w3.org
Subject: Re: EMMA in Speech API (was RE: Speech API: first editor's draft
posted)

 

I would prefer having an easy solution for the majority of apps which
just want the interpretation, which is either just a string or a JS
object (when using SISR). Boilerplate code sucks. Having EMMA
available sounds ok too, but that seems like a minority feature to me.

 

Seems like the current type "any" is suited for that. Since SISR represents
the results of semantic interpretation as ECMAScript that is interoperable
and non-proprietary, the goal of a cross-browser semantic interpretation
format seems satisfied. Are there other reasons to add EMMA support?
Received on Monday, 21 May 2012 17:37:03 GMT
