RE: Additional parameters to SpeechRecognition (was "Speech API: first editor's draft posted") from Deborah Dahl on 2012-04-24 (public-speech-api@w3.org from April 2012)

From: Deborah Dahl <dahl@conversational-technologies.com>
Date: Tue, 24 Apr 2012 17:57:36 -0400
To: "'Young, Milan'" <Milan.Young@nuance.com>, "'Hans Wennborg'" <hwennborg@google.com>, "'Satish S'" <satish@google.com>
Cc: <public-speech-api@w3.org>
Message-ID: <028c01cd2265$ad929b00$08b7d100$@conversational-technologies.com>

While the semantics of recognizer confidence does vary in different
recognizers, a numeric value is more useful than a rank order. For example,
if the top two alternatives in the nbest have very similar confidences, a
dialog manager might decide to reprompt the user, or display both
alternatives and let the user pick one, but not if the top candidate has a
much higher confidence than the second candidate.
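
A minimal sketch of that dialog-manager decision, assuming confidences in the range 0 to 1. The names Alternative, DialogAction, chooseAction and the MIN_MARGIN value are hypothetical and chosen only for illustration; a real threshold would have to be tuned per recognizer.

```typescript
// Hypothetical shape, loosely mirroring SpeechRecognitionAlternative.
interface Alternative {
  transcript: string;
  confidence: number; // assumed to lie in [0, 1]
}

type DialogAction =
  | { kind: "accept"; transcript: string }
  | { kind: "disambiguate"; choices: string[] }
  | { kind: "reprompt" };

// Illustrative value only; not taken from any specification.
const MIN_MARGIN = 0.15;

function chooseAction(nbest: Alternative[]): DialogAction {
  // Sort a copy so the caller's array is left untouched.
  const sorted = [...nbest].sort((a, b) => b.confidence - a.confidence);
  const [top, second] = sorted;

  // Nothing was recognized: ask the user again.
  if (top === undefined) {
    return { kind: "reprompt" };
  }

  // The top two candidates are too close to call: let the user pick.
  if (second !== undefined && top.confidence - second.confidence < MIN_MARGIN) {
    return { kind: "disambiguate", choices: [top.transcript, second.transcript] };
  }

  // The top candidate clearly wins.
  return { kind: "accept", transcript: top.transcript };
}
```

A rank order alone cannot support this decision, because it does not say how far apart the first and second candidates are.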

> -----Original Message-----
> From: Young, Milan [mailto:Milan.Young@nuance.com]
> Sent: Tuesday, April 24, 2012 12:23 PM
> To: Hans Wennborg; Satish S
> Cc: public-speech-api@w3.org
> Subject: RE: Additional parameters to SpeechRecognition (was "Speech API:
> first editor's draft posted")
> 
> There are two reasons for including confidence that I would like this
> community to consider:
>   Efficiency - Similar to the argument Satish put forward for limiting the
> size of the nbest array, pruning the result candidates at the server is
> more efficient.
>   Clipping - There are many environments where background noise and side
> speech can trigger junk results.  If confidence is low, this will trigger
> a result and then the application enters a deaf period where it processes
> the result and discovers the content is junk.  If real speech happens
> during this phase, its start will be missed.
> 
> Every recognizer that was ever invented has a concept of confidence.  Yes,
> the semantics of that value vary across platforms, but for us to push this
> to a custom parameter will confuse developers, and ultimately slow adoption.
> 
> 
> Regarding the timeout family, an open-ended dialog like "Tell me what is
> wrong with your computer" should have generous timeouts.  Compare this
> to "So it's something to do with your new Google double mouse
> configuration, is that correct?" which should have short timeouts.
> 
> Our goal should be a consistent application experience across UAs, and
> that's only going to happen if we standardize timeouts.  I would also like
> to mention that the definition of these timeouts is clear and has been
> industry standard for 10+ years.
> 
> Thanks
> 
> 
> 
> -----Original Message-----
> From: Hans Wennborg [mailto:hwennborg@google.com]
> Sent: Tuesday, April 24, 2012 8:25 AM
> To: Satish S
> Cc: public-speech-api@w3.org
> Subject: Re: Additional parameters to SpeechRecognition (was "Speech API:
> first editor's draft posted")
> 
> On Tue, Apr 24, 2012 at 14:52, Satish S <satish@google.com> wrote:
> > (Splitting off to a new thread so we can follow discussions easily.
> > Please start new threads for proposed additions/changes)
> >
> >> SpeechRecognition
> >>  - In addition to the three parameters you have listed, I see the
> >> following as necessary:
> >>        integer maxNBest;
> >
> > I can see speech engines defaulting to a specific number of results
> > and the web app can tweak it based on performance characteristics it
> > needs. Without this attribute the engine should be asked to always
> > give the max number of results and let the web app filter, which seems
> > suboptimal.
> 
> I agree, I think this would be a good addition.
> 
> >>        float confidenceThreshold;
> >
> > SpeechRecognitionAlternative.confidence provides the value so the web
> > app can filter based on that if it needs to. With that in mind do we
> > need this attribute?
> 
> Agreed. Also, the absolute confidence values are probably not very
> interesting. For example, what does a confidence of 0.5 mean? I see the
> confidence values as useful for providing an ordering of the alternatives,
> not much else.
> 
> >>        integer completeTimeout;
> >>        integer incompleteTimeout;
> >>        integer maxSpeechTimeout;
> >
> > Do you have use cases where these should vary between different web
> > apps? I think it would be better to leave it to the UA so all web apps
> > have consistent timeouts and user expectation doesn't get affected.
> 
> I don't like the idea of having three different timeouts. Couldn't the web
> page handle timeouts itself, by calling abort() on the SpeechRecognition
> object if it takes too long?
> 
> Thanks,
> Hans
>
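
For reference, the page-side timeout Hans describes (calling abort() when recognition takes too long) could be sketched roughly as below. RecognizerLike, Timer, systemTimer and startWithDeadline are hypothetical names; only start(), abort() and the onend handler are taken from the SpeechRecognition object under discussion, and the handler signature is simplified here.

```typescript
// Hypothetical minimal view of a SpeechRecognition-like object.
interface RecognizerLike {
  start(): void;
  abort(): void;
  onend: (() => void) | null;
}

// Hypothetical timer abstraction so the logic can run outside a browser.
interface Timer {
  set(callback: () => void, ms: number): unknown;
  clear(handle: unknown): void;
}

const systemTimer: Timer = {
  set: (callback, ms) => setTimeout(callback, ms),
  clear: (handle) => clearTimeout(handle as ReturnType<typeof setTimeout>),
};

function startWithDeadline(
  rec: RecognizerLike,
  timeoutMs: number,
  timer: Timer = systemTimer,
): void {
  // Abort the session if it has not ended by the deadline.
  const handle = timer.set(() => rec.abort(), timeoutMs);

  // Cancel the pending abort once the session ends, then chain to any
  // handler the page had already installed.
  const previousOnEnd = rec.onend;
  rec.onend = () => {
    timer.clear(handle);
    if (previousOnEnd) previousOnEnd();
  };

  rec.start();
}
```

This gives the page a single overall deadline. It does not distinguish between silence before speech, a pause after a complete utterance, and overly long speech, which is what the three proposed timeouts are meant to separate.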

Received on Tuesday, 24 April 2012 22:01:10 UTC