Re: Additional parameters to SpeechRecognition (was "Speech API: first editor's draft posted") from Satish S on 2012-04-25 (public-speech-api@w3.org from April 2012)

From: Satish S <satish@google.com>
Date: Wed, 25 Apr 2012 21:55:50 +0100
To: Deborah Dahl <dahl@conversational-technologies.com>
Cc: "Young, Milan" <Milan.Young@nuance.com>, Glen Shires <gshires@google.com>, Hans Wennborg <hwennborg@google.com>, public-speech-api@w3.org
Message-ID: <CAHZf7RkfKN-vMP0xQbFPmN9jhXWveaTZZSMymH9y7m8rUHMLcA@mail.gmail.com>
A performance-conscious web app could use a smaller maxNBest value so the
recognizer doesn't generate all the hypotheses and then decide to throw
away some of them based on the confidence value. I don't see value in
providing both in this API.
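
To make the comparison concrete, here is a rough sketch of what the two
knobs would do if both were applied to an n-best list. The `Alternative`
shape and the `pruneAlternatives` helper are hypothetical names used only
for illustration; they are not taken from the draft.

```typescript
// Hypothetical shape of one recognition hypothesis; field names are assumptions.
interface Alternative {
  transcript: string;
  confidence: number; // 0.0 to 1.0, not comparable across recognizers
}

// Keep at most maxNBest hypotheses, best first, dropping any below the threshold.
function pruneAlternatives(
  alternatives: Alternative[],
  maxNBest: number,
  confidenceThreshold: number
): Alternative[] {
  return [...alternatives]
    .sort((a, b) => b.confidence - a.confidence)
    .filter((alt) => alt.confidence >= confidenceThreshold)
    .slice(0, maxNBest);
}

// Made-up sample data for illustration.
const nbest: Alternative[] = [
  { transcript: "recognize speech", confidence: 0.82 },
  { transcript: "wreck a nice beach", confidence: 0.41 },
  { transcript: "recognise peach", confidence: 0.12 },
];

const kept = pruneAlternatives(nbest, 2, 0.3);
console.log(kept.map((alt) => alt.transcript));
```

With a small maxNBest the list is already short, so the threshold only
matters when a web app wants fewer results than maxNBest allows.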

I agree all the parameters suggested have been in use over the years, but we
should carefully consider each one in the context of web apps. Can we discuss
a list of web apps which will require both maxNBest and confidenceThreshold
and see how to address them? In fact it would be great to adopt that as a
framework for all API discussions, i.e., provide sample web app scenarios
where the change/addition will be required.

Cheers
Satish


On Wed, Apr 25, 2012 at 9:40 PM, Deborah Dahl <
dahl@conversational-technologies.com> wrote:

> Yes, the filtering that I was talking about is something that you would do
> at the level of the interaction manager, whether that’s implemented on the
> client, or whether it’s running on another server. It doesn’t mean that
> there isn’t a value in also doing some filtering on the server, for the
> reasons Milan pointed out.
>
> *From:* Young, Milan [mailto:Milan.Young@nuance.com]
> *Sent:* Wednesday, April 25, 2012 2:08 PM
> *To:* Glen Shires
>
> *Cc:* Hans Wennborg; Satish S; public-speech-api@w3.org
> *Subject:* RE: Additional parameters to SpeechRecognition (was "Speech
> API: first editor's draft posted")
>
> You have ignored my two points about why it is often best to filter low
> confidence matches on the server (i.e., performance and clipping).  Just
> because Deborah points out that there are additional use cases for
> filtering on the client does not invalidate my claim.
>
> Yes, we should try to deliver consistent behavior across UAs, speech
> engines, and even dialog states.  But let’s not throw the baby out with the
> bathwater if we can’t nail it down in a v1.
>
> *From:* Glen Shires [mailto:gshires@google.com]
> *Sent:* Wednesday, April 25, 2012 10:43 AM
> *To:* Young, Milan
> *Cc:* Hans Wennborg; Satish S; public-speech-api@w3.org
> *Subject:* Re: Additional parameters to SpeechRecognition (was "Speech
> API: first editor's draft posted")
>
> I think (hope) that most web developers won't have to worry about
> confidence values because the default set by the speech recognizer should
> be sufficient.
>
> However, a JS API developer savvy enough to understand how/when to
> properly set a confidenceThreshold is also savvy enough to intelligently
> process the confidence values returned in the results. As Deborah mentioned
> [1], "For example, if the top two alternatives in the nbest have very
> similar confidences...".  Typically, processing the confidence result
> values is a much better strategy than trying to tune the
> confidenceThreshold.
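
As an illustration of processing the returned confidence values rather than
tuning a threshold, the following sketch checks whether the top two
alternatives are too close to call. The `Alternative` shape, the `decide`
helper, and the margin value are all assumptions made for this example.

```typescript
// Hypothetical shape of one recognition hypothesis; field names are assumptions.
interface Alternative {
  transcript: string;
  confidence: number;
}

type Decision =
  | { kind: "accept"; transcript: string }
  | { kind: "disambiguate"; choices: string[] }
  | { kind: "none" };

// Accept the top hypothesis unless the runner-up is within `margin` of it,
// in which case the app should ask the user which one was meant.
function decide(alternatives: Alternative[], margin: number): Decision {
  const ranked = [...alternatives].sort((a, b) => b.confidence - a.confidence);
  if (ranked.length === 0) {
    return { kind: "none" };
  }
  if (ranked.length > 1 && ranked[0].confidence - ranked[1].confidence < margin) {
    return {
      kind: "disambiguate",
      choices: [ranked[0].transcript, ranked[1].transcript],
    };
  }
  return { kind: "accept", transcript: ranked[0].transcript };
}

// Made-up sample data: two cities that sound alike.
const closeCall = decide(
  [
    { transcript: "Austin", confidence: 0.58 },
    { transcript: "Boston", confidence: 0.55 },
  ],
  0.1
);
console.log(closeCall.kind);
```

Because the comparison is relative, it does not depend on what an absolute
value such as 0.5 means for a particular recognizer.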
>
> Only extremely savvy JS API developers will understand how to properly
> tune the confidenceThreshold so that it prunes (but doesn't over-prune) the
> data returned.  I believe these developers can best adjust
> the confidenceThreshold by processing the confidence result values returned
> by prior recognitions (as opposed to simply bumping the default value by
> 0.05).
>
> Also, from an implementation standpoint, there's a major issue with
> making confidenceThreshold readable. If the developer switches to a new
> recognizer, the default confidenceThreshold may change. If the developer
> then reads the confidenceThreshold (for example, to increment it by 0.05),
> then presumably the browser needs to get the default confidence value from
> the speech recognizer. For a remote recognizer, this round-trip takes time,
> and the browser cannot stall the JavaScript processing.
>
> /Glen Shires
>
> [1] http://lists.w3.org/Archives/Public/public-speech-api/2012Apr/0031.html
>
> On Wed, Apr 25, 2012 at 9:47 AM, Young, Milan <Milan.Young@nuance.com>
> wrote:
>
> The speech community has lived for 20 years with the fact that confidence
> values are not portable across engines.  I understand that we are courting
> a new class of developers with this HTML-based initiative, but I want to be
> careful not to dumb it down to the point where we impact the mainstream
> speech industry.
>
> Incrementally bumping up confidence (e.g., recognizer.confidence += 5) in
> response to a series of misrecognitions is a common technique.  I also find
> it generally ugly that confidence is special-cased with a function instead
> of a property.  (Is it a JS limitation that you cannot mark a property as
> write-only?)
>
> I would rather say something like “Recognition engines generally do a good
> job of choosing the right confidence value for a recognition task.  If you
> do choose to read this property, know that its value is not portable to
> other recognition tasks, other speech engines, or other user agents.”
>
> Thanks
>
> *From:* Glen Shires [mailto:gshires@google.com]
> *Sent:* Wednesday, April 25, 2012 8:11 AM
> *To:* Hans Wennborg
> *Cc:* Young, Milan; Satish S; public-speech-api@w3.org
> *Subject:* Re: Additional parameters to SpeechRecognition (was "Speech
> API: first editor's draft posted")
>
> confidenceThreshold
>
> I think we all agree that speech recognizers have a concept of confidence,
> and that it can be mapped to a monotonically increasing range of 0.0 to
> 1.0.  However, specific values (for example 0.5) do not correspond to the
> same level of confidence for different recognizers.
>
> I believe that if the developer does not set the confidenceThreshold, the
> speech recognizer should use a default value that is appropriate for that
> recognizer.
>
> A complication with a confidenceThreshold attribute is defining the
> default value (if the value is read, but not written, what value does the
> BROWSER return? - particularly because the optimal default value may vary
> from one RECOGNIZER to another).
>
> Perhaps instead of an attribute, this should be a write-only value,
> specifically a setConfidenceThreshold method.
>
> /Glen Shires
>
> On Wed, Apr 25, 2012 at 6:43 AM, Hans Wennborg <hwennborg@google.com>
> wrote:
>
> On Tue, Apr 24, 2012 at 17:22, Young, Milan <Milan.Young@nuance.com>
> wrote:
> > There are two reasons for including confidence that I would like this
> > community to consider:
> >  Efficiency - Similar to the argument Satish put forward for limiting
> > the size of the nbest array, pruning the result candidates at the server
> > is more efficient.
> >  Clipping - There are many environments where background noise and side
> > speech can trigger junk results.  If confidence is low, this will
> > trigger a result and then the application enters a deaf period where it
> > processes the result and discovers the content is junk.  If real speech
> > happens during this phase, its start will be missed.
> >
> > Every recognizer that was ever invented has a concept of confidence.
> > Yes, the semantics of that value vary across platforms, but for us to push
> > this to a custom parameter will confuse developers, and ultimately slow
> > adoption.
>
> Ok, I don't feel strongly about this, so I would be fine adding a
> confidenceThreshold if others agree.
>
>
> > Regarding the timeout family, an open-ended dialog like "Tell me what is
> > wrong with your computer" should have generous timeouts.  Compare this to
> > "So it's something to do with your new Google double mouse configuration,
> > is that correct?" which should have short timeouts.
> >
> > Our goal should be a consistent application experience across UAs, and
> > that's only going to happen if we standardize timeouts.  I would also like
> > to mention that the definition of these timeouts is clear and has been
> > industry standard for 10+ years.
>
> What do you think about my idea of just letting the web page handle
> the timeout itself, calling abort() when it decides a request is
> taking too long?
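
A minimal sketch of the page-side timeout suggested above, assuming only
that the request object exposes abort(). The `AbortableRequest` interface,
the `abortAfter` helper, and the timeout value are hypothetical and chosen
for illustration.

```typescript
// Minimal stand-in for a recognition request; only abort() comes from the thread.
interface AbortableRequest {
  abort(): void;
}

// Arms a page-side timer that aborts the request after timeoutMs.
// Returns a function the page calls when a result arrives, to disarm the timer.
function abortAfter(request: AbortableRequest, timeoutMs: number): () => void {
  const timer = setTimeout(() => request.abort(), timeoutMs);
  return () => clearTimeout(timer);
}

// Fake request object, for illustration only.
const fakeRequest: AbortableRequest = {
  abort: () => console.log("request aborted by page timeout"),
};

// A generous timeout for an open-ended prompt; a yes/no confirmation
// would pass a much smaller value.
const disarm = abortAfter(fakeRequest, 10000);

// A result arrived in time, so the page disarms the timer and no abort happens.
disarm();
```

This leaves the choice of timeout entirely to the page, which is the
trade-off under discussion: flexible per prompt, but not standardized
across UAs.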
>
>
> Thanks,
> Hans
>
> --
> Thanks!
>
> Glen Shires
>
Received on Wednesday, 25 April 2012 20:56:20 UTC