RE: Additional parameters to SpeechRecognition (was "Speech API: first editor's draft posted")

The speech community has lived for 20 years with the fact that confidence values are not portable across engines.  I understand that we are courting a new class of developers with this HTML-based initiative, but I want to be careful not to dumb it down to the point where we impact the mainstream speech industry.

Incrementally bumping up confidence (e.g., recognizer.confidence += 5) in response to a series of misrecognitions is a common technique.  I also find it generally ugly that confidence is special-cased with a function instead of a property.  (Is it a JS limitation that you cannot mark a property as write-only?)
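
A minimal sketch of the bumping technique, to show why it needs a readable value. The recognizer here is a plain stand-in object, the 0-100 scale is an assumption taken from the "+= 5" example, and bumpConfidence is a hypothetical helper, not part of any draft:

```javascript
// Hypothetical helper: raise the confidence value by `step`, capped at `max`.
function bumpConfidence(current, step, max) {
  return Math.min(current + step, max);
}

// Stand-in recognizer with a readable and writable confidence property
// (assumed 0-100 scale; not the draft API).
const recognizer = { confidence: 50 };

// Simulate three consecutive misrecognitions, bumping after each one.
for (let i = 0; i < 3; i++) {
  recognizer.confidence = bumpConfidence(recognizer.confidence, 5, 100);
}

console.log(recognizer.confidence); // 65
```

Each bump reads the current value before writing the new one, which a write-only setter would not allow unless the page kept its own copy.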

I would rather say something like "Recognition engines generally do a good job of choosing the right confidence value for a recognition task.  If you do choose to read this property, know that its value is not portable to other recognition tasks, other speech engines, or other user agents."

Thanks

From: Glen Shires [mailto:gshires@google.com]
Sent: Wednesday, April 25, 2012 8:11 AM
To: Hans Wennborg
Cc: Young, Milan; Satish S; public-speech-api@w3.org
Subject: Re: Additional parameters to SpeechRecognition (was "Speech API: first editor's draft posted")

confidenceThreshold

I think we all agree that speech recognizers have a concept of confidence, and that it can be mapped to a monotonically increasing range of 0.0 to 1.0.  However, specific values (for example 0.5) do not correspond to the same level of confidence for different recognizers.

I believe that if the developer does not set the confidenceThreshold, the speech recognizer should use a default value that is appropriate for that recognizer.

A complication with a confidenceThreshold attribute is defining the default value: if the value is read but never written, what value does the BROWSER return?  This is a particular problem because the optimal default value may vary from one RECOGNIZER to another.

Perhaps instead of an attribute, this should be a write-only value, specifically a setConfidenceThreshold method.
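
To make the two shapes concrete, here is a rough sketch in plain JavaScript. Everything in it is illustrative: makeRecognizer and _effectiveThreshold are hypothetical names, and the object is a stand-in rather than a real SpeechRecognition. It relies only on the fact that an accessor property defined with a setter and no getter reads back as undefined:

```javascript
// Hypothetical stand-in for a recognizer; names are illustrative only.
function makeRecognizer() {
  let threshold = null; // null means "let the engine use its own default"

  const r = {
    // Method form, as suggested above.
    setConfidenceThreshold(value) {
      threshold = value;
    },
    // Inspection hook for this sketch only; a real object would not expose it.
    _effectiveThreshold() {
      return threshold;
    }
  };

  // Property form: a setter with no getter, so reads yield undefined.
  Object.defineProperty(r, "confidenceThreshold", {
    set(value) {
      threshold = value;
    },
    enumerable: true
  });

  return r;
}

const rec = makeRecognizer();
rec.confidenceThreshold = 0.5;              // write through the property
console.log(rec.confidenceThreshold);       // undefined: there is no getter
rec.setConfidenceThreshold(0.7);            // write through the method
console.log(rec._effectiveThreshold());     // 0.7
```

Either form avoids having to define what an unwritten value reads back as; the method form just makes the write-only nature visible in the name.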

/Glen Shires
On Wed, Apr 25, 2012 at 6:43 AM, Hans Wennborg <hwennborg@google.com> wrote:
On Tue, Apr 24, 2012 at 17:22, Young, Milan <Milan.Young@nuance.com> wrote:
> There are two reasons for including confidence that I would like this community to consider:
>  Efficiency - Similar to the argument Satish put forward for limiting the size of the nbest array, pruning the result candidates at the server is more efficient.
>  Clipping - There are many environments where background noise and side speech can trigger junk results.  If confidence is low, this will trigger a result and then the application enters a deaf period where it processes the result and discovers the content is junk.  If real speech happens during this phase, its start will be missed.
>
> Every recognizer that was ever invented has a concept of confidence.  Yes, the semantics of that value vary across platforms, but for us to push this to a custom parameter will confuse developers, and ultimately slow adoption.
Ok, I don't feel strongly about this, so I would be fine adding a
confidenceThreshold if others agree.

> Regarding the timeout family, an open-ended dialog like "Tell me what is wrong with your computer", should have generous timeouts.  Compare this to "So it's something to do with your new Google double mouse configuration, is that correct?" which should have short timeouts.
>
> Our goal should be a consistent application experience across UAs, and that's only going to happen if we standardize timeouts.  I would also like to mention that the definition of these timeouts is clear and has been industry standard for 10+ years.
What do you think about my idea of just letting the web page handle
the timeout itself, calling abort() when it decides a request is
taking too long?
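
The page-managed timeout idea above could look roughly like the sketch below. It assumes only that the recognizer has start() and abort(); startWithTimeout and the schedule/cancel parameters are hypothetical names, and a fake scheduler stands in for the real timer so the example runs synchronously:

```javascript
// Sketch: the page, not the UA, decides when a request has taken too long.
// `schedule` and `cancel` default to the standard timer functions.
function startWithTimeout(recognizer, timeoutMs, schedule = setTimeout, cancel = clearTimeout) {
  const timer = schedule(() => recognizer.abort(), timeoutMs);
  recognizer.start();
  // Call the returned function once a result arrives so abort() does not fire late.
  return () => cancel(timer);
}

// Stand-in recognizer that only records which methods were called.
const calls = [];
const fakeRecognizer = {
  start() { calls.push("start"); },
  abort() { calls.push("abort"); }
};

// Fake scheduler: stores the callback instead of waiting for real time to pass.
let pending = null;
const stop = startWithTimeout(
  fakeRecognizer,
  3000,
  (fn) => { pending = fn; return 1; },
  () => { pending = null; }
);

pending(); // simulate the timeout expiring before any result arrives
console.log(calls.join(",")); // start,abort
```

In a page, the default setTimeout and clearTimeout would be used, and the returned function would be called from whatever handler receives the result.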


Thanks,
Hans



--
Thanks!
Glen Shires

Received on Wednesday, 25 April 2012 16:48:30 UTC