
RE: Additional parameters to SpeechRecognition (was "Speech API: first editor's draft posted")

From: Deborah Dahl <dahl@conversational-technologies.com>
Date: Wed, 25 Apr 2012 16:40:08 -0400
To: "'Young, Milan'" <Milan.Young@nuance.com>, "'Glen Shires'" <gshires@google.com>
Cc: "'Hans Wennborg'" <hwennborg@google.com>, "'Satish S'" <satish@google.com>, <public-speech-api@w3.org>
Message-ID: <040101cd2323$9b786a80$d2693f80$@conversational-technologies.com>
Yes, the filtering that I was talking about is something that you would do
at the level of the interaction manager, whether that's implemented on the
client or running on another server. That doesn't mean there isn't also
value in doing some filtering on the server, for the reasons Milan pointed
out.

 

From: Young, Milan [mailto:Milan.Young@nuance.com] 
Sent: Wednesday, April 25, 2012 2:08 PM
To: Glen Shires
Cc: Hans Wennborg; Satish S; public-speech-api@w3.org
Subject: RE: Additional parameters to SpeechRecognition (was "Speech API:
first editor's draft posted")

 

You have ignored my two points about why it is often best to filter
low-confidence matches on the server (i.e., performance and clipping).  Just
because Deborah points out that there are additional use cases for filtering
on the client does not invalidate my claim.

 

Yes, we should try to deliver consistent behavior across UAs, speech
engines, and even dialog states.  But let's not throw the baby out with the
bathwater if we can't nail it down in a v1.

 

 

From: Glen Shires [mailto:gshires@google.com] 
Sent: Wednesday, April 25, 2012 10:43 AM
To: Young, Milan
Cc: Hans Wennborg; Satish S; public-speech-api@w3.org
Subject: Re: Additional parameters to SpeechRecognition (was "Speech API:
first editor's draft posted")

 

I think (hope) that most web developers won't have to worry about confidence
values because the default set by the speech recognizer should be
sufficient.

 

However, a JS API developer savvy enough to understand how and when to
properly set a confidenceThreshold is also savvy enough to intelligently
process the confidence values returned in the results. As Deborah mentioned
[1], "For example, if the top two alternatives in the nbest have very
similar confidences...".  Typically, processing the confidence result values
is a much better strategy than trying to tune the confidenceThreshold.
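
As a sketch of what that processing might look like (assuming, as the
discussion does, that alternatives arrive sorted best-first and each carries
a numeric confidence; the helper name is mine, not part of any draft):

```javascript
// Sketch: flag a result as ambiguous when the top two alternatives
// have very similar confidences. `alternatives` is assumed to be an
// array of { transcript, confidence } objects, sorted best-first.
function isAmbiguous(alternatives, margin = 0.1) {
  if (alternatives.length < 2) return false;
  return alternatives[0].confidence - alternatives[1].confidence < margin;
}
```

An application that detects an ambiguous result could then fall back to a
confirmation prompt rather than acting on the top alternative.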

 

Only extremely savvy JS API developers will understand how to properly tune
the confidenceThreshold so that it prunes (but doesn't over prune) the data
returned.  I believe these developers can best adjust the
confidenceThreshold by processing the confidence result values returned by
prior recognitions (as opposed to simply bumping the default value by 0.05).
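
One possible shape for that feedback loop (a hypothetical page-side helper,
not anything in the draft API): derive a new threshold from confidence
values the application has already judged to be good or junk.

```javascript
// Sketch: pick a threshold from confidences observed in prior
// recognitions. `accepted` and `rejected` are arrays of confidence
// values the application has already classified as good or junk.
function suggestThreshold(accepted, rejected, fallback = 0.5) {
  if (accepted.length === 0 || rejected.length === 0) return fallback;
  const lowestGood = Math.min(...accepted);
  const highestJunk = Math.max(...rejected);
  // Place the cut midway between the worst good result and the best
  // junk result; if the two ranges overlap, keep the default.
  return highestJunk < lowestGood ? (lowestGood + highestJunk) / 2 : fallback;
}
```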


 

 

Also, from an implementation standpoint, there's a major issue with making
confidenceThreshold readable. If the developer switches to a new recognizer,
the default confidenceThreshold may change. If the developer then reads the
confidenceThreshold (for example, to increment it by 0.05), then presumably
the browser needs to get the default confidence value from the speech
recognizer. For a remote recognizer, this round-trip takes time, and the
browser cannot stall the JavaScript processing.

 

/Glen Shires

 

[1] http://lists.w3.org/Archives/Public/public-speech-api/2012Apr/0031.html

On Wed, Apr 25, 2012 at 9:47 AM, Young, Milan <Milan.Young@nuance.com>
wrote:

The speech community has lived for 20 years with the fact that confidence
values are not portable across engines.  I understand that we are courting a
new class of developers with this HTML-based initiative, but I want to be
careful not to dumb it down to the point where we impact the mainstream
speech industry.

 

Incrementally bumping up confidence (e.g., recognizer.confidenceThreshold +=
0.05) in response to a series of misrecognitions is a common technique.  I
also find it generally ugly that confidence is special-cased with a function
instead of a property.  (Is it a JS limitation that you cannot mark a
property as write-only?)
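
For what it's worth, JavaScript can express a write-only property directly:
an accessor defined with a setter and no getter accepts writes, while reads
yield undefined. A minimal sketch (the _debugThreshold helper is purely
illustrative, to show the write landed):

```javascript
// Sketch: a write-only confidenceThreshold via a setter-only accessor.
function makeRecognizer() {
  let threshold = null; // hidden backing store, never exposed by a getter
  const rec = {};
  Object.defineProperty(rec, 'confidenceThreshold', {
    set(value) { threshold = value; },
    // no `get`: reading the property yields undefined
  });
  rec._debugThreshold = () => threshold; // illustrative only
  return rec;
}
```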

 

I would rather say something like "Recognition engines generally do a good
job of choosing the right confidence value for a recognition task.  If you
do choose to read this property, know that its value is not portable to
other recognition tasks, other speech engines, or other user agents."

 

Thanks

 

From: Glen Shires [mailto:gshires@google.com] 
Sent: Wednesday, April 25, 2012 8:11 AM
To: Hans Wennborg
Cc: Young, Milan; Satish S; public-speech-api@w3.org


Subject: Re: Additional parameters to SpeechRecognition (was "Speech API:
first editor's draft posted")

 

confidenceThreshold

 

I think we all agree that speech recognizers have a concept of confidence,
and that it can be mapped to a monotonically increasing range of 0.0 to 1.0.
However, specific values (for example 0.5) do not correspond to the same
level of confidence for different recognizers.

 

I believe that if the developer does not set the confidenceThreshold, the
speech recognizer should use a default value that is appropriate for that
recognizer.

 

A complication with a confidenceThreshold attribute is defining the default
value: if the value is read but never written, what value does the BROWSER
return? This is particularly tricky because the optimal default value may
vary from one RECOGNIZER to another.

 

Perhaps instead of an attribute, this should be a write-only value,
specifically a setConfidenceThreshold method.

 

/Glen Shires

On Wed, Apr 25, 2012 at 6:43 AM, Hans Wennborg <hwennborg@google.com> wrote:

On Tue, Apr 24, 2012 at 17:22, Young, Milan <Milan.Young@nuance.com> wrote:
> There are two reasons for including confidence that I would like this
community to consider:
>  Efficiency - Similar to the argument Satish put forward for limiting the
size of the nbest array, pruning the result candidates at the server is more
efficient.
>  Clipping - There are many environments where background noise and side
speech can trigger junk results.  If the confidence threshold is low, junk
will trigger a result, and then the application enters a deaf period while
it processes the result and discovers the content is junk.  If real speech
happens during this phase, its start will be missed.
>
> Every recognizer that was ever invented has a concept of confidence.  Yes,
the semantics of that value vary across platforms, but pushing this to a
custom parameter would confuse developers and ultimately slow adoption.

Ok, I don't feel strongly about this, so I would be fine adding a
confidenceThreshold if others agree.


> Regarding the timeout family, an open-ended dialog like "Tell me what is
wrong with your computer", should have generous timeouts.  Compare this to
"So it's something to do with your new Google double mouse configuration, is
that correct?" which should have short timeouts.
>
> Our goal should be a consistent application experience across UAs, and
that's only going to happen if we standardize timeouts.  I would also like
to mention that the definition of these timeouts is clear and has been
industry standard for 10+ years.

What do you think about my idea of just letting the web page handle
the timeout itself, calling abort() when it decides a request is
taking too long?
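
Something along these lines (a sketch; startWithTimeout is a hypothetical
page-side helper, while start(), abort(), and onend are assumed from the
draft SpeechRecognition interface):

```javascript
// Sketch: the page enforces its own timeout by calling abort().
function startWithTimeout(recognition, timeoutMs) {
  const timer = setTimeout(() => recognition.abort(), timeoutMs);
  // If the recognizer finishes on its own first, cancel the timer.
  recognition.onend = () => clearTimeout(timer);
  recognition.start();
}
```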


Thanks,
Hans





 

-- 
Thanks!

Glen Shires

 





 


 
Received on Wednesday, 25 April 2012 20:40:43 UTC
