- From: Satish S <satish@google.com>
- Date: Wed, 25 Apr 2012 21:55:50 +0100
- To: Deborah Dahl <dahl@conversational-technologies.com>
- Cc: "Young, Milan" <Milan.Young@nuance.com>, Glen Shires <gshires@google.com>, Hans Wennborg <hwennborg@google.com>, public-speech-api@w3.org
- Message-ID: <CAHZf7RkfKN-vMP0xQbFPmN9jhXWveaTZZSMymH9y7m8rUHMLcA@mail.gmail.com>
A performance-conscious web app could use a smaller maxNBest value so the
recognizer doesn't generate all the hypotheses and then decide to throw
away some of them based on the confidence value. I don't see value in
providing both in this API.

I agree all the parameters suggested have been in use over the years, but
we should carefully consider each one in the context of web apps. Can we
discuss a list of web apps which will require both maxNBest and
confidenceThreshold and see how to address them? In fact it would be great
to adopt that as a framework for all API discussions, i.e. provide sample
web app scenarios where the change/addition will be required.

Cheers
Satish

On Wed, Apr 25, 2012 at 9:40 PM, Deborah Dahl
<dahl@conversational-technologies.com> wrote:

> Yes, the filtering that I was talking about is something that you would
> do at the level of the interaction manager, whether that’s implemented
> on the client, or whether it’s running on another server. It doesn’t
> mean that there isn’t a value in also doing some filtering on the
> server, for the reasons Milan pointed out.
>
> From: Young, Milan [mailto:Milan.Young@nuance.com]
> Sent: Wednesday, April 25, 2012 2:08 PM
> To: Glen Shires
> Cc: Hans Wennborg; Satish S; public-speech-api@w3.org
> Subject: RE: Additional parameters to SpeechRecognition (was "Speech
> API: first editor's draft posted")
>
> You have ignored my two points about why it is often best to filter low
> confidence matches on the server (i.e. performance and clipping). Just
> because Deborah points out that there are additional use cases for
> filtering on the client does not invalidate my claim.
>
> Yes, we should try to deliver consistent behavior across UAs, speech
> engines, and even dialog states.
> But let’s not throw the baby out with the bathwater if we can’t nail it
> down in a v1.
>
> From: Glen Shires [mailto:gshires@google.com]
> Sent: Wednesday, April 25, 2012 10:43 AM
> To: Young, Milan
> Cc: Hans Wennborg; Satish S; public-speech-api@w3.org
> Subject: Re: Additional parameters to SpeechRecognition (was "Speech
> API: first editor's draft posted")
>
> I think (hope) that most web developers won't have to worry about
> confidence values because the default set by the speech recognizer
> should be sufficient.
>
> However, a JS API developer savvy enough to understand how/when to
> properly set a confidenceThreshold is also savvy enough to intelligently
> process the confidence values returned in the results. As Deborah
> mentioned [1], "For example, if the top two alternatives in the nbest
> have very similar confidences...". Typically, processing the confidence
> result values is a much better strategy than trying to tune the
> confidenceThreshold.
>
> Only extremely savvy JS API developers will understand how to properly
> tune the confidenceThreshold so that it prunes (but doesn't over-prune)
> the data returned. I believe these developers can best adjust the
> confidenceThreshold by processing the confidence result values returned
> by prior recognitions (as opposed to simply bumping the default value by
> 0.05).
>
> Also, from an implementation standpoint, there's a major issue with
> making confidenceThreshold readable. If the developer switches to a new
> recognizer, the default confidenceThreshold may change. If the developer
> then reads the confidenceThreshold (for example, to increment it by
> 0.05), then presumably the browser needs to get the default confidence
> value from the speech recognizer.
> For a remote recognizer, this round-trip takes time, and the browser
> cannot stall the JavaScript processing.
>
> /Glen Shires
>
> [1] http://lists.w3.org/Archives/Public/public-speech-api/2012Apr/0031.html
>
> On Wed, Apr 25, 2012 at 9:47 AM, Young, Milan <Milan.Young@nuance.com>
> wrote:
>
> The speech community has lived for 20 years with the fact that
> confidence values are not portable across engines. I understand that we
> are courting a new class of developers with this HTML-based initiative,
> but I want to be careful not to dumb it down to the point where we
> impact the mainstream speech industry.
>
> Incrementally bumping up confidence (e.g. recognizer.confidence += 5) in
> response to a series of misrecognitions is a common technique. I also
> find it generally ugly that confidence is special cased with a function
> instead of a property. (Is it a JS limitation that you cannot mark a
> property as write only?)
>
> I would rather say something like “Recognition engines generally do a
> good job of choosing the right confidence value for a recognition task.
> If you do choose to read this property, know that its value is not
> portable to other recognition tasks, other speech engines, or other user
> agents.”
>
> Thanks
>
> From: Glen Shires [mailto:gshires@google.com]
> Sent: Wednesday, April 25, 2012 8:11 AM
> To: Hans Wennborg
> Cc: Young, Milan; Satish S; public-speech-api@w3.org
> Subject: Re: Additional parameters to SpeechRecognition (was "Speech
> API: first editor's draft posted")
>
> confidenceThreshold
>
> I think we all agree that speech recognizers have a concept of
> confidence, and that it can be mapped to a monotonically increasing
> range of 0.0 to 1.0.
> However, specific values (for example 0.5) do not correspond to the same
> level of confidence for different recognizers.
>
> I believe that if the developer does not set the confidenceThreshold,
> the speech recognizer should use a default value that is appropriate for
> that recognizer.
>
> A complication with a confidenceThreshold attribute is defining the
> default value (if the value is read, but not written, what value does
> the BROWSER return? - particularly because the optimal default value may
> vary from one RECOGNIZER to another).
>
> Perhaps instead of an attribute, this should be a write-only value,
> specifically a setConfidenceThreshold method.
>
> /Glen Shires
>
> On Wed, Apr 25, 2012 at 6:43 AM, Hans Wennborg <hwennborg@google.com>
> wrote:
>
> On Tue, Apr 24, 2012 at 17:22, Young, Milan <Milan.Young@nuance.com>
> wrote:
> > There are two reasons for including confidence that I would like this
> > community to consider:
> >
> > Efficiency - Similar to the argument Satish put forward for limiting
> > the size of the nbest array, pruning the result candidates at the
> > server is more efficient.
> >
> > Clipping - There are many environments where background noise and side
> > speech can trigger junk results. If confidence is low, this will
> > trigger a result and then the application enters a deaf period where
> > it processes the result and discovers the content is junk. If real
> > speech happens during this phase, its start will be missed.
> >
> > Every recognizer that was ever invented has a concept of confidence.
> > Yes, the semantics of that value vary across platforms, but for us to
> > push this to a custom parameter will confuse developers, and
> > ultimately slow adoption.
>
> Ok, I don't feel strongly about this, so I would be fine adding a
> confidenceThreshold if others agree.
>
> > Regarding the timeout family, an open-ended dialog like "Tell me what
> > is wrong with your computer", should have generous timeouts. Compare
> > this to "So it's something to do with your new Google double mouse
> > configuration, is that correct?" which should have short timeouts.
> >
> > Our goal should be a consistent application experience across UAs, and
> > that's only going to happen if we standardize timeouts. I would also
> > like to mention that the definition of these timeouts is clear and has
> > been industry standard for 10+ years.
>
> What do you think about my idea of just letting the web page handle
> the timeout itself, calling abort() when it decides a request is
> taking too long?
>
> Thanks,
> Hans
>
> --
> Thanks!
> Glen Shires
>
> --
> Thanks!
> Glen Shires
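
The client-side processing discussed in this thread (capping the n-best
list, dropping low-confidence alternatives, and Deborah's case of two top
alternatives with very similar confidences) might be sketched roughly as
below. This is only an illustration under stated assumptions: the function
names pruneAlternatives and isAmbiguous, the { transcript, confidence }
result shape, and the 0.05 margin are hypothetical and are not part of the
draft API. Only the names maxNBest and confidenceThreshold come from the
discussion itself.

```javascript
// Hypothetical helpers; none of these function names come from the draft.
// `alternatives` is assumed to be an array of { transcript, confidence }
// objects, with confidence in the range 0.0 to 1.0.

// Keep at most `maxNBest` alternatives whose confidence is at or above
// `confidenceThreshold`, ordered from most to least confident.
function pruneAlternatives(
  alternatives,
  { maxNBest = 1, confidenceThreshold = 0 } = {}
) {
  return alternatives
    .filter((alt) => alt.confidence >= confidenceThreshold) // new array
    .sort((a, b) => b.confidence - a.confidence)
    .slice(0, maxNBest);
}

// Report whether the two most confident alternatives are nearly tied,
// in which case an app might ask the user to confirm or repeat.
function isAmbiguous(alternatives, margin = 0.05) {
  const ranked = [...alternatives].sort(
    (a, b) => b.confidence - a.confidence
  );
  return (
    ranked.length >= 2 &&
    ranked[0].confidence - ranked[1].confidence < margin
  );
}
```

Since confidence values are not portable across recognizers, any threshold
or margin passed to helpers like these would presumably need tuning per
engine rather than being hard-coded.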
Received on Wednesday, 25 April 2012 20:56:20 UTC