- From: Glen Shires <gshires@google.com>
- Date: Fri, 4 May 2012 10:10:45 -0700
- To: "Young, Milan" <Milan.Young@nuance.com>
- Cc: Satish S <satish@google.com>, Jerry Carter <jerry@jerrycarter.org>, "public-speech-api@w3.org" <public-speech-api@w3.org>
- Message-ID: <CAEE5bcgG__sGRSSRvopyDj8rLLjYps11wiYbpbUQ57PRvgKKVA@mail.gmail.com>
> Does anyone on this forum plan to run the recognition on the client?

Whether or not anyone "on this forum" does, I believe we should consider the implications of both client-side and server-side recognition implementations of this API, because there exist many client-side as well as server-side implementations of speech recognition engines. The goal is to create a standard API that can be widely adopted and implemented.

Glen Shires

On Fri, May 4, 2012 at 9:43 AM, Young, Milan <Milan.Young@nuance.com> wrote:

> Hello Satish,
>
> I believe my “no harm” comment was taken out of context. The point was that confidence is a mainstream concept for the speech industry, and it’s hard to see how those outside of the mainstream are going to be harmed by its inclusion.
>
> Regarding nbest vs confidence: Hotword recognitions usually contain only a single phrase. Telling the recognizer that you only want a single result is a no-op. The speech industry understood this point 20 years ago, and that’s why we have separate parameters.
>
> Two points regarding server optimization:
>
> - Does anyone on this forum plan to run the recognition on the client?
> - If we were talking about significant changes to the architecture, I agree that performance might take a backseat. But this is just a single parameter, so it’s hard to see the value of this line of reasoning.
>
> Thanks
>
> *From:* Satish S [mailto:satish@google.com]
> *Sent:* Friday, May 04, 2012 8:40 AM
> *To:* Young, Milan
> *Cc:* Jerry Carter; public-speech-api@w3.org
> *Subject:* Re: Additional parameters to SpeechRecognition (was "Speech API: first editor's draft posted")
>
> But “confidence” is a much easier to understand concept, and I don’t see any harm to the average web developer by including it in the list.
>
> FWIW, that shouldn't be the bar to include items in the API. Since web APIs are supported perpetually in practice we should start with the most basic set and iterate based on concrete application requirements.
>
> One example is “hotword” recognition, which might be used to wake up the application after long periods of silence, side speech, noise, etc. The hotword grammar is often very simple (eg “wake up”), and thus multiple interpretations are extremely uncommon. Developers would use “confidence” to avoid false positives which consume processing resources and induce the deaf periods I mentioned before.
>
> I can see the same use case addressed by setting maxNBest=1 so that only the topmost interpretation is returned and the engine optimises resources for that.
>
> I am also wondering if optimising for server side performance should even be a consideration when designing the web speech API. Developing a simple web developer facing API is our explicit goal and optimisation is something that implementors of both UAs and speech engines would do based on a lot of parameters, hence the API should not really care about it.
>
> I’m not sure what it means in practice to not define a confidenceThreshold (option 4). Doesn’t it just mean that recognizer behavior is implementation-specific, and isn’t that equivalent to option (2)? Isn’t (4) subject to the same problems when changing recognizers as (2)?
>
> Yes I think (2) and (4) are the same because the actual custom parameters aren't going to be defined in the spec.
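
To make the nbest-versus-confidence disagreement in the quoted thread concrete, here is a rough sketch of the two filters applied to a hotword result. Everything in it is an assumption for illustration only: the `Hypothesis` type, the `limitNBest` and `applyConfidenceThreshold` helpers, and the numeric values are hypothetical and do not come from any draft of the spec or from any engine.

```typescript
// Hypothetical recognition hypothesis; not taken from the draft API.
interface Hypothesis {
  transcript: string;
  confidence: number; // assumed range: 0.0 to 1.0
}

// Keep at most `maxNBest` hypotheses, highest confidence first.
function limitNBest(results: Hypothesis[], maxNBest: number): Hypothesis[] {
  return [...results]
    .sort((a, b) => b.confidence - a.confidence)
    .slice(0, maxNBest);
}

// Drop every hypothesis whose confidence is below the threshold.
function applyConfidenceThreshold(
  results: Hypothesis[],
  threshold: number
): Hypothesis[] {
  return results.filter((r) => r.confidence >= threshold);
}

// A simple hotword grammar usually yields a single hypothesis.
// Here it is a weak match, i.e. a likely false positive.
const hotwordResult: Hypothesis[] = [
  { transcript: "wake up", confidence: 0.2 },
];

// Limiting to one result leaves the single weak hypothesis in place.
const afterNBest = limitNBest(hotwordResult, 1);

// A threshold of 0.5 rejects the weak hypothesis entirely.
const afterThreshold = applyConfidenceThreshold(hotwordResult, 0.5);

console.log(afterNBest.length, afterThreshold.length);
```

Under these assumptions, limiting the list to one entry does not change a single-hypothesis result, while the threshold removes it, which is the distinction Milan draws between the two parameters. Whether an engine can also save resources with either setting is a separate question that this sketch does not address.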
Received on Friday, 4 May 2012 17:11:57 UTC