Re: Additional parameters to SpeechRecognition (was "Speech API: first editor's draft posted")

> Does anyone on this forum plan to run the recognition on the client?

Whether or not anyone "on this forum" does, I believe we should consider
the implications of both client-side and server-side recognition
implementations of this API, because there exist many client-side as well
as server-side implementations of speech recognition engines. The goal is
to create a standard API that can be widely adopted and implemented.

Glen Shires

On Fri, May 4, 2012 at 9:43 AM, Young, Milan <Milan.Young@nuance.com> wrote:

> Hello Satish,
>
> I believe my “no harm” comment was taken out of context.  The point was
> that confidence is a mainstream concept for the speech industry, and it’s
> hard to see how those outside of the mainstream are going to be harmed by
> its inclusion.
>
> Regarding nbest vs confidence:  Hotword recognitions usually contain only
> a single phrase.  Telling the recognizer that you only want a single result
> is a no-op.  The speech industry understood this point 20 years ago, and
> that’s why we have separate parameters.
>
> Two points regarding server optimization:
>
> - Does anyone on this forum plan to run the recognition on the client?
>
> - If we were talking about significant changes to the architecture, I
>   agree that performance might take a backseat.  But this is just a
>   single parameter, so it’s hard to see the value of this line of
>   reasoning.
>
> Thanks
>
> From: Satish S [mailto:satish@google.com]
> Sent: Friday, May 04, 2012 8:40 AM
> To: Young, Milan
> Cc: Jerry Carter; public-speech-api@w3.org
> Subject: Re: Additional parameters to SpeechRecognition (was "Speech
> API: first editor's draft posted")
>
> But “confidence” is a much easier to understand concept, and I don’t see
> any harm to the average web developer by including it in the list.
>
> FWIW, that shouldn't be the bar to include items in the API. Since web
> APIs are supported perpetually in practice, we should start with the most
> basic set and iterate based on concrete application requirements.
>
> One example is “hotword” recognition, which might be used to wake up the
> application after long periods of silence, side speech, noise, etc.  The
> hotword grammar is often very simple (e.g. “wake up”), and thus multiple
> interpretations are extremely uncommon.  Developers would use “confidence”
> to avoid false positives, which consume processing resources and induce the
> deaf periods I mentioned before.
>
> I can see the same use case addressed by setting maxNBest=1 so that only
> the topmost interpretation is returned and the engine optimises resources
> for that.
>
> I am also wondering if optimising for server-side performance should even
> be a consideration when designing the web speech API. Developing a simple
> web-developer-facing API is our explicit goal, and optimisation is something
> that implementors of both UAs and speech engines would do based on a lot of
> parameters, hence the API should not really care about it.
>
> I’m not sure what it means in practice to not define a confidenceThreshold
> (option 4). Doesn’t it just mean that recognizer behavior is
> implementation-specific, and isn’t that equivalent to option (2)? Isn’t (4)
> subject to the same problems when changing recognizers as (2)?
>
> Yes, I think (2) and (4) are the same because the actual custom parameters
> aren't going to be defined in the spec.
>

Received on Friday, 4 May 2012 17:11:57 UTC