RE: Additional parameters to SpeechRecognition (was "Speech API: first editor's draft posted") from Young, Milan on 2012-04-24 (public-speech-api@w3.org from April 2012)

From: Young, Milan <Milan.Young@nuance.com>
Date: Tue, 24 Apr 2012 16:22:56 +0000
To: Hans Wennborg <hwennborg@google.com>, Satish S <satish@google.com>
CC: "public-speech-api@w3.org" <public-speech-api@w3.org>
Message-ID: <B236B24082A4094A85003E8FFB8DDC3C1A456459@SOM-EXCH04.nuance.com>

There are two reasons for including confidence that I would like this community to consider: 
  Efficiency - Similar to the argument Satish put forward for limiting the size of the nbest array, pruning the result candidates at the server is more efficient.
  Clipping - There are many environments where background noise and side speech can trigger junk results.  Even when confidence is low, such input triggers a result, and the application then enters a deaf period while it processes the result and discovers that the content is junk.  If real speech begins during this period, its start will be missed.

Every recognizer that was ever invented has a concept of confidence.  Yes, the semantics of that value vary across platforms, but pushing this to a custom parameter will confuse developers and ultimately slow adoption.

Regarding the timeout family, an open-ended dialog like "Tell me what is wrong with your computer" should have generous timeouts.  Compare this to "So it's something to do with your new Google double mouse configuration, is that correct?", which should have short timeouts.

Our goal should be a consistent application experience across UAs, and that's only going to happen if we standardize timeouts.  I would also like to mention that the definition of these timeouts is clear and has been an industry standard for 10+ years.

Thanks

-----Original Message-----
From: Hans Wennborg [mailto:hwennborg@google.com] 
Sent: Tuesday, April 24, 2012 8:25 AM
To: Satish S
Cc: public-speech-api@w3.org
Subject: Re: Additional parameters to SpeechRecognition (was "Speech API: first editor's draft posted")

On Tue, Apr 24, 2012 at 14:52, Satish S <satish@google.com> wrote:
> (Splitting off to a new thread so we can follow discussions easily.
> Please start new threads for proposed additions/changes.)
>
>> SpeechRecognition
>>  - In addition to the three parameters you have listed, I see the following as necessary:
>>        integer maxNBest;
>
> I can see speech engines defaulting to a specific number of results 
> and the web app can tweak it based on performance characteristics it 
> needs. Without this attribute the engine should be asked to always 
> give the max number of results and let the web app filter, which seems 
> suboptimal.

I agree, I think this would be a good addition.

>>        float confidenceThreshold;
>
> SpeechRecognitionAlternative.confidence provides the value so the web 
> app can filter based on that if it needs to. With that in mind do we 
> need this attribute?

Agreed. Also, the absolute confidence values are probably not very interesting. For example, what does a confidence of 0.5 mean? I see the confidence values as useful for providing an ordering of the alternatives, not much else.
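
For illustration, the client-side filtering discussed above might look roughly like the following TypeScript sketch. It assumes each alternative carries a `transcript` and a `confidence` field; the `Alternative` interface and the `filterAlternatives` helper are hypothetical names introduced here, and the `threshold` and `maxNBest` arguments merely mirror the proposed `confidenceThreshold` and `maxNBest` attributes.

```typescript
// Hypothetical shape for this sketch: one recognition alternative with
// the confidence value that the thread refers to.
interface Alternative {
  transcript: string;
  confidence: number;
}

// Drops alternatives below `threshold`, orders the rest by descending
// confidence, and keeps at most `maxNBest` of them.
function filterAlternatives(
  alternatives: Alternative[],
  threshold: number,
  maxNBest: number,
): Alternative[] {
  return alternatives
    .filter((alt) => alt.confidence >= threshold)
    .sort((a, b) => b.confidence - a.confidence)
    .slice(0, maxNBest);
}
```

Note that this filtering runs only after the full result has been delivered to the page, so it does not address the efficiency or clipping concerns, which relate to what happens before the result arrives.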

>>        integer completeTimeout;
>>        integer incompleteTimeout;
>>        integer maxSpeechTimeout;
>
> Do you have use cases where these should vary between different web 
> apps? I think it would be better to leave it to the UA so all web apps 
> have consistent timeouts and user expectation doesn't get affected.

I don't like the idea of having three different timeouts. Couldn't the web page handle timeouts itself, by calling abort() on the SpeechRecognition object if it takes too long?
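
A minimal sketch of that page-level approach, in TypeScript, could look like the following. It assumes only that the recognition object exposes `abort()`; the `AbortableRecognition` interface, the `Scheduler` type, and the `abortAfter` helper are hypothetical names used for illustration, not part of the draft.

```typescript
// Minimal shape assumed for this sketch: anything with an abort() method.
interface AbortableRecognition {
  abort(): void;
}

// Injectable scheduler so the helper can be exercised without real timers.
type Scheduler = (callback: () => void, ms: number) => unknown;

// Arms a single page-level deadline. Unless the returned cancel function
// is called first, the recognition is aborted once `ms` milliseconds pass.
function abortAfter(
  recognition: AbortableRecognition,
  ms: number,
  schedule: Scheduler = (cb, delay) => setTimeout(cb, delay),
): () => void {
  let cancelled = false;
  schedule(() => {
    if (!cancelled) {
      recognition.abort();
    }
  }, ms);
  return () => {
    cancelled = true;
  };
}
```

A page would call `abortAfter(recognition, 5000)` when starting recognition and invoke the returned function from its result handler. This gives one overall deadline; it does not by itself distinguish the three timeout cases listed above.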

Thanks,
Hans

Received on Tuesday, 24 April 2012 16:23:26 UTC