Confidence property from Young, Milan on 2012-05-18 (public-speech-api@w3.org from May 2012)

From: Young, Milan <Milan.Young@nuance.com>
Date: Fri, 18 May 2012 20:13:04 +0000
To: "public-speech-api@w3.org" <public-speech-api@w3.org>
Message-ID: <B236B24082A4094A85003E8FFB8DDC3C1A45B964@SOM-EXCH04.nuance.com>

I suggest we add the following to the top-level interface in section 5.1:
    attribute float confidence;

Section 5.1.1 will also need:
    confidence attribute - Confidence is a measure of assurance that the attributes of the SpeechRecognitionResult are correct.  This property defines a threshold for rejecting or ignoring utterances whose confidence falls below the specified value.  The value of the property ranges from 0.0 (least confident) to 1.0 (most confident).  Unlike maxnbest, there is no concrete mapping between the value of the threshold and how many results will be returned.  Results may vary across UAs, recognition engines, and even task to task.  The only guarantees are: 1) A larger confidence threshold will return the same number of results as a lower threshold, or fewer, and 2) Any confidence score reported within the SpeechRecognitionResult (e.g. within an EMMA structure) will use the same [0.0-1.0] scale.
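
To illustrate the two guarantees, here is a minimal sketch in TypeScript.  It is not part of the proposed spec text.  The Hypothesis type, the filterByConfidence helper, and the sample scores are all hypothetical, and the choice to keep a result whose score exactly equals the threshold is an assumption, since the proposal does not say.  In practice the filtering would happen inside the UA or recognition engine, not in application code.

```typescript
// Hypothetical shape of one recognition hypothesis; not taken from the draft spec.
interface Hypothesis {
  transcript: string;
  confidence: number; // assumed to share the threshold's [0.0, 1.0] scale
}

// Keep only hypotheses whose score meets the threshold.
// Assumption: a score equal to the threshold is kept (>=); the proposal leaves this open.
function filterByConfidence(results: Hypothesis[], threshold: number): Hypothesis[] {
  if (Number.isNaN(threshold) || threshold < 0.0 || threshold > 1.0) {
    throw new RangeError("confidence threshold must be within [0.0, 1.0]");
  }
  return results.filter((r) => r.confidence >= threshold);
}

// Made-up n-best list, for illustration only.
const nbest: Hypothesis[] = [
  { transcript: "call home", confidence: 0.82 },
  { transcript: "call Rome", confidence: 0.41 },
  { transcript: "cold foam", confidence: 0.07 },
];

const loose = filterByConfidence(nbest, 0.3);  // keeps the 0.82 and 0.41 entries
const strict = filterByConfidence(nbest, 0.6); // keeps only the 0.82 entry
```

Raising the threshold from 0.3 to 0.6 can only shrink the result list or leave it unchanged, which is guarantee 1.  How many results survive at a given threshold still depends on the scores a particular engine assigns, which is why there is no fixed mapping to a result count as there is with maxnbest.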


Summarizing previous discussion, we have:
  Pros: 1) Aids efficient application design, 2) Minimizes deaf periods, and 3) Avoids a proliferation of semi-standard custom parameters.
  Cons: 1) Semantics of the value are not precisely defined, and 2) Novice users may not understand how confidence differs from maxnbest.

My responses to the cons are: 1) There is precedent for this in the speech industry, and 2) Thousands of VoiceXML developers do understand the difference and will balk at an API that does not accommodate their needs.

Thanks

Received on Friday, 18 May 2012 20:13:34 UTC