- From: Satish S <satish@google.com>
- Date: Wed, 21 Sep 2011 10:26:53 +0100
- To: public-xg-htmlspeech@w3.org
- Message-ID: <CAHZf7R=a=c3RJuzn54fUa1UtEXkK5Dgtm8wpk80RLtTEO--D1g@mail.gmail.com>
Here is an IDL proposal for SpeechInputResult (currently left empty in the draft doc) and an example showing how it may be used. Specifically, this tries to address the following:

1. It should work with both intermediate/preliminary and final/stable results.
2. It should allow the speech service to give alternatives, so the user can tap on portions of the recognized text and select a different alternative to fill in.

IDL:

    interface SpeechInputResult {
      readonly attribute Hypothesis[] prelim;
      readonly attribute Hypothesis[] stable;
      readonly attribute Alternative[] alternatives;
    };

    interface Hypothesis {
      readonly attribute DOMString utterance;
      readonly attribute float confidence;  // Range 0.0 - 1.0
    };

For preliminary results, only .prelim is valid, and it is expected to be non-empty.
- Preliminary results give the recognition hypotheses for the speech after the last stable result (i.e. they are not relative to the last preliminary result).

For final (or "stable", as I call them here) results, all three attributes may be valid.
- .stable is expected to be non-empty here, because if it were empty the 'nomatch' event would be fired instead.
- If .prelim is non-empty, these preliminary results are for the next run (i.e. stable results were given for one part of the speech stream, and the prelim results cover the speech after that).
- If .alternatives is non-empty, it applies to the top stable result. We could technically design the API to support alternatives for every single stable hypothesis, but the user is most likely to either change the whole recognized phrase to a different one, or correct parts of the top result and continue speaking.

Every alternative item points to one segment in the top stable result and gives the alternative hypotheses for that segment.
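To make the prelim/stable semantics concrete, here is a minimal TypeScript sketch of how a page might keep a running transcript. The TypeScript interfaces are hypothetical mirrors of the proposed IDL (not a shipped browser API), and `TranscriptAccumulator` and `best` are illustrative names. It assumes the "top" hypothesis is the one with the highest confidence, and that the stable text of successive recognition runs can simply be joined with a space.

```typescript
// Hypothetical TypeScript mirror of the proposed IDL; not a shipped browser API.
interface Hypothesis {
  utterance: string;
  confidence: number; // 0.0 - 1.0
}

interface SpeechInputResult {
  prelim: Hypothesis[];
  stable: Hypothesis[];
  alternatives: unknown[]; // Alternative objects; ignored in this sketch
}

// Assumption: the "top" hypothesis is the one with the highest confidence
// (ties go to the earlier entry).
function best(hyps: Hypothesis[]): Hypothesis | undefined {
  let top: Hypothesis | undefined;
  for (const h of hyps) {
    if (top === undefined || h.confidence > top.confidence) top = h;
  }
  return top;
}

// Hypothetical helper that accumulates text across a sequence of results.
class TranscriptAccumulator {
  private committed: string[] = []; // top stable utterance of each finished run
  private pending = ""; // best prelim guess for speech after the last stable result

  onResult(result: SpeechInputResult): void {
    const top = best(result.stable);
    if (top !== undefined) {
      // Final result: commit the top-scored stable hypothesis.
      this.committed.push(top.utterance);
    }
    // Prelim hypotheses are relative to the last stable result, so each new
    // prelim list replaces (rather than extends) the previous one. A final
    // result with an empty prelim list clears the pending text.
    this.pending = best(result.prelim)?.utterance ?? "";
  }

  // Only the text the recognizer has committed to.
  get stableText(): string {
    return this.committed.join(" ");
  }

  // Committed text followed by the current best preliminary guess.
  get displayText(): string {
    return [...this.committed, this.pending].filter((s) => s.length > 0).join(" ");
  }
}
```

A page would call `onResult` from its onresult handler and render `displayText`, perhaps styling the uncommitted tail differently; only `stableText` should be treated as final.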
    interface Alternative {
      readonly attribute long start;  // Index in the stable hypothesis' utterance where the spans below start
      readonly attribute AlternativeSpan[] spans;
    };

    interface AlternativeSpan {
      readonly attribute long length;  // Length of the span in the original utterance that is replaced by the hypotheses below
      readonly attribute float confidence;  // Confidence value of the span in the original utterance, range 0.0 - 1.0
      readonly attribute Hypothesis[] hypotheses;  // Other hypotheses for this span in the original utterance
    };

Example:

When the user speaks "testing this example", the web app may receive the following sequence of SpeechInputResult objects in the onresult event handler. (Each hypothesis is written in shorthand as { utterance, confidence }.)

1. { "prelim": [{ "text", 0.01 }] }
2. { "prelim": [{ "test", 0.99 }, { "sting", 0.01 }] }
3. { "prelim": [{ "testing", 0.99 }, { "this", 0.01 }] }
4. { "prelim": [{ "testing", 0.99 }, { "this", 0.99 }] }
5. {
     "stable": [{ "testing this", 1.0 }, { "testing", 0.1 }],
     // Alternatives are based on the "testing this" top-scored stable result.
     "alternatives": [
       {
         "start": 0,
         "spans": [
           { "length": 4, "confidence": 0.9, "hypotheses": [{ "text", 0.2 }, { "tent", 0.1 }] },
           { "length": 12, "confidence": 0.6, "hypotheses": [{ "exit sis", 0.02 }] }
         ]
       },
       {
         "start": 8,
         "spans": [
           { "length": 4, "confidence": 0.8, "hypotheses": [{ "these", 0.6 }] }
         ]
       }
     ],
     // Speech after "testing this" belongs to a new independent recognition run.
     "prelim": [{ "ex", 0.01 }, { "apple", 0.01 }]
   }
6. { "prelim": [{ "example", 0.99 }] }
7. { "stable": [{ "example", 1.0 }] }

--
Cheers,
Satish
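The "tap to correct" flow from point 2 can be sketched as follows: find every span that covers the tapped position in the top stable utterance, offer its hypotheses, and splice the chosen one in. This is a minimal TypeScript sketch; the interfaces are hypothetical mirrors of the IDL, `Choice`, `choicesAt` and `applyChoice` are illustrative names, and it assumes `start` and `length` count UTF-16 code units (JavaScript string indices). As in the example, several spans may share the same `start` with different lengths.

```typescript
// Hypothetical TypeScript mirror of the proposed IDL; not a shipped browser API.
interface Hypothesis {
  utterance: string;
  confidence: number; // 0.0 - 1.0
}

interface AlternativeSpan {
  length: number; // length of the span in the original utterance
  confidence: number; // confidence of the original text in this span
  hypotheses: Hypothesis[]; // other hypotheses for this span
}

interface Alternative {
  start: number; // index in the top stable utterance where the spans start
  spans: AlternativeSpan[];
}

// Hypothetical: one correction the UI could offer for a tapped position.
interface Choice {
  start: number;
  length: number;
  original: string; // text currently occupying the span
  hypotheses: Hypothesis[];
}

// Collect every span that covers the tapped character index.
function choicesAt(utterance: string, alternatives: Alternative[], index: number): Choice[] {
  const choices: Choice[] = [];
  for (const alt of alternatives) {
    for (const span of alt.spans) {
      const end = alt.start + span.length; // exclusive end
      if (index >= alt.start && index < end) {
        choices.push({
          start: alt.start,
          length: span.length,
          original: utterance.slice(alt.start, end),
          hypotheses: span.hypotheses,
        });
      }
    }
  }
  return choices;
}

// Replace the chosen span of the utterance with the selected hypothesis.
function applyChoice(utterance: string, choice: Choice, replacement: Hypothesis): string {
  return (
    utterance.slice(0, choice.start) +
    replacement.utterance +
    utterance.slice(choice.start + choice.length)
  );
}

// Data from result 5 of the example.
const exampleTop = "testing this";
const exampleAlternatives: Alternative[] = [
  {
    start: 0,
    spans: [
      { length: 4, confidence: 0.9, hypotheses: [{ utterance: "text", confidence: 0.2 }, { utterance: "tent", confidence: 0.1 }] },
      { length: 12, confidence: 0.6, hypotheses: [{ utterance: "exit sis", confidence: 0.02 }] },
    ],
  },
  {
    start: 8,
    spans: [{ length: 4, confidence: 0.8, hypotheses: [{ utterance: "these", confidence: 0.6 }] }],
  },
];
```

With this data, tapping inside "this" (index 9) yields two choices: the whole 12-character phrase (alternative "exit sis") and the 4-character span "this" (alternative "these"); picking "these" turns the utterance into "testing these". Tapping index 0 instead offers "test", whose first alternative gives "texting this".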
Received on Wednesday, 21 September 2011 09:27:28 UTC