
Additional parameters to SpeechRecognition (was "Speech API: first editor's draft posted")

From: Satish S <satish@google.com>
Date: Tue, 24 Apr 2012 14:52:14 +0100
Message-ID: <CAHZf7Rn5DuiO=sWkJ_0uW8ui3WZgWu4X4tHaTTw2890xaSeA2g@mail.gmail.com>
To: "public-speech-api@w3.org" <public-speech-api@w3.org>

(Splitting off to a new thread so we can follow discussions easily.
Please start new threads for proposed additions/changes.)

> SpeechRecognition
>  - In addition to the three parameters you have listed, I see the following as necessary:
>        integer maxNBest;

I can see speech engines defaulting to a specific number of results,
with the web app tweaking it to match the performance characteristics
it needs. Without this attribute, the engine would have to be asked to
always return the maximum number of results and let the web app filter
them, which seems suboptimal.
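
To illustrate, here is a minimal sketch of the client-side filtering a
web app would otherwise have to do itself. Plain objects stand in for
SpeechRecognitionAlternative entries, and maxNBest is the proposed
attribute, not a shipped API:

```javascript
// Sketch only: with no engine-side maxNBest, the page must request
// everything and trim the list itself.
function topNBest(alternatives, maxNBest) {
  // Engines typically return alternatives ordered by confidence;
  // sort defensively, then keep only the first maxNBest entries.
  return alternatives
    .slice()
    .sort((a, b) => b.confidence - a.confidence)
    .slice(0, maxNBest);
}

const alternatives = [
  { transcript: 'recognise speech', confidence: 0.92 },
  { transcript: 'wreck a nice beach', confidence: 0.41 },
  { transcript: 'wreck an ice beach', confidence: 0.12 },
];

console.log(topNBest(alternatives, 2).map(a => a.transcript));
```

With maxNBest honoured by the engine, this client-side step (and the
transfer of unwanted alternatives) goes away.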

>        float confidenceThreshold;

SpeechRecognitionAlternative.confidence provides the value, so the web
app can filter based on that if it needs to. With that in mind, do we
need this attribute?
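
For example, a web app can already apply its own threshold over the
existing confidence values. A sketch, with plain objects standing in
for a real result list:

```javascript
// Sketch only: filtering on SpeechRecognitionAlternative.confidence
// in the page. A confidenceThreshold attribute would push this same
// filter into the engine instead.
function aboveThreshold(alternatives, threshold) {
  return alternatives.filter(a => a.confidence >= threshold);
}

const alternatives = [
  { transcript: 'open the window', confidence: 0.85 },
  { transcript: 'open the Windows', confidence: 0.30 },
];

console.log(aboveThreshold(alternatives, 0.5).map(a => a.transcript));
```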

>        integer completeTimeout;
>        integer incompleteTimeout;
>        integer maxSpeechTimeout;

Do you have use cases where these should vary between web apps? I
think it would be better to leave them to the UA, so that all web apps
have consistent timeouts and user expectations aren't affected.

>        attribute DOMString serviceURI;

Is the idea to have this attribute and let the UA decide what protocol
to speak to the service?
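
i.e. something like the following sketch, where the page names only an
endpoint and the UA owns the wire protocol. A plain object stands in
for SpeechRecognition, and the endpoint URL is hypothetical, since the
attribute is only proposed:

```javascript
// Sketch only: the proposed serviceURI attribute as a page might set
// it. The UA, not the page, decides what protocol to speak to the
// named service.
const recognition = {
  lang: 'en-US',
  serviceURI: '',   // empty: UA picks its default recognition service
};

// A page opting into its own recognizer (hypothetical endpoint):
recognition.serviceURI = 'https://speech.example.org/reco';
console.log(recognition.serviceURI);
```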

Cheers
Satish


On Fri, Apr 13, 2012 at 10:05 PM, Young, Milan <Milan.Young@nuance.com> wrote:
>
> Thank you for the draft, this looks like an excellent start.  A few comments/suggestions on the following:
>
> SpeechRecognition
>  - In addition to the three parameters you have listed, I see the following as necessary:
>        integer maxNBest;
>        float confidenceThreshold;
>        integer completeTimeout;
>        integer incompleteTimeout;
>        integer maxSpeechTimeout;
>        attribute DOMString serviceURI;
>
> - We'll also need an interface for setting non-standard parameters.  This will be critical to avoid rat-holing into a complete list of parameters.
>        SpeechParameterList parameters;
>        void setCustomParameter(in DOMString name, in DOMString value);
>
>    interface SpeechParameter {
>        attribute DOMString name;
>        attribute DOMString value;
>    };
>
>    interface SpeechParameterList {
>        readonly attribute unsigned long length;
>        getter SpeechParameter item(in unsigned long index);
>    };
>
> - I prefer a flatter structure for SpeechRecognition.  Part of doing that would involve splitting the error path out to its own event.  I suggest the following:
>
>    // A full response, which could be interim or final, part of a continuous response or not
>    interface SpeechRecognitionResult : RecognitionEvent {
>        readonly attribute unsigned long length;
>        getter SpeechRecognitionAlternative item(in unsigned long index);
>        readonly attribute boolean final;
>        readonly attribute short resultIndex;
>        readonly attribute SpeechRecognitionResultList resultHistory;
>    };
>
>    interface SpeechRecognitionError : RecognitionEvent {
>      // As before
>    };
>
>
>
> TTS
>  - At a minimum, we'll need the same serviceURI parameter and generic parameter interface as in SpeechRecognition.
>  - I'd also like to hear some discussion on the importance of "marking" the stream.  I personally feel this is common enough that it should be part of a v1.
>
>
> Thanks
>
>
> -----Original Message-----
> From: Hans Wennborg [mailto:hwennborg@google.com]
> Sent: Thursday, April 12, 2012 7:36 AM
> To: public-speech-api@w3.org
> Cc: Satish S; Glen Shires
> Subject: Speech API: first editor's draft posted
>
> In December, Google proposed [1] to public-webapps a Speech JavaScript API that supports the majority of the use-cases in the Speech Incubator Group's Final Report. This proposal provides a programmatic API that enables web pages to synthesize speech output and to use speech recognition as input for forms, continuous dictation and control.
>
> We have now posted in the Speech-API Community Group's repository, a slightly updated proposal [2], the differences include:
>
>  - Document is now self-contained, rather than having multiple references to the XG Final Report.
>  - Renamed SpeechReco interface to SpeechRecognition
>  - Renamed interfaces and attributes beginning SpeechInput* to
> SpeechRecognition*
>  - Moved EventTarget to constructor of SpeechRecognition
>  - Clarified that grammars and lang are attributes of SpeechRecognition
>  - Clarified that if index is greater than or equal to length, returns null
>
> We welcome discussion and feedback on this editor's draft. Please send your comments to the public-speech-api@w3.org mailing list.
>
> Glen Shires
> Hans Wennborg
>
> [1] http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/1696.html
> [2] http://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html
>
Received on Tuesday, 24 April 2012 13:52:49 UTC
