- From: Young, Milan <Milan.Young@nuance.com>
- Date: Fri, 13 Apr 2012 21:05:35 +0000
- To: Hans Wennborg <hwennborg@google.com>, "public-speech-api@w3.org" <public-speech-api@w3.org>
- CC: Satish S <satish@google.com>, Glen Shires <gshires@google.com>
Thank you for the draft; this looks like an excellent start. A few comments and suggestions on the following:
SpeechRecognition
- In addition to the three parameters you have listed, I see the following as necessary:
attribute unsigned long maxNBest;
attribute float confidenceThreshold;
attribute unsigned long completeTimeout;
attribute unsigned long incompleteTimeout;
attribute unsigned long maxSpeechTimeout;
attribute DOMString serviceURI;
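For concreteness, a page might then tune recognition along these lines (purely illustrative; this assumes the draft's SpeechRecognition constructor and start() method, and treats the timeouts as milliseconds, which is an assumption):

// Illustrative sketch only; the attribute names are the proposals above,
// the URI is made up, and millisecond timeout units are assumed.
var recognizer = new SpeechRecognition();
recognizer.maxNBest = 3;                // return up to three alternatives
recognizer.confidenceThreshold = 0.5;   // drop low-confidence alternatives
recognizer.completeTimeout = 1500;      // silence allowed after a complete match
recognizer.incompleteTimeout = 3000;    // silence allowed after a partial match
recognizer.maxSpeechTimeout = 10000;    // cap on the length of one utterance
recognizer.serviceURI = "https://recognizer.example.com/service";
recognizer.start();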
- We'll also need an interface for setting non-standard parameters. This will be critical to avoid rat-holing into a complete list of parameters.
// On the SpeechRecognition interface:
attribute SpeechParameterList parameters;
void setCustomParameter(in DOMString name, in DOMString value);
interface SpeechParameter {
attribute DOMString name;
attribute DOMString value;
};
interface SpeechParameterList {
readonly attribute unsigned long length;
getter SpeechParameter item(in unsigned long index);
};
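To illustrate how page script might use this (a sketch only; it assumes parameters and setCustomParameter live on SpeechRecognition as above, and the vendor parameter name is invented):

// Illustrative sketch only; "com.example.noise-model" is a made-up
// vendor-specific parameter name.
var recognizer = new SpeechRecognition();
recognizer.setCustomParameter("com.example.noise-model", "automotive");

// Enumerate whatever parameters have been set so far.
for (var i = 0; i < recognizer.parameters.length; i++) {
  var param = recognizer.parameters.item(i);
  console.log(param.name + " = " + param.value);
}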
- I prefer a flatter structure for SpeechRecognition. Part of doing that would involve splitting the error path out into its own event. I suggest the following:
// A full response, which could be interim or final, part of a continuous response or not
interface SpeechRecognitionResult : RecognitionEvent {
readonly attribute unsigned long length;
getter SpeechRecognitionAlternative item(in unsigned long index);
readonly attribute boolean final;
readonly attribute short resultIndex;
readonly attribute SpeechRecognitionResultList resultHistory;
};
interface SpeechRecognitionError : RecognitionEvent {
// As before
};
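From page script, the flattened shape might look roughly like this (a sketch only; the onresult/onerror handler names follow the draft's event style, but the flat result-as-event structure is the proposal above, not the posted draft):

var recognizer = new SpeechRecognition();

recognizer.onresult = function (result) {
  // The event itself is the result: a list of alternatives plus metadata.
  if (result.length > 0) {
    var best = result.item(0);  // a SpeechRecognitionAlternative
    console.log(best.transcript, best.confidence,
                result.final ? "(final)" : "(interim)");
  }
};

recognizer.onerror = function (error) {
  // Errors arrive on their own event, outside the result path.
  console.log("recognition error:", error);
};

recognizer.start();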
TTS
- At a minimum, we'll need the same serviceURI parameter and generic parameter interface as in SpeechRecognition.
- I'd also like to hear some discussion on the importance of "marking" the stream. I personally feel this is common enough that it should be part of v1.
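To make both points concrete, something like the following (purely a sketch: it assumes a TTS object with a text attribute and play() method roughly along the lines of the draft; serviceURI and setCustomParameter are the proposals above, and the onmark handler is one hypothetical shape for stream marking):

var tts = new TTS();
tts.serviceURI = "https://tts.example.com/service";        // non-default engine
tts.setCustomParameter("com.example.voice", "expressive"); // vendor-specific knob
tts.text = 'Your order number is <mark name="order"/> 4 2 7.';
tts.onmark = function (e) {
  // Would fire as playback reaches the named mark in the audio stream.
  console.log("reached mark:", e.name);
};
tts.play();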
Thanks
-----Original Message-----
From: Hans Wennborg [mailto:hwennborg@google.com]
Sent: Thursday, April 12, 2012 7:36 AM
To: public-speech-api@w3.org
Cc: Satish S; Glen Shires
Subject: Speech API: first editor's draft posted
In December, Google proposed [1] to public-webapps a Speech JavaScript API: a subset that supports the majority of the use-cases in the Speech Incubator Group's Final Report. This proposal provides a programmatic API that enables web pages to synthesize speech output and to use speech recognition as an input for forms, continuous dictation and control.
We have now posted a slightly updated proposal [2] in the Speech-API Community Group's repository. The differences include:
- Document is now self-contained, rather than having multiple references to the XG Final Report.
- Renamed SpeechReco interface to SpeechRecognition
- Renamed interfaces and attributes beginning SpeechInput* to SpeechRecognition*
- Moved EventTarget to constructor of SpeechRecognition
- Clarified that grammars and lang are attributes of SpeechRecognition
- Clarified that if index is greater than or equal to length, returns null
We welcome discussion and feedback on this editor's draft. Please send your comments to the public-speech-api@w3.org mailing list.
Glen Shires
Hans Wennborg
[1] http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/1696.html
[2] http://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html