- From: Young, Milan <Milan.Young@nuance.com>
- Date: Fri, 13 Apr 2012 21:05:35 +0000
- To: Hans Wennborg <hwennborg@google.com>, "public-speech-api@w3.org" <public-speech-api@w3.org>
- CC: Satish S <satish@google.com>, Glen Shires <gshires@google.com>
Thank you for the draft, this looks like an excellent start. A few comments/suggestions on the following:

SpeechRecognition

- In addition to the three parameters you have listed, I see the following as necessary:

      integer maxNBest;
      float confidenceThreshold;
      integer completeTimeout;
      integer incompleteTimeout;
      integer maxSpeechTimeout;
      attribute DOMString serviceURI;

- We'll also need an interface for setting non-standard parameters. This will be critical to avoid rat-holing into a complete list of parameters.

      SpeechParameterList parameters;
      void setCustomParameter(in DOMString name, in DOMString value);

      interface SpeechParameter {
          attribute DOMString name;
          attribute DOMString value;
      };

      interface SpeechParameterList {
          readonly attribute unsigned long length;
          getter SpeechParameter item(in unsigned long index);
      };

- I prefer a flatter structure for SpeechRecognition. Part of doing that would involve splitting the error path out into its own event. I suggest the following:

      // A full response, which could be interim or final, part of a continuous response or not
      interface SpeechRecognitionResult : RecognitionEvent {
          readonly attribute unsigned long length;
          getter SpeechRecognitionAlternative item(in unsigned long index);
          readonly attribute boolean final;
          readonly attribute short resultIndex;
          readonly attribute SpeechRecognitionResultList resultHistory;
      };

      interface SpeechRecognitionError : RecognitionEvent {
          // As before
      };

TTS

- At a minimum, we'll need the same serviceURI parameter and generic parameter interface as in SpeechRecognition.
- I'd also like to hear some discussion on the importance of "marking" the stream. I personally feel this is common enough that it should be part of a v1.

Thanks

-----Original Message-----
From: Hans Wennborg [mailto:hwennborg@google.com]
Sent: Thursday, April 12, 2012 7:36 AM
To: public-speech-api@w3.org
Cc: Satish S; Glen Shires
Subject: Speech API: first editor's draft posted

In December, Google proposed [1] to public-webapps a Speech JavaScript API, a subset that supports the majority of the use-cases in the Speech Incubator Group's Final Report. This proposal provides a programmatic API that enables web pages to synthesize speech output and to use speech recognition as an input for forms, continuous dictation and control.

We have now posted in the Speech-API Community Group's repository a slightly updated proposal [2]. The differences include:

- Document is now self-contained, rather than having multiple references to the XG Final Report.
- Renamed SpeechReco interface to SpeechRecognition
- Renamed interfaces and attributes beginning SpeechInput* to SpeechRecognition*
- Moved EventTarget to constructor of SpeechRecognition
- Clarified that grammars and lang are attributes of SpeechRecognition
- Clarified that if index is greater than or equal to length, item() returns null

We welcome discussion and feedback on this editor's draft. Please send your comments to the public-speech-api@w3.org mailing list.

Glen Shires
Hans Wennborg

[1] http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/1696.html
[2] http://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html
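
For illustration only, a minimal JavaScript sketch of how the recognition parameters and the generic parameter interface proposed above might look from script. The attribute and method names follow the proposal in this message, not the posted editor's draft, and the example values and units are invented for the sketch.

      // Hypothetical usage sketch; these attributes and setCustomParameter
      // follow the proposal above and are not part of the editor's draft.
      var recognition = new SpeechRecognition();

      // Tuning parameters proposed above. Units are not specified in the
      // proposal; milliseconds are assumed here purely for illustration.
      recognition.maxNBest = 3;                 // return up to three alternatives
      recognition.confidenceThreshold = 0.5;    // drop low-confidence alternatives
      recognition.completeTimeout = 1000;       // silence after a complete match
      recognition.incompleteTimeout = 2000;     // silence after an incomplete match
      recognition.maxSpeechTimeout = 30000;     // hard cap on utterance length
      recognition.serviceURI = "https://example.org/speech";  // recognizer to use

      // Engine-specific setting via the proposed non-standard parameter interface.
      recognition.setCustomParameter("vendor:acousticModel", "car-noise");

      recognition.start();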
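
And a similarly hypothetical sketch of consuming the flatter result structure proposed above, assuming a "result" event whose event object is itself the SpeechRecognitionResult and a separate "error" event; the transcript and confidence attributes are taken from the draft's SpeechRecognitionAlternative.

      // Sketch only: event names and handler shape are assumptions; the
      // attributes length, item(), final, resultIndex and resultHistory come
      // from the proposal above.
      recognition.onresult = function (event) {
          // The event itself carries the n-best list of alternatives.
          for (var i = 0; i < event.length; i++) {
              var alt = event.item(i);
              console.log(alt.transcript, alt.confidence);
          }
          if (event.final) {
              console.log("Final result at index", event.resultIndex,
                          "of", event.resultHistory.length, "results so far");
          }
      };

      recognition.onerror = function (event) {
          // With the error path split out into its own event, as proposed above.
          console.log("Recognition error:", event);
      };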
Received on Friday, 13 April 2012 21:06:04 UTC