- From: Olli Pettay <Olli.Pettay@helsinki.fi>
- Date: Thu, 19 May 2011 18:39:56 +0300
- To: Bjorn Bringert <bringert@google.com>
- CC: public-xg-htmlspeech@w3.org
On 05/19/2011 05:59 PM, Bjorn Bringert wrote:
> By now the draft final report
> (http://www.w3.org/2005/Incubator/htmlspeech/live/NOTE-htmlspeech.html)
> contains a number of design agreements for the JavaScript API for
> speech recognition. I thought it would be a useful exercise to
> translate those agreements into a concrete API.
>
> The below IDL describes my interpretation of the parts of the API that
> we have agreed on so far. Many of the interface/function/attribute
> names are not yet agreed, so I mixed and matched from the Microsoft,
> Mozilla and Google proposals.
>
> interface SpeechInputRequest {
>   // URL (http: or data:) for an SRGS XML document, with or without SISR tags,
>   // or a URI for one of the predefined grammars
>   attribute DOMString grammar;

I think we need to support either multiple simultaneous grammars or SIRs.
MS has GrammarCollection, so it supports multiple grammars;
the SpeechRequest API supports multiple active recognition objects.

>   // Recognition language. Language declared in grammar overrides this.
>   attribute DOMString lang;

I still wonder how to handle language in a don't-leak-privacy-data way.
There are very good use cases for lang, but the privacy problem should
be solved.

>   // URL for speech recognition engine, http: must be supported
>   attribute DOMString engine;
>
>   // Not yet discussed I think, but Google and Microsoft proposals have it
>   attribute long maxresults;

Very reasonable.
>
>   // Some timeout parameters will likely be agreed, not yet discussed

ditto

>
>   // Starts capturing audio and recognizing speech
>   void startSpeechInput();
>   // Stops capturing audio and lets speech recognition complete
>   void stopSpeechInput();
>   // Stops capturing audio and aborts speech recognition
>   void cancelSpeechInput();
>
>   attribute Function onaudiostart;
>   attribute Function onsoundstart;
>   attribute Function onspeechstart;
>   attribute Function onspeechend;
>   attribute Function onsoundend;
>   attribute Function onaudioend;
>   attribute Function onresult;
>   attribute Function onerror;
> };
> SpeechInputRequest implements EventTarget;
>
> Events:
>
> audiostart, interface: Event: Audio capture has started
> soundstart, interface: Event: Some sound, possibly speech, has been
> detected (low latency)
> speechstart, interface: Event: Speech start has been detected
> speechend, interface: Event: Speech end has been detected (hmm, can we
> really guarantee that this comes before soundend if the latter is a
> client endpointer)
> soundend, interface: Event: Sound end has been detected
> audioend, interface: Event: Audio capture has finished
> result, interface: SpeechResultEvent: Speech recognizer has returned a
> final result with at least one recognition hypothesis
> error, interface: Event (?): Speech end has been detected
>
> // The event passed to the 'result' event handlers
> interface SpeechResultEvent : Event {
>   readonly attribute SpeechInputResult result;
> };
>
> // Recognition result as EMMA + simple N-best list
> interface SpeechInputResult {
>   readonly attribute Document resultEMMAXML;
>   readonly attribute DOMString resultEMMAText;
>   readonly attribute unsigned long length;
>   getter SpeechInputResultAlternative item(in unsigned long index);
> };
>
> // Item in N-best list
> interface SpeechInputAlternative {
>   readonly attribute DOMString utterance;
>   readonly attribute float confidence;
>   readonly attribute any interpretation;
> };
>
>
> The HTML interface has not been agreed yet. The Mozilla proposal has
> none. The Microsoft proposal has a <reco> element as a child of
> <input> and other elements, or associated with elements using @for.
> Google has @speech attribute for <input> elements. If we agree on
> speech recognition element(s), the SpeechInputRequest interface should
> either be able to serve as the DOM interface for such elements, or the
> elements could have an attribute which contains a SpeechInputRequest.

My proposal has the boundElement, which has some similarity to MS' @for,
but sure, it doesn't have any element for the asr/reco, only a JS object.


-Olli
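
To make the "multiple active recognition objects" option concrete, here is a rough JavaScript sketch of how a page might drive two requests at once against the draft interface quoted above. Everything in it is an assumption for illustration: MockSpeechInputRequest is a hypothetical stand-in (no browser implements this draft), recognition is faked with canned hypotheses passed to the constructor, the example.com grammar URLs are placeholders, resultEMMAXML/resultEMMAText are left out, and the choice that a cancelled request fires no result event is a guess, since the draft does not say.

```javascript
// Hypothetical mock of the draft SpeechInputRequest interface. No browser
// ships this API; recognition is faked with canned hypotheses.
class MockSpeechInputRequest extends EventTarget {
  constructor(cannedAlternatives) {
    super();
    this.grammar = "";
    this.lang = "";
    this.maxresults = 1;
    this.capturing = false;
    this.cannedAlternatives = cannedAlternatives; // mock-only, not in the IDL
  }

  // Fires the capture-side events in the order the draft lists them.
  startSpeechInput() {
    this.capturing = true;
    for (const type of ["audiostart", "soundstart", "speechstart"]) {
      this.dispatchEvent(new Event(type));
    }
  }

  // Stops capture, then delivers at most maxresults hypotheses.
  stopSpeechInput() {
    if (!this.capturing) return;
    this.capturing = false;
    for (const type of ["speechend", "soundend", "audioend"]) {
      this.dispatchEvent(new Event(type));
    }
    const nBest = this.cannedAlternatives.slice(0, this.maxresults);
    const event = new Event("result");
    event.result = { length: nBest.length, item: (index) => nBest[index] };
    this.dispatchEvent(event);
  }

  // Assumption: cancelling aborts recognition without a result event.
  cancelSpeechInput() {
    this.capturing = false;
  }
}

function runDemo() {
  const log = [];
  const makeRequest = (name, grammar, alternatives) => {
    const request = new MockSpeechInputRequest(alternatives);
    request.grammar = grammar;
    request.maxresults = 2;
    request.addEventListener("speechstart", () => log.push(`${name}: speechstart`));
    request.addEventListener("result", (event) => {
      for (let i = 0; i < event.result.length; i++) {
        const alt = event.result.item(i);
        log.push(`${name}: ${alt.utterance} (${alt.confidence})`);
      }
    });
    return request;
  };

  // Two requests active at the same time, each with its own grammar.
  const city = makeRequest("city", "http://example.com/cities.grxml", [
    { utterance: "helsinki", confidence: 0.9, interpretation: "helsinki" },
    { utterance: "helsingborg", confidence: 0.4, interpretation: "helsingborg" },
    { utterance: "hello", confidence: 0.1, interpretation: null },
  ]);
  const command = makeRequest("command", "http://example.com/commands.grxml", [
    { utterance: "search", confidence: 0.8, interpretation: "search" },
  ]);

  city.startSpeechInput();
  command.startSpeechInput();
  command.cancelSpeechInput(); // aborted: no result expected
  city.stopSpeechInput();      // completes with an N-best list of 2
  command.stopSpeechInput();   // no-op, already cancelled
  return log;
}

console.log(runDemo().join("\n"));
```

With one object per grammar, each request gets its own result and error handlers, so the page never has to work out which grammar matched. A GrammarCollection-style design would instead keep a single request and need the result to identify the matching grammar.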
Received on Thursday, 19 May 2011 15:40:24 UTC