- From: Bjorn Bringert <bringert@google.com>
- Date: Thu, 19 May 2011 15:59:35 +0100
- To: public-xg-htmlspeech@w3.org
By now the draft final report (http://www.w3.org/2005/Incubator/htmlspeech/live/NOTE-htmlspeech.html) contains a number of design agreements for the JavaScript API for speech recognition. I thought it would be a useful exercise to translate those agreements into a concrete API. The IDL below describes my interpretation of the parts of the API that we have agreed on so far. Many of the interface, function and attribute names are not yet agreed, so I mixed and matched from the Microsoft, Mozilla and Google proposals.

    interface SpeechInputRequest {
        // URL (http: or data:) for an SRGS XML document, with or without SISR tags,
        // or a URI for one of the predefined grammars
        attribute DOMString grammar;

        // Recognition language. Language declared in grammar overrides this.
        attribute DOMString lang;

        // URL for speech recognition engine, http: must be supported
        attribute DOMString engine;

        // Not yet discussed, I think, but the Google and Microsoft proposals have it
        attribute long maxresults;

        // Some timeout parameters will likely be agreed on; not yet discussed

        // Starts capturing audio and recognizing speech
        void startSpeechInput();

        // Stops capturing audio and lets speech recognition complete
        void stopSpeechInput();

        // Stops capturing audio and aborts speech recognition
        void cancelSpeechInput();

        attribute Function onaudiostart;
        attribute Function onsoundstart;
        attribute Function onspeechstart;
        attribute Function onspeechend;
        attribute Function onsoundend;
        attribute Function onaudioend;
        attribute Function onresult;
        attribute Function onerror;
    };

    SpeechInputRequest implements EventTarget;

Events:

    audiostart,  interface: Event: Audio capture has started
    soundstart,  interface: Event: Some sound, possibly speech, has been detected (low latency)
    speechstart, interface: Event: Speech start has been detected
    speechend,   interface: Event: Speech end has been detected (hmm, can we really guarantee that this comes before soundend if the latter is a client endpointer?)
    soundend,    interface: Event: Sound end has been detected
    audioend,    interface: Event: Audio capture has finished
    result,      interface: SpeechResultEvent: Speech recognizer has returned a final result with at least one recognition hypothesis
    error,       interface: Event (?): Speech recognition has failed with an error

    // The event passed to the 'result' event handlers
    interface SpeechResultEvent : Event {
        readonly attribute SpeechInputResult result;
    };

    // Recognition result as EMMA + simple N-best list
    interface SpeechInputResult {
        readonly attribute Document resultEMMAXML;
        readonly attribute DOMString resultEMMAText;
        readonly attribute unsigned long length;
        getter SpeechInputAlternative item(in unsigned long index);
    };

    // Item in N-best list
    interface SpeechInputAlternative {
        readonly attribute DOMString utterance;
        readonly attribute float confidence;
        readonly attribute any interpretation;
    };

The HTML interface has not been agreed yet. The Mozilla proposal has none. The Microsoft proposal has a <reco> element that is either a child of <input> and other elements, or associated with an element using @for. The Google proposal has a @speech attribute on <input> elements. If we agree on speech recognition element(s), either the SpeechInputRequest interface should be able to serve as the DOM interface for such elements, or the elements could have an attribute that contains a SpeechInputRequest.

-- 
Bjorn Bringert
Google UK Limited, Registered Office: Belgrave House, 76 Buckingham Palace Road, London, SW1W 9TQ
Registered in England Number: 3977902
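To make the request/event/result shape in the IDL of this message concrete, here is a minimal TypeScript sketch. It is not part of the proposal and not any browser's API: `MockSpeechInputRequest`, `SpeechEvent`, `SpeechEventName` and `SpeechHandler` are hypothetical names, and the mock captures no audio, replaying a canned list of hypotheses instead. It also assumes that stopSpeechInput() fires speechend, soundend and audioend in that order, that cancelSpeechInput() fires only audioend, that an empty hypothesis list produces error rather than result, and that the N-best list is sorted by descending confidence. resultEMMAXML and the EventTarget methods are left out.

```typescript
// Hypothetical mock of the SpeechInputRequest IDL: it replays canned
// hypotheses instead of capturing audio or contacting a recognizer.
interface SpeechInputAlternative {
  readonly utterance: string;
  readonly confidence: number;
  readonly interpretation: unknown;
}
interface SpeechInputResult {
  readonly resultEMMAText: string; // resultEMMAXML (a DOM Document) is omitted
  readonly length: number;
  item(index: number): SpeechInputAlternative;
}
type SpeechEventName = "audiostart" | "soundstart" | "speechstart" | "speechend"
  | "soundend" | "audioend" | "result" | "error";
// Simplified stand-in for Event and SpeechResultEvent.
interface SpeechEvent {
  readonly type: SpeechEventName;
  readonly result?: SpeechInputResult;
}
type SpeechHandler = ((ev: SpeechEvent) => void) | null;

class MockSpeechInputRequest {
  grammar = "";   // SRGS URL or predefined-grammar URI (ignored by the mock)
  lang = "";      // recognition language (ignored by the mock)
  engine = "";    // recognizer URL (ignored by the mock)
  maxresults = 1; // maximum number of N-best alternatives reported
  onaudiostart: SpeechHandler = null;
  onsoundstart: SpeechHandler = null;
  onspeechstart: SpeechHandler = null;
  onspeechend: SpeechHandler = null;
  onsoundend: SpeechHandler = null;
  onaudioend: SpeechHandler = null;
  onresult: SpeechHandler = null;
  onerror: SpeechHandler = null;
  private capturing = false;
  constructor(private readonly hypotheses: SpeechInputAlternative[]) {}

  startSpeechInput(): void {
    if (this.capturing) return;
    this.capturing = true;
    this.fire("audiostart");
    this.fire("soundstart");
    this.fire("speechstart");
  }

  stopSpeechInput(): void {
    if (!this.capturing) return;
    this.capturing = false;
    // Assumed order; the message itself asks whether speechend always precedes soundend.
    this.fire("speechend");
    this.fire("soundend");
    this.fire("audioend");
    if (this.hypotheses.length === 0) {
      this.fire("error"); // assumption: no hypothesis at all is reported as an error
      return;
    }
    // Assumed N-best ordering: highest confidence first, at most maxresults items.
    const alts = this.hypotheses.slice()
      .sort((a, b) => b.confidence - a.confidence)
      .slice(0, Math.max(1, this.maxresults));
    this.fire("result", {
      resultEMMAText: "", // EMMA generation is out of scope for this sketch
      length: alts.length,
      item: (i: number): SpeechInputAlternative => {
        if (i < 0 || i >= alts.length) throw new RangeError(`no alternative ${i}`);
        return alts[i];
      },
    });
  }

  cancelSpeechInput(): void {
    if (!this.capturing) return;
    this.capturing = false;
    this.fire("audioend"); // assumption: capture ends, but neither result nor error fires
  }

  // Dispatches to the matching on<type> attribute, if one is set.
  private fire(type: SpeechEventName, result?: SpeechInputResult): void {
    const handler: SpeechHandler =
      (this as MockSpeechInputRequest)[`on${type}` as `on${SpeechEventName}`];
    handler?.({ type, result });
  }
}

const req = new MockSpeechInputRequest([
  { utterance: "call mom", confidence: 0.62, interpretation: null },
  { utterance: "call tom", confidence: 0.81, interpretation: null },
  { utterance: "fall on", confidence: 0.1, interpretation: null },
]);
req.maxresults = 2;
const seen: string[] = [];
const results: SpeechInputResult[] = [];
const record = (ev: SpeechEvent): void => {
  seen.push(ev.type);
  if (ev.result) results.push(ev.result);
};
req.onaudiostart = req.onsoundstart = req.onspeechstart = record;
req.onspeechend = req.onsoundend = req.onaudioend = record;
req.onresult = req.onerror = record;
req.startSpeechInput();
req.stopSpeechInput();
```

With maxresults set to 2, the three canned hypotheses are cut to an N-best list of two, so results[0].item(0).utterance is "call tom". Keeping the on* attributes as plain properties mirrors the IDL. A fuller version would also dispatch through EventTarget, as "SpeechInputRequest implements EventTarget" requires.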
Received on Thursday, 19 May 2011 15:00:00 UTC