- From: Bjorn Bringert <bringert@google.com>
- Date: Thu, 19 May 2011 16:51:30 +0100
- To: Olli@pettay.fi
- Cc: public-xg-htmlspeech@w3.org
On Thu, May 19, 2011 at 4:39 PM, Olli Pettay <Olli.Pettay@helsinki.fi> wrote:
> On 05/19/2011 05:59 PM, Bjorn Bringert wrote:
>>
>> By now the draft final report
>> (http://www.w3.org/2005/Incubator/htmlspeech/live/NOTE-htmlspeech.html)
>> contains a number of design agreements for the JavaScript API for
>> speech recognition. I thought it would be a useful exercise to
>> translate those agreements into a concrete API.
>>
>> The below IDL describes my interpretation of the parts of the API that
>> we have agreed on so far. Many of the interface/function/attribute
>> names are not yet agreed, so I mixed and matched from the Microsoft,
>> Mozilla and Google proposals.
>>
>> interface SpeechInputRequest {
>>     // URL (http: or data:) for an SRGS XML document, with or without SISR tags,
>>     // or a URI for one of the predefined grammars
>>     attribute DOMString grammar;
>
> I think we need to support either multiple simultaneous grammars or
> SIRs. MS has GrammarCollection, so it supports multiple grammars,
> SpeechRequest API supports multiple active recognition objects.

Yeah, this is a known area for discussion. I only put in the single
field, since we all agree that we need to support at least one grammar
:-)

>>     // Recognition language. Language declared in grammar overrides this.
>>     attribute DOMString lang;
>
> I wonder still how to handle language in a don't-leak-privacy-data way.
> There are very good use cases for lang, but the privacy problem should be
> solved.
>
>
>>     // URL for speech recognition engine, http: must be supported
>>     attribute DOMString engine;
>>
>>     // Not yet discussed I think, but Google and Microsoft proposals have it
>>     attribute long maxresults;
>
> Very reasonable.
>
>>
>>     // Some timeout parameters will likely be agreed, not yet discussed
>
> ditto
>
>>
>>     // Starts capturing audio and recognizing speech
>>     void startSpeechInput();
>>     // Stops capturing audio and lets speech recognition complete
>>     void stopSpeechInput();
>>     // Stops capturing audio and aborts speech recognition
>>     void cancelSpeechInput();
>>
>>     attribute Function onaudiostart;
>>     attribute Function onsoundstart;
>>     attribute Function onspeechstart;
>>     attribute Function onspeechend;
>>     attribute Function onsoundend;
>>     attribute Function onaudioend;
>>     attribute Function onresult;
>>     attribute Function onerror;
>> };
>> SpeechInputRequest implements EventTarget;
>>
>> Events:
>>
>> audiostart, interface: Event: Audio capture has started
>> soundstart, interface: Event: Some sound, possibly speech, has been
>> detected (low latency)
>> speechstart, interface: Event: Speech start has been detected
>> speechend, interface: Event: Speech end has been detected (hmm, can we
>> really guarantee that this comes before soundend if the latter is a
>> client endpointer)
>> soundend, interface: Event: Sound end has been detected
>> audioend, interface: Event: Audio capture has finished
>> result, interface: SpeechResultEvent: Speech recognizer has returned a
>> final result with at least one recognition hypothesis
>> error, interface: Event (?): Speech end has been detected
>>
>> // The event passed to the 'result' event handlers
>> interface SpeechResultEvent : Event {
>>     readonly attribute SpeechInputResult result;
>> };
>>
>> // Recognition result as EMMA + simple N-best list
>> interface SpeechInputResult {
>>     readonly attribute Document resultEMMAXML;
>>     readonly attribute DOMString resultEMMAText;
>>     readonly attribute unsigned long length;
>>     getter SpeechInputResultAlternative item(in unsigned long index);
>> };
>>
>> // Item in N-best list
>> interface SpeechInputAlternative {
>>     readonly attribute DOMString utterance;
>>     readonly attribute float confidence;
>>     readonly attribute any interpretation;
>> };
>>
>>
>> The HTML interface has not been agreed yet. The Mozilla proposal has
>> none. The Microsoft proposal has a <reco> element as a child of
>> <input> and other elements, or associated with elements using @for.
>> Google has @speech attribute for <input> elements. If we agree on
>> speech recognition element(s), the SpeechInputRequest interface should
>> either be able to serve as the DOM interface for such elements, or the
>> elements could have an attribute which contains a SpeechInputRequest.
>
> My proposal has the boundElement which has some similarity to MS' @for, but
> sure, it doesn't have any element for the asr/reco, only JS object.
>
>
>
> -Olli
>

-- 
Bjorn Bringert
Google UK Limited, Registered Office: Belgrave House, 76 Buckingham
Palace Road, London, SW1W 9TQ
Registered in England Number: 3977902
Received on Thursday, 19 May 2011 15:51:56 UTC
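
For illustration only, the event sequence in the draft IDL quoted in this message can be sketched with a small TypeScript mock. Nothing here is a browser API: the class name MockSpeechInputRequest, the synchronous firing of events, and the canned hypotheses are all assumptions made for the example. In particular, firing speechend before soundend is assumed, which the message itself flags as not guaranteed, and cancelSpeechInput and error handling are left out because the draft does not define their behaviour.

```typescript
// Hypothetical mock of the draft SpeechInputRequest interface. No audio is
// captured and no recognizer is contacted; events fire synchronously.

type EventName =
  | "audiostart" | "soundstart" | "speechstart" | "speechend"
  | "soundend" | "audioend" | "result" | "error";

// Mirrors the draft's N-best list item.
interface SpeechInputAlternative {
  utterance: string;
  confidence: number;
  interpretation: unknown;
}

interface SpeechEvent {
  type: EventName;
  result: SpeechInputAlternative[] | null; // non-null only for "result"
}

type Handler = ((event: SpeechEvent) => void) | null;

class MockSpeechInputRequest {
  grammar = "";
  lang = "";
  maxresults = 1;
  onaudiostart: Handler = null;
  onsoundstart: Handler = null;
  onspeechstart: Handler = null;
  onspeechend: Handler = null;
  onsoundend: Handler = null;
  onaudioend: Handler = null;
  onresult: Handler = null;
  onerror: Handler = null;

  // Looks up the handler attribute for an event and calls it if set.
  private fire(type: EventName, result: SpeechInputAlternative[] | null = null): void {
    const handlers: Record<EventName, Handler> = {
      audiostart: this.onaudiostart,
      soundstart: this.onsoundstart,
      speechstart: this.onspeechstart,
      speechend: this.onspeechend,
      soundend: this.onsoundend,
      audioend: this.onaudioend,
      result: this.onresult,
      error: this.onerror,
    };
    const handler = handlers[type];
    if (handler) handler({ type, result });
  }

  // Assumed ordering: capture starts, then sound, then speech.
  startSpeechInput(): void {
    this.fire("audiostart");
    this.fire("soundstart");
    this.fire("speechstart");
  }

  // Assumed ordering: speech ends, sound ends, capture ends, then a result.
  stopSpeechInput(): void {
    this.fire("speechend");
    this.fire("soundend");
    this.fire("audioend");
    // Canned hypotheses, best first; truncated to maxresults.
    const hypotheses: SpeechInputAlternative[] = [
      { utterance: "hello world", confidence: 0.9, interpretation: null },
      { utterance: "hollow world", confidence: 0.4, interpretation: null },
      { utterance: "hello word", confidence: 0.2, interpretation: null },
    ];
    this.fire("result", hypotheses.slice(0, this.maxresults));
  }
}

// Usage: record the order of events and the utterances delivered.
const seen: string[] = [];
const heard: string[] = [];
const request = new MockSpeechInputRequest();
request.lang = "en-GB";
request.maxresults = 2;
request.onaudiostart = (e) => { seen.push(e.type); };
request.onsoundstart = (e) => { seen.push(e.type); };
request.onspeechstart = (e) => { seen.push(e.type); };
request.onspeechend = (e) => { seen.push(e.type); };
request.onsoundend = (e) => { seen.push(e.type); };
request.onaudioend = (e) => { seen.push(e.type); };
request.onresult = (e) => {
  seen.push(e.type);
  for (const alt of e.result ?? []) heard.push(alt.utterance);
};
request.startSpeechInput();
request.stopSpeechInput();
```

Using plain handler attributes rather than addEventListener keeps the sketch short; the draft also says the interface implements EventTarget, which this mock does not model.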