- From: Bjorn Bringert <bringert@google.com>
- Date: Thu, 19 May 2011 16:51:30 +0100
- To: Olli@pettay.fi
- Cc: public-xg-htmlspeech@w3.org
On Thu, May 19, 2011 at 4:39 PM, Olli Pettay <Olli.Pettay@helsinki.fi> wrote:
> On 05/19/2011 05:59 PM, Bjorn Bringert wrote:
>>
>> By now the draft final report
>> (http://www.w3.org/2005/Incubator/htmlspeech/live/NOTE-htmlspeech.html)
>> contains a number of design agreements for the JavaScript API for
>> speech recognition. I thought it would be a useful exercise to
>> translate those agreements into a concrete API.
>>
>> The below IDL describes my interpretation of the parts of the API that
>> we have agreed on so far. Many of the interface/function/attribute
>> names are not yet agreed, so I mixed and matched from the Microsoft,
>> Mozilla and Google proposals.
>>
>> interface SpeechInputRequest {
>> // URL (http: or data:) for an SRGS XML document, with or without SISR tags,
>> // or a URI for one of the predefined grammars
>> attribute DOMString grammar;
>
> I think we need to support either multiple simultaneous grammars or
> SIRs. MS has GrammarCollection, so it supports multiple grammars;
> the SpeechRequest API supports multiple active recognition objects.
Yeah, this is a known area for discussion. I only put in the single
field, since we all agree that we need to support at least one grammar
:-)
>> // Recognition language. Language declared in grammar overrides this.
>> attribute DOMString lang;
>
> I wonder still how to handle language in a don't-leak-privacy-data way.
> There are very good use cases for lang, but the privacy problem should be
> solved.
>
>
>> // URL for speech recognition engine, http: must be supported
>> attribute DOMString engine;
>>
>> // Not yet discussed I think, but Google and Microsoft proposals have it
>> attribute long maxresults;
>
> Very reasonable.
>
>>
>> // Some timeout parameters will likely be agreed, not yet discussed
>
> ditto
>
>>
>> // Starts capturing audio and recognizing speech
>> void startSpeechInput();
>> // Stops capturing audio and lets speech recognition complete
>> void stopSpeechInput();
>> // Stops capturing audio and aborts speech recognition
>> void cancelSpeechInput();
>>
>> attribute Function onaudiostart;
>> attribute Function onsoundstart;
>> attribute Function onspeechstart;
>> attribute Function onspeechend;
>> attribute Function onsoundend;
>> attribute Function onaudioend;
>> attribute Function onresult;
>> attribute Function onerror;
>> };
>> SpeechInputRequest implements EventTarget;
>>
>> Events:
>>
>> audiostart, interface: Event: Audio capture has started
>> soundstart, interface: Event: Some sound, possibly speech, has been
>> detected (low latency)
>> speechstart, interface: Event: Speech start has been detected
>> speechend, interface: Event: Speech end has been detected (hmm, can we
>> really guarantee that this comes before soundend if the latter is a
>> client endpointer)
>> soundend, interface: Event: Sound end has been detected
>> audioend, interface: Event: Audio capture has finished
>> result, interface: SpeechResultEvent: Speech recognizer has returned a
>> final result with at least one recognition hypothesis
>> error, interface: Event (?): A speech recognition error has occurred
>>
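To make the event list above a bit more concrete, here is a rough sketch in TypeScript of how a page might observe the events. Everything in it is an assumption for illustration: MockSpeechInputRequest is a made-up stand-in (no browser implements this draft), and it simply fires the events in the order listed above for one successful recognition, which is exactly the ordering that is still open for speechend vs. soundend.

```typescript
// Hypothetical mock of the draft SpeechInputRequest; not a real browser API.
type Handler = (event: { type: string }) => void;

// Assumed order for one successful recognition, taken from the list above.
const EVENT_ORDER: string[] = [
  "audiostart", "soundstart", "speechstart",
  "speechend", "soundend", "audioend", "result",
];

class MockSpeechInputRequest {
  grammar = "";
  lang = "";
  maxresults = 1;
  private handlers: Record<string, Handler[]> = {};

  addEventListener(type: string, handler: Handler): void {
    (this.handlers[type] = this.handlers[type] || []).push(handler);
  }

  // Simulates a complete recognition by firing every event in EVENT_ORDER.
  startSpeechInput(): void {
    for (const type of EVENT_ORDER) {
      for (const handler of this.handlers[type] || []) {
        handler({ type });
      }
    }
  }
}

// Registers a listener for each event type and returns the types in arrival order.
function recordEvents(request: MockSpeechInputRequest): string[] {
  const seen: string[] = [];
  for (const type of EVENT_ORDER) {
    request.addEventListener(type, (event) => seen.push(event.type));
  }
  request.startSpeechInput();
  return seen;
}

const seenEvents = recordEvents(new MockSpeechInputRequest());
console.log(seenEvents.join(" -> "));
```

A page that cares about the speechend/soundend ordering could compare the positions of the two entries in the recorded list rather than assume one always precedes the other.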
>> // The event passed to the 'result' event handlers
>> interface SpeechResultEvent : Event {
>> readonly attribute SpeechInputResult result;
>> };
>>
>> // Recognition result as EMMA + simple N-best list
>> interface SpeechInputResult {
>> readonly attribute Document resultEMMAXML;
>> readonly attribute DOMString resultEMMAText;
>> readonly attribute unsigned long length;
>> getter SpeechInputAlternative item(in unsigned long index);
>> };
>>
>> // Item in N-best list
>> interface SpeechInputAlternative {
>> readonly attribute DOMString utterance;
>> readonly attribute float confidence;
>> readonly attribute any interpretation;
>> };
>>
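As an illustration of how the N-best list above might be consumed, here is a minimal TypeScript sketch. The interfaces just mirror the draft IDL; fakeResult, acceptableUtterances and the 0.5 threshold are made up for the example, and I am assuming confidence is a number between 0 and 1, which the draft does not actually say.

```typescript
// Hypothetical shapes mirroring the draft IDL; not a shipped API.
interface SpeechInputAlternative {
  readonly utterance: string;
  readonly confidence: number; // assumed to lie in [0, 1]
  readonly interpretation: unknown;
}

interface SpeechInputResult {
  readonly length: number;
  item(index: number): SpeechInputAlternative;
}

// Returns the utterances whose confidence is at or above minConfidence,
// in the order the recognizer listed them.
function acceptableUtterances(result: SpeechInputResult, minConfidence: number): string[] {
  const accepted: string[] = [];
  for (let i = 0; i < result.length; i++) {
    const alternative = result.item(i);
    if (alternative.confidence >= minConfidence) {
      accepted.push(alternative.utterance);
    }
  }
  return accepted;
}

// Hand-built stand-in for a recognizer result, for illustration only.
function fakeResult(alternatives: SpeechInputAlternative[]): SpeechInputResult {
  return {
    length: alternatives.length,
    item: (index: number) => alternatives[index],
  };
}

const sample = fakeResult([
  { utterance: "call home", confidence: 0.82, interpretation: null },
  { utterance: "call Rome", confidence: 0.41, interpretation: null },
]);
console.log(acceptableUtterances(sample, 0.5));
```

Pages that need the full detail would presumably go to the EMMA document instead; the length/item pair is only meant as the simple path.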
>>
>> The HTML interface has not been agreed yet. The Mozilla proposal has
>> none. The Microsoft proposal has a <reco> element as a child of
>> <input> and other elements, or associated with elements using @for.
>> Google has @speech attribute for <input> elements. If we agree on
>> speech recognition element(s), the SpeechInputRequest interface should
>> either be able to serve as the DOM interface for such elements, or the
>> elements could have an attribute which contains a SpeechInputRequest.
>
> My proposal has the boundElement which has some similarity to MS' @for, but
> sure, it doesn't have any element for the asr/reco, only JS object.
>
>
>
> -Olli
>
--
Bjorn Bringert
Google UK Limited, Registered Office: Belgrave House, 76 Buckingham
Palace Road, London, SW1W 9TQ
Registered in England Number: 3977902
Received on Thursday, 19 May 2011 15:51:56 UTC