- From: Bjorn Bringert <bringert@google.com>
- Date: Thu, 19 May 2011 15:59:35 +0100
- To: public-xg-htmlspeech@w3.org
By now the draft final report
(http://www.w3.org/2005/Incubator/htmlspeech/live/NOTE-htmlspeech.html)
contains a number of design agreements for the JavaScript API for
speech recognition. I thought it would be a useful exercise to
translate those agreements into a concrete API.
The IDL below describes my interpretation of the parts of the API that
we have agreed on so far. Many of the interface/function/attribute
names are not yet agreed, so I mixed and matched from the Microsoft,
Mozilla and Google proposals.
interface SpeechInputRequest {
    // URL (http: or data:) for an SRGS XML document, with or without SISR tags,
    // or a URI for one of the predefined grammars
    attribute DOMString grammar;

    // Recognition language. Language declared in grammar overrides this.
    attribute DOMString lang;

    // URL for speech recognition engine, http: must be supported
    attribute DOMString engine;

    // Not yet discussed, I think, but the Google and Microsoft proposals have it
    attribute long maxresults;

    // Some timeout parameters will likely be agreed, not yet discussed

    // Starts capturing audio and recognizing speech
    void startSpeechInput();
    // Stops capturing audio and lets speech recognition complete
    void stopSpeechInput();
    // Stops capturing audio and aborts speech recognition
    void cancelSpeechInput();

    attribute Function onaudiostart;
    attribute Function onsoundstart;
    attribute Function onspeechstart;
    attribute Function onspeechend;
    attribute Function onsoundend;
    attribute Function onaudioend;
    attribute Function onresult;
    attribute Function onerror;
};
SpeechInputRequest implements EventTarget;
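To make the shape of the interface concrete, here is a rough sketch of how page script might drive it. Nothing below is agreed: StubSpeechInputRequest is a hypothetical stand-in so the snippet runs outside a browser, the no-argument constructor is my assumption (the IDL does not define one), and the canned hypothesis and the "en-GB" value are made up for illustration.

```javascript
// Hypothetical stand-in for a user-agent implementation of SpeechInputRequest.
// It records no audio; it only mimics the attributes and methods in the IDL.
class StubSpeechInputRequest {
  constructor() {
    this.grammar = "";
    this.lang = "";
    this.engine = "";
    this.maxresults = 1;
    this.onaudiostart = null;
    this.onaudioend = null;
    this.onresult = null;
    this.onerror = null;
    this.capturing = false; // stand-in bookkeeping, not part of the IDL
  }

  // Calls the matching on<type> handler, if one is set.
  fire(type, extra) {
    const handler = this["on" + type];
    if (typeof handler === "function") handler(Object.assign({ type }, extra));
  }

  startSpeechInput() {
    this.capturing = true;
    this.fire("audiostart");
  }

  stopSpeechInput() {
    if (!this.capturing) return;
    this.capturing = false;
    this.fire("audioend");
    // Canned hypothesis; a real recognizer would derive this from audio.
    const alternatives = [
      { utterance: "hello world", confidence: 0.9, interpretation: null },
    ];
    this.fire("result", {
      result: { length: alternatives.length, item: (i) => alternatives[i] },
    });
  }

  cancelSpeechInput() {
    if (!this.capturing) return;
    this.capturing = false;
    this.fire("audioend"); // assumed: cancelling still ends audio capture
  }
}

const log = [];
const request = new StubSpeechInputRequest();
request.lang = "en-GB";
request.maxresults = 3;
request.onaudiostart = () => log.push("audiostart");
request.onaudioend = () => log.push("audioend");
request.onresult = (event) => log.push("result: " + event.result.item(0).utterance);
request.onerror = () => log.push("error");

request.startSpeechInput();
request.stopSpeechInput();
console.log(log.join(", ")); // audiostart, audioend, result: hello world
```

The stub fires its events synchronously to keep the sketch short; a real implementation would presumably dispatch them asynchronously.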
Events:
audiostart, interface: Event: Audio capture has started
soundstart, interface: Event: Some sound, possibly speech, has been
detected (low latency)
speechstart, interface: Event: Speech start has been detected
speechend, interface: Event: Speech end has been detected (hmm, can we
really guarantee that this comes before soundend if the latter comes
from a client endpointer?)
soundend, interface: Event: Sound end has been detected
audioend, interface: Event: Audio capture has finished
result, interface: SpeechResultEvent: Speech recognizer has returned a
final result with at least one recognition hypothesis
error, interface: Event (?): A speech recognition error has occurred
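The ordering question in the speechend entry can be made concrete with a small check. This sketch assumes the events nest in the order they are listed (audio brackets sound, sound brackets speech); that nesting is my reading of the list, not something agreed, and isWellNested is a hypothetical helper, not part of the API.

```javascript
// Assumed pairing of each *end event with its *start event.
const PAIRS = new Map([
  ["audioend", "audiostart"],
  ["soundend", "soundstart"],
  ["speechend", "speechstart"],
]);
const STARTS = new Set(PAIRS.values());

// Returns true if every *end event closes the most recently opened *start.
function isWellNested(eventTypes) {
  const open = [];
  for (const type of eventTypes) {
    if (STARTS.has(type)) {
      open.push(type);
    } else if (PAIRS.has(type)) {
      if (open.pop() !== PAIRS.get(type)) return false;
    }
    // Other events (result, error) are ignored by this check.
  }
  return open.length === 0;
}

// The order in which the events are listed.
const listedOrder = [
  "audiostart", "soundstart", "speechstart",
  "speechend", "soundend", "audioend", "result",
];

// The doubtful case: a client endpointer reports soundend before the
// recognizer reports speechend.
const clientEndpointerFirst = [
  "audiostart", "soundstart", "speechstart",
  "soundend", "speechend", "audioend", "result",
];

console.log(isWellNested(listedOrder));           // true
console.log(isWellNested(clientEndpointerFirst)); // false
```

If strict nesting cannot be guaranteed, handlers would have to tolerate the second sequence as well.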
// The event passed to the 'result' event handlers
interface SpeechResultEvent : Event {
    readonly attribute SpeechInputResult result;
};

// Recognition result as EMMA + simple N-best list
interface SpeechInputResult {
    readonly attribute Document resultEMMAXML;
    readonly attribute DOMString resultEMMAText;
    readonly attribute unsigned long length;
    getter SpeechInputAlternative item(in unsigned long index);
};

// Item in N-best list
interface SpeechInputAlternative {
    readonly attribute DOMString utterance;
    readonly attribute float confidence;
    readonly attribute any interpretation;
};
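As an illustration of the N-best list, a 'result' handler could walk it through length and item() like this. The makeResult helper, the sample utterances, and the 0..1 confidence scale are all assumptions for the sake of a runnable sketch; the IDL above does not fix the range of confidence or the ordering of the list.

```javascript
// Hypothetical helper: wraps a plain array in the shape of SpeechInputResult.
function makeResult(alternatives) {
  return {
    resultEMMAXML: null, // EMMA document omitted in this sketch
    resultEMMAText: "",
    length: alternatives.length,
    item: (index) => alternatives[index],
  };
}

// Collects the utterances whose confidence is at or above the threshold,
// using only the length attribute and the item() getter.
function utterancesAbove(result, threshold) {
  const accepted = [];
  for (let i = 0; i < result.length; i++) {
    const alternative = result.item(i);
    if (alternative.confidence >= threshold) {
      accepted.push(alternative.utterance);
    }
  }
  return accepted;
}

const sample = makeResult([
  { utterance: "call mom", confidence: 0.82, interpretation: null },
  { utterance: "call tom", confidence: 0.61, interpretation: null },
  { utterance: "cold storm", confidence: 0.12, interpretation: null },
]);

console.log(utterancesAbove(sample, 0.5)); // [ 'call mom', 'call tom' ]
```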
The HTML interface has not been agreed yet. The Mozilla proposal has
none. The Microsoft proposal has a <reco> element as a child of
<input> and other elements, or associated with elements using @for.
Google has a @speech attribute for <input> elements. If we agree on
speech recognition element(s), the SpeechInputRequest interface should
either be able to serve as the DOM interface for such elements, or the
elements could have an attribute which contains a SpeechInputRequest.
--
Bjorn Bringert
Google UK Limited, Registered Office: Belgrave House, 76 Buckingham
Palace Road, London, SW1W 9TQ
Registered in England Number: 3977902
Received on Thursday, 19 May 2011 15:00:00 UTC