- From: Bjorn Bringert <bringert@google.com>
- Date: Thu, 30 Jun 2011 12:57:38 +0100
- To: public-xg-htmlspeech@w3.org
Here is a first draft of IDL and semantics for the speech recognition events in the web app API. This does not include the events needed for continuous recognition, as that is part of Debbie's work item.

== IDL ==

interface SpeechInputRequest {
  // ... other speech recognition functionality ...

  attribute Function onaudiostart;
  attribute Function onsoundstart;
  attribute Function onspeechstart;
  attribute Function onspeechend;
  attribute Function onsoundend;
  attribute Function onaudioend;
  attribute Function onresult;
  attribute Function onerror;
};
SpeechInputRequest implements EventTarget;

interface SpeechInputResultEvent : Event {
  readonly attribute SpeechInputResult result;
};

interface SpeechInputErrorEvent : Event {
  readonly attribute SpeechInputError error;
};

interface SpeechInputError {
  const unsigned short SPEECH_INPUT_ERR_OTHER = 0;

  // The following codes have not been agreed. I include them anyway to have a
  // list to start with. This is roughly a union of the error code sets from
  // the Microsoft and Google proposals.

  // Speech was detected, but it could not be recognized.
  const unsigned short SPEECH_INPUT_ERR_NO_MATCH = 1;

  // No speech was detected.
  const unsigned short SPEECH_INPUT_ERR_NO_SPEECH = 2;

  // Speech input was aborted by calling cancel(), or by some UA-specific
  // behavior such as UI that lets the user cancel speech input.
  const unsigned short SPEECH_INPUT_ERR_ABORTED = 3;

  // Audio capture failed.
  const unsigned short SPEECH_INPUT_ERR_AUDIO_CAPTURE = 4;

  // Some network communication that was required to complete the recognition
  // failed.
  const unsigned short SPEECH_INPUT_ERR_NETWORK = 5;

  // The user agent is not allowing any speech input to occur for reasons of
  // security, privacy or user preference.
  const unsigned short SPEECH_INPUT_ERR_NOT_ALLOWED = 6;

  // The user agent is not allowing the speech service requested by the web
  // application to be used, but would allow some speech service, either
  // because the user agent doesn't support the selected one or for reasons of
  // security, privacy or user preference.
  const unsigned short SPEECH_INPUT_ERR_SERVICE_NOT_ALLOWED = 7;

  // There was an error in the speech recognition grammar.
  const unsigned short SPEECH_INPUT_ERR_BAD_GRAMMAR = 8;

  // The selected language is not supported.
  const unsigned short SPEECH_INPUT_ERR_LANGUAGE_NOT_SUPPORTED = 9;

  // One of the constants above.
  readonly attribute unsigned short code;

  // The message attribute must return an error message describing the details
  // of the error encountered. The message content is implementation-specific.
  // This attribute is primarily intended for debugging, and developers should
  // not use it directly in their application user interface.
  readonly attribute DOMString message;
};

interface SpeechInputResult {
  // Debbie's work item
};

== Description ==

The DOM Level 2 Event Model is used for speech recognition events. The methods in the EventTarget interface should be used for registering event listeners. The SpeechInputRequest interface also contains convenience attributes for registering a single event handler for each event type.

For all these events, the timeStamp attribute defined in the DOM Level 2 Event interface must be set to the best possible estimate of when the real-world event that the event object represents occurred.

Unless specified below, the ordering of the different events is undefined. For example, some implementations may fire audioend before speechstart or speechend if the audio detector is client-side and the speech detector is server-side.
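To make the intended usage concrete, here is a minimal JavaScript sketch of a web app consuming these events. Note that this draft does not say how a SpeechInputRequest instance is obtained, so the constructor call below is only an assumption for illustration; the listener registration shows both the EventTarget methods and the convenience attributes described above.

  // Assumption: how the request object is created is not specified in this
  // draft; a plain constructor is used here purely for illustration.
  var request = new SpeechInputRequest();

  // DOM Level 2 EventTarget registration.
  request.addEventListener('audiostart', function (e) {
    console.log('audio capture started at ' + e.timeStamp);
  });

  // Convenience attribute: a single handler per event type.
  request.onresult = function (e) {
    // e.result is a SpeechInputResult (its contents are Debbie's work item).
    console.log('final recognition result received', e.result);
  };

  request.onerror = function (e) {
    // Web IDL constants are available on the error object itself.
    if (e.error.code === e.error.SPEECH_INPUT_ERR_NO_SPEECH) {
      console.log('no speech detected');
    } else {
      console.log('recognition error ' + e.error.code + ': ' + e.error.message);
    }
  };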
== List of events ==

For each event, we list the name, the interface of the event object, and a description.

audiostart, interface: Event
Fired when the user agent has started to capture audio.

soundstart, interface: Event
Some sound, possibly speech, has been detected. This must be fired with low latency, e.g. by using a client-side energy detector.

speechstart, interface: Event
The speech that will be used for speech recognition has started.

speechend, interface: Event
The speech that will be used for speech recognition has ended. speechstart must always have been fired before speechend.

soundend, interface: Event
Some sound is no longer detected. This must be fired with low latency, e.g. by using a client-side energy detector. soundstart must always have been fired before soundend.

audioend, interface: Event
Fired when the user agent has finished capturing audio. audiostart must always have been fired before audioend.

result, interface: SpeechInputResultEvent
Fired when the speech recognizer returns a final result with at least one recognition hypothesis. The result attribute of the event contains the speech recognition result. All of the following events must have been fired before result is fired: audiostart, soundstart, speechstart, speechend, soundend, audioend.

error, interface: SpeechInputErrorEvent
Fired when a speech recognition error occurs. The error attribute is set to a SpeechInputError object. After an error event is fired, no further events will be fired for the given speech input request.

--
Bjorn Bringert
Google UK Limited, Registered Office: Belgrave House, 76 Buckingham Palace Road, London, SW1W 9TQ
Registered in England Number: 3977902
Received on Thursday, 30 June 2011 11:58:01 UTC