- From: Bjorn Bringert <bringert@google.com>
- Date: Thu, 30 Jun 2011 12:57:38 +0100
- To: public-xg-htmlspeech@w3.org
Here is a first draft of IDL and semantics for the speech recognition events in the web app API. This does not include the events needed for continuous recognition, as that is part of Debbie's work item.

== IDL ==

interface SpeechInputRequest {
  // ... other speech recognition functionality ...

  attribute Function onaudiostart;
  attribute Function onsoundstart;
  attribute Function onspeechstart;
  attribute Function onspeechend;
  attribute Function onsoundend;
  attribute Function onaudioend;
  attribute Function onresult;
  attribute Function onerror;
};
SpeechInputRequest implements EventTarget;

interface SpeechInputResultEvent : Event {
  readonly attribute SpeechInputResult result;
};

interface SpeechInputErrorEvent : Event {
  readonly attribute SpeechInputError error;
};

interface SpeechInputError {
  const unsigned short SPEECH_INPUT_ERR_OTHER = 0;

  // The following codes have not been agreed. I include them anyway to have a
  // list to start with. This is roughly a union of the error code sets from
  // the Microsoft and Google proposals.

  // Speech was detected, but it could not be recognized.
  const unsigned short SPEECH_INPUT_ERR_NO_MATCH = 1;

  // No speech was detected.
  const unsigned short SPEECH_INPUT_ERR_NO_SPEECH = 2;

  // Speech input was aborted by calling cancel(), or by some UA-specific
  // behavior such as UI that lets the user cancel speech input.
  const unsigned short SPEECH_INPUT_ERR_ABORTED = 3;

  // Audio capture failed.
  const unsigned short SPEECH_INPUT_ERR_AUDIO_CAPTURE = 4;

  // Some network communication that was required to complete the recognition
  // failed.
  const unsigned short SPEECH_INPUT_ERR_NETWORK = 5;

  // The user agent is not allowing any speech input to occur for reasons of
  // security, privacy or user preference.
  const unsigned short SPEECH_INPUT_ERR_NOT_ALLOWED = 6;

  // The user agent is not allowing the speech service requested by the web
  // application to be used, but would allow some speech service, either
  // because the user agent doesn't support the selected one or for reasons of
  // security, privacy or user preference.
  const unsigned short SPEECH_INPUT_ERR_SERVICE_NOT_ALLOWED = 7;

  // There was an error in the speech recognition grammar.
  const unsigned short SPEECH_INPUT_ERR_BAD_GRAMMAR = 8;

  // The selected language is not supported.
  const unsigned short SPEECH_INPUT_ERR_LANGUAGE_NOT_SUPPORTED = 9;

  // One of the constants above.
  readonly attribute unsigned short code;

  // The message attribute must return an error message describing the details
  // of the error encountered. The message content is implementation-specific.
  // This attribute is primarily intended for debugging, and developers should
  // not use it directly in their application user interface.
  readonly attribute DOMString message;
};

interface SpeechInputResult {
  // Debbie's work item
};

== Description ==

The DOM Level 2 Event Model is used for speech recognition events. The methods in the EventTarget interface should be used for registering event listeners. The SpeechInputRequest interface also contains convenience attributes for registering a single event handler for each event type.

For all these events, the timeStamp attribute defined in the DOM Level 2 Event interface must be set to the best possible estimate of when the real-world event that the event object represents occurred.

Unless specified below, the ordering of the different events is undefined. For example, some implementations may fire audioend before speechstart or speechend if the audio detector is client-side and the speech detector is server-side.
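To make the intended usage concrete, here is a minimal JavaScript sketch of a web app consuming these events. Note that this draft does not say how a SpeechInputRequest instance is obtained, so the constructor call below is only an assumption for illustration; the listener registration shows both the EventTarget methods and the convenience attributes described above.

  // Assumption: how the request object is created is not specified in this
  // draft; a plain constructor is used here purely for illustration.
  var request = new SpeechInputRequest();

  // DOM Level 2 EventTarget registration.
  request.addEventListener('audiostart', function (e) {
    console.log('audio capture started at ' + e.timeStamp);
  });

  // Convenience attribute: a single handler per event type.
  request.onresult = function (e) {
    // e.result is a SpeechInputResult (its contents are Debbie's work item).
    console.log('final recognition result received', e.result);
  };

  request.onerror = function (e) {
    // Web IDL constants are available on the error object itself.
    if (e.error.code === e.error.SPEECH_INPUT_ERR_NO_SPEECH) {
      console.log('no speech detected');
    } else {
      console.log('recognition error ' + e.error.code + ': ' + e.error.message);
    }
  };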
== List of events ==

For each event, we list the name, the interface of the event object, and a description.

audiostart, interface: Event
Fired when the user agent has started to capture audio.

soundstart, interface: Event
Some sound, possibly speech, has been detected. This must be fired with low latency, e.g. by using a client-side energy detector.

speechstart, interface: Event
The speech that will be used for speech recognition has started.

speechend, interface: Event
The speech that will be used for speech recognition has ended. speechstart must always have been fired before speechend.

soundend, interface: Event
Some sound is no longer detected. This must be fired with low latency, e.g. by using a client-side energy detector. soundstart must always have been fired before soundend.

audioend, interface: Event
Fired when the user agent has finished capturing audio. audiostart must always have been fired before audioend.

result, interface: SpeechInputResultEvent
Fired when the speech recognizer returns a final result with at least one recognition hypothesis. The result attribute of the event contains the speech recognition result. All of the following events must have been fired before result is fired: audiostart, soundstart, speechstart, speechend, soundend, audioend.

error, interface: SpeechInputErrorEvent
Fired when a speech recognition error occurs. The error attribute is set to a SpeechInputError object. After an error event is fired, no further events will be fired for the given speech input request.

--
Bjorn Bringert
Google UK Limited, Registered Office: Belgrave House, 76 Buckingham Palace Road, London, SW1W 9TQ
Registered in England Number: 3977902
Received on Thursday, 30 June 2011 11:58:01 UTC