IDL and examples for new SpeechInputRequest from Satish S on 2011-09-20 (public-xg-htmlspeech@w3.org from September 2011)

From: Satish S <satish@google.com>
Date: Wed, 21 Sep 2011 00:15:30 +0100
To: public-xg-htmlspeech@w3.org
Message-ID: <CAHZf7R=aj7jHByWUbPMgq9RDp-0edUSLm_AL1vP+CLP9ZfWbsw@mail.gmail.com>
As discussed in last week's call, I have drafted an IDL for
SpeechInputRequest and some examples. Please review them below.

Some key differences from what was discussed:

   1. Since we wanted .start() to automatically call .init() if that has not
   been done already, the .init() call must take no parameters so that it can
   be invoked behind the scenes.
   2. We talked about similarities with XHR so I looked up the latest XHR2
   draft. There is a preference to do away with a single onreadystatechange
   style callback and split it into separate events. This also matches the
   above requirement of a .init() call with no parameters, so all event
   handlers are attributes in the IDL.
   3. I added a 'continuous' boolean attribute as it seemed missing in the
   draft doc and there wasn't any way specified to request one-shot or
   continuous recognition.
   4. I added a 'filterOffensiveWords' boolean attribute as it came across
   as a necessary feature in real world applications (when we tested voice
   search on the Google homepage).
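
A minimal sketch of points 1 and 2, for illustration only. MockSpeechInputRequest is a hypothetical stand-in and not part of any draft; it assumes the .init() call in point 1 corresponds to open() in the IDL below, and it fires events synchronously, which a real UA would probably not do:

```javascript
// Hypothetical mock, for illustration only; not a real browser API.
function MockSpeechInputRequest() {
  this.opened = false;
  // Separate event handler attributes, rather than a single
  // onreadystatechange-style callback.
  this.onopen = null;
  this.onstart = null;
}

// open() takes no parameters, so start() can invoke it behind the scenes.
MockSpeechInputRequest.prototype.open = function () {
  this.opened = true;
  if (typeof this.onopen === 'function') {
    this.onopen({ type: 'open' });
  }
};

MockSpeechInputRequest.prototype.start = function () {
  if (!this.opened) {
    this.open();  // implicit initialization
  }
  if (typeof this.onstart === 'function') {
    this.onstart({ type: 'start' });
  }
};

// Usage: the page never calls open() explicitly.
var fired = [];
var req = new MockSpeechInputRequest();
req.onopen = function (event) { fired.push(event.type); };
req.onstart = function (event) { fired.push(event.type); };
req.start();
```

Because all configuration lives in attributes, start() has everything it needs to initialize on its own.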

Some questions:

   1. The 'saveWaveformURI' and 'inputWaveformURI' attributes are a bit
   troubling. They will probably require us to specify which codecs to support,
   whether the UA should transcode when the input waveform doesn't match what
   the speech service accepts, same-origin policies and so on. Given the few
   weeks we have remaining, is this a strong enough use case for us to look
   into, or can we remove it?
   2. The 'saveForRereco' usage and API are unclear at the moment. Has anyone
   given it more thought?

IDL:

[Constructor]
interface SpeechInputRequest : EventTarget {
  // attributes related to connection with speech service
  attribute DOMString uri;
  attribute DOMString saveWaveformURI;  // usage? codecs and same origin policies?
  attribute DOMString inputWaveformURI;  // again, codecs? should UA reencode?
  attribute MediaStream input;

  // attributes related to speech reco
  attribute DOMString[] languages;
  attribute DOMString[] grammars;
  attribute DOMStringMap customParameters;
  attribute int maxNBest;
  attribute boolean continuous;  // was missing earlier?
  attribute boolean filterOffensiveWords;  // added new, useful in real world context
  attribute boolean saveForRereco;  // usage?
  attribute boolean localEndpointer;  // renamed from the earlier 'setendpointdetection' method
  attribute boolean finalizeBeforeEnd;
  attribute boolean interimResults;
  attribute int interimResultsFreq;
  attribute float confidenceThreshold;
  attribute float sensitivity;
  attribute float speedVersusAccuracy;
  attribute int completeTimeout;
  attribute int incompleteTimeout;
  attribute int maxSpeechTimeout;

  // methods
  void open();
  void start();
  void stop();
  void abort();

  // event handler IDL attributes
  attribute Function onopen;
  attribute Function onstart;
  attribute Function onend;
  attribute Function onresult;
  attribute Function onnomatch;
  attribute Function onerror;
  attribute Function onaudiostart;
  attribute Function onsoundstart;
  attribute Function onspeechstart;
  attribute Function onspeechend;
  attribute Function onsoundend;
  attribute Function onaudioend;
};
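
As a rough sketch of how the two newly added attributes might be set for a one-shot voice search: configureVoiceSearch is a hypothetical helper, not part of the draft, and a plain object stands in for the request so the snippet runs outside a UA that implements the interface.

```javascript
// Hypothetical helper, for illustration only. It sets two attributes from
// the IDL above on any request-like object.
function configureVoiceSearch(req) {
  req.continuous = false;           // one-shot rather than continuous recognition
  req.filterOffensiveWords = true;  // attribute added in this draft
  return req;
}

// A plain object stands in for `new SpeechInputRequest()` here.
var searchReq = configureVoiceSearch({});
```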

And a couple of examples, adapted from Robert's examples earlier:

Example 1:

  function simplestCase() {
    // Just give me the default recognizer.
    var req = new SpeechInputRequest();
    req.onresult = function(event) {
      // Do things with event.result
    };
    req.start();
  }

Example 2:

  function aComplexWebapp() {
    // Give me a recognizer for Australian or British English,
    // with grammars for dictation and datetime.
    // It should preferably model a child's vocal tract, but doesn't need to.
    var req = new SpeechInputRequest();
    req.languages = ['en-AU', 'en-GB'];
    req.grammars = ['<builtin:dictation>', '<builtin:datetime>'];
    req.customParameters['age'] = 'child';
    req.continuous = true;  // And I'm gonna listen forever...
    req.interimResults = true;
    req.interimResultsFreq = 1000;
    req.onresult = function(event) {
      // Do things with event.result
    };
    req.onstart = function(event) {
      $('#status').text("I'm listening.");
      // Stop listening after a minute.
      window.setTimeout(function() {
        $('#status').text('Thank you, please try again.');
        req.abort();
        req = null;
      }, 60000);
    };
    req.onopen = function(event) {
      req.start();
    };
    req.onerror = function(event) {
      $('#status').text('Sorry, no dice.');
    };
    $('#status').text('Connecting and loading giant grammars...');
    req.open();
  }


Cheers
Satish
Received on Tuesday, 20 September 2011 23:16:04 UTC