- From: Young, Milan <Milan.Young@nuance.com>
- Date: Fri, 13 Apr 2012 21:05:35 +0000
- To: Hans Wennborg <hwennborg@google.com>, "public-speech-api@w3.org" <public-speech-api@w3.org>
- CC: Satish S <satish@google.com>, Glen Shires <gshires@google.com>
Thank you for the draft; this looks like an excellent start. A few comments and suggestions on the following:
SpeechRecognition
- In addition to the three parameters you have listed, I see the following as necessary:
attribute unsigned long maxNBest;
attribute float confidenceThreshold;
attribute unsigned long completeTimeout;
attribute unsigned long incompleteTimeout;
attribute unsigned long maxSpeechTimeout;
attribute DOMString serviceURI;
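For concreteness, a page might then tune recognition along these lines (purely illustrative; this assumes the draft's SpeechRecognition constructor and start() method, and treats the timeouts as milliseconds, which is an assumption):

// Illustrative sketch only; the attribute names are the proposals above,
// the URI is made up, and millisecond timeout units are assumed.
var recognizer = new SpeechRecognition();
recognizer.maxNBest = 3;                // return up to three alternatives
recognizer.confidenceThreshold = 0.5;   // drop low-confidence alternatives
recognizer.completeTimeout = 1500;      // silence allowed after a complete match
recognizer.incompleteTimeout = 3000;    // silence allowed after a partial match
recognizer.maxSpeechTimeout = 10000;    // cap on the length of one utterance
recognizer.serviceURI = "https://recognizer.example.com/service";
recognizer.start();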
- We'll also need an interface for setting non-standard parameters. This will be critical to avoid rat-holing into a complete list of parameters.
// On the SpeechRecognition interface:
attribute SpeechParameterList parameters;
void setCustomParameter(in DOMString name, in DOMString value);
interface SpeechParameter {
attribute DOMString name;
attribute DOMString value;
};
interface SpeechParameterList {
readonly attribute unsigned long length;
getter SpeechParameter item(in unsigned long index);
};
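To illustrate how page script might use this (a sketch only; it assumes parameters and setCustomParameter live on SpeechRecognition as above, and the vendor parameter name is invented):

// Illustrative sketch only; "com.example.noise-model" is a made-up
// vendor-specific parameter name.
var recognizer = new SpeechRecognition();
recognizer.setCustomParameter("com.example.noise-model", "automotive");

// Enumerate whatever parameters have been set so far.
for (var i = 0; i < recognizer.parameters.length; i++) {
  var param = recognizer.parameters.item(i);
  console.log(param.name + " = " + param.value);
}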
- I prefer a flatter structure for SpeechRecognition. Part of doing that would involve splitting the error path out into its own event. I suggest the following:
// A full response, which could be interim or final, part of a continuous response or not
interface SpeechRecognitionResult : RecognitionEvent {
readonly attribute unsigned long length;
getter SpeechRecognitionAlternative item(in unsigned long index);
readonly attribute boolean final;
readonly attribute short resultIndex;
readonly attribute SpeechRecognitionResultList resultHistory;
};
interface SpeechRecognitionError : RecognitionEvent {
// As before
};
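From page script, the flattened shape might look roughly like this (a sketch only; the onresult/onerror handler names follow the draft's event style, but the flat result-as-event structure is the proposal above, not the posted draft):

var recognizer = new SpeechRecognition();

recognizer.onresult = function (result) {
  // The event itself is the result: a list of alternatives plus metadata.
  if (result.length > 0) {
    var best = result.item(0);  // a SpeechRecognitionAlternative
    console.log(best.transcript, best.confidence,
                result.final ? "(final)" : "(interim)");
  }
};

recognizer.onerror = function (error) {
  // Errors arrive on their own event, outside the result path.
  console.log("recognition error:", error);
};

recognizer.start();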
TTS
- At a minimum, we'll need the same serviceURI parameter and generic parameter interface as in SpeechRecognition.
- I'd also like to hear some discussion on the importance of "marking" the stream. I personally feel this is common enough that it should be part of v1.
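To make both points concrete, something like the following (purely a sketch: it assumes a TTS object with a text attribute and play() method roughly along the lines of the draft; serviceURI and setCustomParameter are the proposals above, and the onmark handler is one hypothetical shape for stream marking):

var tts = new TTS();
tts.serviceURI = "https://tts.example.com/service";        // non-default engine
tts.setCustomParameter("com.example.voice", "expressive"); // vendor-specific knob
tts.text = 'Your order number is <mark name="order"/> 4 2 7.';
tts.onmark = function (e) {
  // Would fire as playback reaches the named mark in the audio stream.
  console.log("reached mark:", e.name);
};
tts.play();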
Thanks
-----Original Message-----
From: Hans Wennborg [mailto:hwennborg@google.com]
Sent: Thursday, April 12, 2012 7:36 AM
To: public-speech-api@w3.org
Cc: Satish S; Glen Shires
Subject: Speech API: first editor's draft posted
In December, Google proposed [1] to public-webapps a Speech JavaScript API: a subset that supports the majority of the use-cases in the Speech Incubator Group's Final Report. This proposal provides a programmatic API that enables web pages to synthesize speech output and to use speech recognition as an input for forms, continuous dictation and control.
We have now posted a slightly updated proposal [2] in the Speech-API Community Group's repository. The differences include:
- Document is now self-contained, rather than having multiple references to the XG Final Report.
- Renamed SpeechReco interface to SpeechRecognition
- Renamed interfaces and attributes beginning SpeechInput* to SpeechRecognition*
- Moved EventTarget to constructor of SpeechRecognition
- Clarified that grammars and lang are attributes of SpeechRecognition
- Clarified that if index is greater than or equal to length, returns null
We welcome discussion and feedback on this editor's draft. Please send your comments to the public-speech-api@w3.org mailing list.
Glen Shires
Hans Wennborg
[1] http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/1696.html
[2] http://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html