
RE: Speech API: first editor's draft posted

From: Young, Milan <Milan.Young@nuance.com>
Date: Mon, 23 Apr 2012 22:35:32 +0000
To: Glen Shires <gshires@google.com>
CC: Hans Wennborg <hwennborg@google.com>, "public-speech-api@w3.org" <public-speech-api@w3.org>, Satish S <satish@google.com>
Message-ID: <B236B24082A4094A85003E8FFB8DDC3C1A4561F2@SOM-EXCH04.nuance.com>
We've heard from Google and Nuance so far.  Does anybody else have an opinion on the following parameters?  Would explicitly including them in the API (as opposed to pushing them to a custom-parameter bin) generally speed or slow adoption?
       float confidenceThreshold;
       integer completeTimeout;
       integer incompleteTimeout;
       integer maxSpeechTimeout;

Glen, glad to know we agree on the most important points.  Did you also want to comment on my proposal for refactoring the SpeechRecognition result, or should I take silence to mean agreement?


From: Glen Shires [mailto:gshires@google.com]
Sent: Monday, April 23, 2012 12:34 PM
To: Young, Milan
Cc: Hans Wennborg; public-speech-api@w3.org; Satish S
Subject: Re: Speech API: first editor's draft posted

We're not planning on having regular conference calls, so I'd like to ask the group to discuss this and other issues via email.

Also, I apologize for not responding earlier, so I'll respond now...

- SpeechParameterList parameters;
- void setCustomParameter(in DOMString name, in DOMString value);

I agree we need an attribute and method such as these for both SpeechRecognition and TTS.

I also suggest another method...

- enumerateCustomParameter(DOMString name)
   - It returns a list of valid DOMString values
     for setCustomParameter(name, value)

   - We may also want to specify some way that a
     numeric range could be returned

   - (I'm open to suggestions for a better name,
      I don't particularly like "enumerate")

    - I presume that SpeechParameterList contains
      all the valid names, so we don't need a method
      to enumerate them.

One example where enumerateCustomParameter would be very useful would be to select a TTS voice. For example, the following could return a list of DOMStrings of all available voices:
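
As a rough sketch only, the TypeScript below mocks how such a lookup might be used to pick a voice. The MockTts class, the "voice" parameter name, the voice values and getCustomParameter are all assumptions made for illustration; only setCustomParameter and the proposed enumerateCustomParameter come from this thread.

```typescript
// Hypothetical mock, not part of the draft. It only illustrates the
// enumerate-then-set pattern discussed in this thread.
class MockTts {
  // Assumed table of parameter names and the values a service accepts.
  private readonly allowed: Record<string, string[]> = {
    voice: ["alice", "bob"],
  };
  private readonly current: Record<string, string> = {};

  // Returns a copy of the valid values for `name`, or an empty list
  // when the name is unknown to this mock.
  enumerateCustomParameter(name: string): string[] {
    const values = this.allowed[name];
    return values === undefined ? [] : values.slice();
  }

  // Rejects values that the mock has enumerated as invalid; names the
  // mock knows nothing about are stored without checking.
  setCustomParameter(name: string, value: string): void {
    const values = this.allowed[name];
    if (values !== undefined && values.indexOf(value) === -1) {
      throw new RangeError("unsupported value for " + name + ": " + value);
    }
    this.current[name] = value;
  }

  // Helper for inspection only (assumed, not proposed in the thread).
  getCustomParameter(name: string): string | undefined {
    return this.current[name];
  }
}

const tts = new MockTts();
const voices = tts.enumerateCustomParameter("voice");
if (voices.length > 0) {
  tts.setCustomParameter("voice", voices[0]);
}
```

Returning an empty list for an unknown name is one possible design; a real service might instead signal an error.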


- maxNBest for SpeechRecognition
   I agree.

- serviceUri for SpeechRecognition and TTS
   I agree.

For this initial specification, we believe that a simplified API will accelerate implementation, interoperability testing, standardization and ultimately developer adoption.  For this reason, we believe that timeout parameters and confidenceThreshold should not be added to this initial spec because:

- They are not required for the majority of use cases.

- They can be confusing for web developers, particularly those with little speech experience. Often it's best for developers to rely on the default values set by the speech recognition service.

- Their definition and implementation may vary between different speech service implementations.

- Confidence is returned in the recognition results, so sophisticated developers can compare and process relative confidence levels, which is often more useful than a threshold, particularly because confidence value definitions vary by speech services.

- setCustomParameter can be used to set these parameters.
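
To illustrate the point about relative confidence, here is a minimal TypeScript sketch. The Alternative shape, the function name and the margin value are assumptions for illustration, not part of the draft. It accepts the best alternative only when it clearly beats the runner-up, rather than comparing against a fixed threshold.

```typescript
// Assumed shape of one recognition alternative (illustrative only).
interface Alternative {
  transcript: string;
  confidence: number;
}

// Returns the top alternative if it leads the second-best one by at
// least `margin`; returns null when the result is too ambiguous or empty.
function pickByRelativeConfidence(
  alts: Alternative[],
  margin: number
): Alternative | null {
  if (alts.length === 0) {
    return null;
  }
  // Sort a copy so the caller's array is left untouched.
  const sorted = alts.slice().sort((a, b) => b.confidence - a.confidence);
  if (sorted.length === 1) {
    return sorted[0];
  }
  const lead = sorted[0].confidence - sorted[1].confidence;
  return lead >= margin ? sorted[0] : null;
}

const clear = pickByRelativeConfidence(
  [
    { transcript: "call home", confidence: 0.9 },
    { transcript: "call Rome", confidence: 0.3 },
  ],
  0.1
);
const ambiguous = pickByRelativeConfidence(
  [
    { transcript: "call home", confidence: 0.62 },
    { transcript: "call Rome", confidence: 0.6 },
  ],
  0.1
);
```

Because only the gap between alternatives matters, this kind of check does not depend on how a particular service scales its confidence values.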

/Glen Shires

On Mon, Apr 23, 2012 at 10:56 AM, Young, Milan <Milan.Young@nuance.com> wrote:
Being new to Community Groups, I'm not clear on the plan for resolving issues like the one I have posted below.  Do I need to submit a concrete counter-proposal?  Should we have regular conference calls to discuss this and the other issues that have come up?  Perhaps a F2F to get us started?


-----Original Message-----
From: Young, Milan [mailto:Milan.Young@nuance.com]
Sent: Friday, April 13, 2012 2:06 PM
To: Hans Wennborg; public-speech-api@w3.org
Cc: Satish S; Glen Shires
Subject: RE: Speech API: first editor's draft posted

Thank you for the draft, this looks like an excellent start.  A few comments/suggestions on the following:

 - In addition to the three parameters you have listed, I see the following as necessary:
       integer maxNBest;
       float confidenceThreshold;
       integer completeTimeout;
       integer incompleteTimeout;
       integer maxSpeechTimeout;
       attribute DOMString serviceURI;

- We'll also need an interface for setting non-standard parameters.  This will be critical to avoid rat-holing into a complete list of parameters.
       SpeechParameterList parameters;
       void setCustomParameter(in DOMString name, in DOMString value);

   interface SpeechParameter {
       attribute DOMString name;
       attribute DOMString value;
   };

   interface SpeechParameterList {
       readonly attribute unsigned long length;
       getter SpeechParameter item(in unsigned long index);
   };
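
For readers less used to IDL, the TypeScript below sketches one way these pieces could fit together. The class names ending in "Sketch" are hypothetical, and two behaviours are assumptions rather than anything stated in the proposal: setting an existing name overwrites its value, and item() returns null for an out-of-range index.

```typescript
// Mirrors the proposed SpeechParameter interface.
interface SpeechParameter {
  name: string;
  value: string;
}

// Read-only view over the parameters, mirroring SpeechParameterList.
class SpeechParameterListSketch {
  constructor(private readonly entries: SpeechParameter[]) {}

  get length(): number {
    return this.entries.length;
  }

  // Assumption: out-of-range indexes yield null instead of throwing.
  item(index: number): SpeechParameter | null {
    return index >= 0 && index < this.entries.length
      ? this.entries[index]
      : null;
  }
}

// Hypothetical owner object exposing `parameters` and setCustomParameter.
class ParameterHolderSketch {
  private readonly entries: SpeechParameter[] = [];
  readonly parameters = new SpeechParameterListSketch(this.entries);

  // Assumption: a repeated name replaces the earlier value.
  setCustomParameter(name: string, value: string): void {
    for (const entry of this.entries) {
      if (entry.name === name) {
        entry.value = value;
        return;
      }
    }
    this.entries.push({ name: name, value: value });
  }
}

const holder = new ParameterHolderSketch();
holder.setCustomParameter("completeTimeout", "2000");
holder.setCustomParameter("completeTimeout", "1500");
holder.setCustomParameter("confidenceThreshold", "0.5");
```

Values stay as strings here because the proposed setCustomParameter takes DOMString arguments; any parsing would be up to the speech service.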

- I prefer a flatter structure for SpeechRecognition.  Part of doing that would involve splitting the error path out to its own event.  I suggest the following:

   // A full response, which could be interim or final, part of a continuous response or not
   interface SpeechRecognitionResult : RecognitionEvent {
       readonly attribute unsigned long length;
       getter SpeechRecognitionAlternative item(in unsigned long index);
       readonly attribute boolean final;
       readonly attribute short resultIndex;
       readonly attribute SpeechRecognitionResultList resultHistory;
   };

   interface SpeechRecognitionError : RecognitionEvent {
     // As before
   };
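
A rough TypeScript sketch of what the split might look like from a page author's side follows. Everything in it is hypothetical: the "Sketch" type names, the onresult/onerror handler names, the error fields and the emit helpers are assumptions for illustration. The point is only that results and errors arrive on separate paths, so the result handler never has to check for an error state.

```typescript
// Assumed shapes; the field names loosely follow the interfaces above.
interface AlternativeSketch {
  transcript: string;
  confidence: number;
}

interface ResultEventSketch {
  alternatives: AlternativeSketch[];
  final: boolean;
  resultIndex: number;
}

interface ErrorEventSketch {
  code: string;
  message: string;
}

// Minimal stand-in for a recognizer with separate result and error paths.
class RecognizerSketch {
  onresult: ((event: ResultEventSketch) => void) | null = null;
  onerror: ((event: ErrorEventSketch) => void) | null = null;

  // Test helpers that simulate the service delivering events.
  emitResult(event: ResultEventSketch): void {
    if (this.onresult !== null) {
      this.onresult(event);
    }
  }

  emitError(event: ErrorEventSketch): void {
    if (this.onerror !== null) {
      this.onerror(event);
    }
  }
}

const finalTranscripts: string[] = [];
const errorCodes: string[] = [];

const recognizer = new RecognizerSketch();
recognizer.onresult = (event) => {
  // Only final results with at least one alternative are kept.
  if (event.final && event.alternatives.length > 0) {
    finalTranscripts.push(event.alternatives[0].transcript);
  }
};
recognizer.onerror = (event) => {
  errorCodes.push(event.code);
};

recognizer.emitResult({
  alternatives: [{ transcript: "hello", confidence: 0.4 }],
  final: false,
  resultIndex: 0,
});
recognizer.emitResult({
  alternatives: [{ transcript: "hello world", confidence: 0.8 }],
  final: true,
  resultIndex: 0,
});
recognizer.emitError({ code: "network", message: "service unreachable" });
```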

 - At a minimum, we'll need the same serviceURI parameter and generic parameter interface as in SpeechRecognition.
 - I'd also like to hear some discussion on the importance of "marking" the stream.  I personally feel this is common enough that it should be part of a v1.


-----Original Message-----
From: Hans Wennborg [mailto:hwennborg@google.com]
Sent: Thursday, April 12, 2012 7:36 AM
To: public-speech-api@w3.org
Cc: Satish S; Glen Shires
Subject: Speech API: first editor's draft posted

In December, Google proposed [1] to public-webapps a Speech JavaScript API subset that supports the majority of the use-cases in the Speech Incubator Group's Final Report. This proposal provides a programmatic API that enables web-pages to synthesize speech output and to use speech recognition as an input for forms, continuous dictation and control.

We have now posted a slightly updated proposal [2] in the Speech-API Community Group's repository. The differences include:

 - Document is now self-contained, rather than having multiple references to the XG Final Report.
 - Renamed SpeechReco interface to SpeechRecognition
 - Renamed interfaces and attributes beginning SpeechInput* to
 - Moved EventTarget to constructor of SpeechRecognition
 - Clarified that grammars and lang are attributes of SpeechRecognition
 - Clarified that if index is greater than or equal to length, returns null

We welcome discussion and feedback on this editor's draft. Please send your comments to the public-speech-api@w3.org mailing list.

Glen Shires
Hans Wennborg

[1] http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/1696.html
[2] http://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html

Glen Shires
Received on Monday, 23 April 2012 22:36:25 UTC
