
Re: Speech API: first editor's draft posted

From: Glen Shires <gshires@google.com>
Date: Mon, 23 Apr 2012 12:34:06 -0700
Message-ID: <CAEE5bchJ82w9w3puV-jbdF2US+68+mH2Qbto9rv1Vat2RAVjEA@mail.gmail.com>
To: "Young, Milan" <Milan.Young@nuance.com>
Cc: Hans Wennborg <hwennborg@google.com>, "public-speech-api@w3.org" <public-speech-api@w3.org>, Satish S <satish@google.com>
We're not planning on having regular conference calls, so I'd like to ask
the group to discuss this and other issues via email.

Also, my apologies for not responding earlier. My responses follow.

- SpeechParameterList parameters;
- void setCustomParameter(in DOMString name, in DOMString value);

I agree we need an attribute and method such as these for both
SpeechRecognition and TTS.

I also suggest another method...

- enumerateCustomParameter(DOMString name)
   - It returns a list of valid DOMString values
     for setCustomParameter(name, value)

   - We may also want to specify some way that a
     numeric range could be returned

   - (I'm open to suggestions for a better name;
     I don't particularly like "enumerate".)

   - I presume that SpeechParameterList contains
     all the valid names, so we don't need a method
     to enumerate them.
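
One way the "list of values or numeric range" idea could be modeled is
sketched below in TypeScript. This is only an illustration, not draft text:
the ParameterDomain type, its field names, and the isValidValue helper are
all hypothetical names introduced for this example.

```typescript
// Hypothetical: what enumerateCustomParameter(name) might describe.
// Either an explicit list of allowed strings or an inclusive numeric range.
type ParameterDomain =
  | { kind: "values"; values: string[] }
  | { kind: "range"; min: number; max: number };

// Checks a candidate string value (as passed to setCustomParameter)
// against the domain reported for that parameter.
function isValidValue(domain: ParameterDomain, value: string): boolean {
  if (domain.kind === "values") {
    return domain.values.indexOf(value) !== -1;
  }
  // Range case: the value must parse as a number inside [min, max].
  if (value.trim() === "") {
    return false;
  }
  const n = Number(value);
  return !isNaN(n) && n >= domain.min && n <= domain.max;
}

// Example domains (made-up parameter contents, for illustration only).
const exampleValues: ParameterDomain = { kind: "values", values: ["low", "high"] };
const exampleRange: ParameterDomain = { kind: "range", min: 0, max: 10 };
```

Because setCustomParameter takes a DOMString value, the range case here
parses the string before comparing; a different design could return typed
numbers instead.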

One example where enumerateCustomParameter would be very useful would be to
select a TTS voice. For example, the following could return a list of
DOMStrings of all available voices:
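
As a hedged illustration of that kind of call (a TypeScript sketch, not
draft text): the parameter name "voice", the FakeTts class, the
getCustomParameter helper, and the voice identifiers are all assumptions
made up for this example. Only setCustomParameter and
enumerateCustomParameter are names proposed in this thread.

```typescript
// The two methods discussed in this thread, expressed as a TypeScript interface.
interface CustomParameterHost {
  setCustomParameter(name: string, value: string): void;
  enumerateCustomParameter(name: string): string[];
}

// Hypothetical in-memory stand-in for a TTS object; not a real browser API.
class FakeTts implements CustomParameterHost {
  // Assumed parameter name and values, for illustration only.
  private readonly valid: { [name: string]: string[] } = {
    voice: ["voice-a", "voice-b"],
  };
  private readonly current: { [name: string]: string } = {};

  // Returns a copy of the valid values for a parameter, or an empty list.
  enumerateCustomParameter(name: string): string[] {
    const values = this.valid[name];
    return values === undefined ? [] : values.slice();
  }

  // Rejects values that the parameter is known not to support.
  setCustomParameter(name: string, value: string): void {
    const allowed = this.valid[name];
    if (allowed !== undefined && allowed.indexOf(value) === -1) {
      throw new RangeError("unsupported value for " + name + ": " + value);
    }
    this.current[name] = value;
  }

  // Hypothetical helper so the example can read back what was set.
  getCustomParameter(name: string): string | undefined {
    return this.current[name];
  }
}

// Usage: list the available voices, then select the first one if any exist.
const tts = new FakeTts();
const voices = tts.enumerateCustomParameter("voice");
const firstVoice = voices[0];
if (firstVoice !== undefined) {
  tts.setCustomParameter("voice", firstVoice);
}
```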


- maxNBest for SpeechRecognition
   I agree.

- serviceUri for SpeechRecognition and TTS
   I agree.

For this initial specification, we believe that a simplified API will
accelerate implementation, interoperability testing, standardization and
ultimately developer adoption.  For this reason, we believe that timeout
parameters and confidenceThreshold should not be added to this initial spec:

- They are not required for the majority of use cases.

- They can be confusing for web developers, particularly those with little
speech experience. Often it's best for developers to rely on the default
values set by the speech recognition service.

- Their definition and implementation may vary between different speech
service implementations.

- Confidence is returned in the recognition results, so sophisticated
developers can compare and process relative confidence levels, which is
often more useful than a threshold, particularly because the definition of
a confidence value varies across speech services.

- setCustomParameter can be used to set these parameters.
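
To illustrate the point about relative confidence, here is a minimal
TypeScript sketch. It assumes each alternative carries a transcript and a
confidence value; the Alternative type, the pickByRelativeConfidence
function, and the minMargin parameter are hypothetical names introduced
for this example, not part of the draft.

```typescript
// Assumed shape of one recognition alternative (illustrative only).
interface Alternative {
  transcript: string;
  confidence: number;
}

// Accepts the top alternative only if it beats the runner-up by at least
// minMargin. This compares alternatives against each other rather than
// against an absolute threshold, so it does not depend on how a particular
// service scales its confidence values.
function pickByRelativeConfidence(
  alts: Alternative[],
  minMargin: number
): Alternative | null {
  // Sort a copy so the caller's array is left untouched.
  const sorted = alts.slice().sort((a, b) => b.confidence - a.confidence);
  const best = sorted[0];
  if (best === undefined) {
    return null; // no alternatives at all
  }
  const runnerUp = sorted[1];
  if (runnerUp === undefined) {
    return best; // a single alternative has nothing to be compared against
  }
  return best.confidence - runnerUp.confidence >= minMargin ? best : null;
}

// Made-up sample data: one ambiguous result and one clear result.
const ambiguous: Alternative[] = [
  { transcript: "recognize speech", confidence: 0.62 },
  { transcript: "wreck a nice beach", confidence: 0.58 },
];
const clear: Alternative[] = [
  { transcript: "call home", confidence: 0.9 },
  { transcript: "call Rome", confidence: 0.3 },
];
```

With a margin of 0.1, the ambiguous pair would be rejected (the page could
ask the user to confirm), while the clear pair would return "call home".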

/Glen Shires

On Mon, Apr 23, 2012 at 10:56 AM, Young, Milan <Milan.Young@nuance.com> wrote:

> Being new to Community Groups, I'm not clear on the plan for resolving
> issues like that which I have posted below.  Do I need to submit a concrete
> counter-proposal?    Should we have regular conference calls to discuss
> this and the other issues that have come up?  Perhaps a F2F to get us
> started?
> Thanks
> -----Original Message-----
> From: Young, Milan [mailto:Milan.Young@nuance.com]
> Sent: Friday, April 13, 2012 2:06 PM
> To: Hans Wennborg; public-speech-api@w3.org
> Cc: Satish S; Glen Shires
> Subject: RE: Speech API: first editor's draft posted
> Thank you for the draft, this looks like an excellent start.  A few
> comments/suggestions on the following:
> SpeechRecognition
>  - In addition to the three parameters you have listed, I see the
> following as necessary:
>        integer maxNBest;
>        float confidenceThreshold;
>        integer completeTimeout;
>        integer incompleteTimeout;
>        integer maxSpeechTimeout;
>        attribute DOMString serviceURI;
> - We'll also need an interface for setting non-standard parameters.  This
> will be critical to avoid rat-holing into a complete list of parameters.
>        SpeechParameterList parameters;
>        void setCustomParameter(in DOMString name, in DOMString value);
>    interface SpeechParameter {
>        attribute DOMString name;
>        attribute DOMString value;
>    };
>    interface SpeechParameterList {
>        readonly attribute unsigned long length;
>        getter SpeechParameter item(in unsigned long index);
>    };
> - I prefer a flatter structure for SpeechRecognition.  Part of doing that
> would involve splitting the error path out to its own event.  I suggest the
> following:
>    // A full response, which could be interim or final, part of a
> continuous response or not
>    interface SpeechRecognitionResult : RecognitionEvent {
>        readonly attribute unsigned long length;
>        getter SpeechRecognitionAlternative item(in unsigned long index);
>        readonly attribute boolean final;
>        readonly attribute short resultIndex;
>        readonly attribute SpeechRecognitionResultList resultHistory;
>    };
>    interface SpeechRecognitionError : RecognitionEvent {
>      // As before
>    };
>  - At a minimum, we'll need the same serviceURI parameter and generic
> parameter interface as in SpeechRecognition.
>  - I'd also like to hear some discussion on the importance of "marking"
> the stream.  I personally feel this is common enough that it should be part
> of a v1.
> Thanks
> -----Original Message-----
> From: Hans Wennborg [mailto:hwennborg@google.com]
> Sent: Thursday, April 12, 2012 7:36 AM
> To: public-speech-api@w3.org
> Cc: Satish S; Glen Shires
> Subject: Speech API: first editor's draft posted
> In December, Google proposed [1] to public-webapps a Speech JavaScript API
> subset that supports the majority of the use-cases in the Speech Incubator
> Group's Final Report. This proposal provides a programmatic API that
> enables web-pages to synthesize speech output and to use speech recognition
> as an input for forms, continuous dictation and control.
> We have now posted a slightly updated proposal [2] in the Speech-API
> Community Group's repository. The differences include:
>  - Document is now self-contained, rather than having multiple references
> to the XG Final Report.
>  - Renamed SpeechReco interface to SpeechRecognition
>  - Renamed interfaces and attributes beginning SpeechInput* to
> SpeechRecognition*
>  - Moved EventTarget to constructor of SpeechRecognition
>  - Clarified that grammars and lang are attributes of SpeechRecognition
>  - Clarified that if index is greater than or equal to length, returns null
> We welcome discussion and feedback on this editor's draft. Please send
> your comments to the public-speech-api@w3.org mailing list.
> Glen Shires
> Hans Wennborg
> [1]
> http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/1696.html
> [2] http://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html

Glen Shires
Received on Monday, 23 April 2012 19:35:17 UTC
