
Re: Speech API: first editor's draft posted

From: Glen Shires <gshires@google.com>
Date: Mon, 23 Apr 2012 16:25:24 -0700
Message-ID: <CAEE5bcis8GA5Qa8n+hWJnn8dOpD+=mju4kg2Ago+29Fnsk00+g@mail.gmail.com>
To: "Young, Milan" <Milan.Young@nuance.com>
Cc: Hans Wennborg <hwennborg@google.com>, "public-speech-api@w3.org" <public-speech-api@w3.org>, Satish S <satish@google.com>
We're still thinking about the proposed refactoring of
the SpeechRecognition result. I'll get back to you soon.

On Mon, Apr 23, 2012 at 3:35 PM, Young, Milan <Milan.Young@nuance.com> wrote:

> We’ve heard from Google and Nuance so far.  Does anybody else have an
> opinion on the following parameters?  Would explicitly including them in
> the API (as opposed to pushing them to a custom-parameter bin) generally
> speed or slow adoption?
>
>        float confidenceThreshold;
>        integer completeTimeout;
>        integer incompleteTimeout;
>        integer maxSpeechTimeout;
>
> Glen, glad to know we agree on the most important points.  Did you also
> want to comment on my proposal for refactoring the SpeechRecognition result,
> or should I take silence to mean agreement?
>
> Thanks
>
> *From:* Glen Shires [mailto:gshires@google.com]
> *Sent:* Monday, April 23, 2012 12:34 PM
> *To:* Young, Milan
> *Cc:* Hans Wennborg; public-speech-api@w3.org; Satish S
> *Subject:* Re: Speech API: first editor's draft posted
>
> Milan,
>
> We're not planning on having regular conference calls, so I'd like to ask
> the group to discuss this and other issues via email.
>
> Also, I apologize for not responding earlier, so I'll respond now...
>
> - SpeechParameterList parameters;
> - void setCustomParameter(in DOMString name, in DOMString value);
>
> I agree we need an attribute and method such as these for both
> SpeechRecognition and TTS.
>
> I also suggest another method...
>
> - enumerateCustomParameter(DOMString name)
>    - It returns a list of valid DOMString values
>      for setCustomParameter(name, value)
>    - We may also want to specify some way that a
>      numeric range could be returned
>    - (I'm open to suggestions for a better name,
>      I don't particularly like "enumerate")
>    - I presume that SpeechParameterList contains
>      all the valid names, so we don't need a method
>      to enumerate them.
>
> One case where enumerateCustomParameter would be very useful is selecting
> a TTS voice. For example, the following could return a list of
> DOMStrings of all available voices:
>
>    TTS.enumerateCustomParameter("voice")
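
A minimal sketch of how the proposed enumerateCustomParameter and setCustomParameter pair might be used to pick a TTS voice. The `createMockTTS` factory, the `getCustomParameter` helper, the voice names, and the choice to throw on an unsupported value are all hypothetical; the thread does not define any of them, and this is not an implemented browser API.

```javascript
// Hypothetical mock of a TTS object exposing the two methods discussed above.
// Nothing here is defined by the draft; it only illustrates the call pattern.
function createMockTTS(supported) {
  const current = new Map();
  return {
    // Returns the list of valid values for `name`, or an empty list if unknown.
    enumerateCustomParameter(name) {
      return Object.prototype.hasOwnProperty.call(supported, name)
        ? supported[name].slice()
        : [];
    },
    // Stores `value` for `name`; rejects values the service did not advertise.
    setCustomParameter(name, value) {
      if (!this.enumerateCustomParameter(name).includes(value)) {
        throw new RangeError(`Unsupported value "${value}" for "${name}"`);
      }
      current.set(name, value);
    },
    // Helper for this sketch only, so the stored value can be inspected.
    getCustomParameter(name) {
      return current.get(name);
    },
  };
}

// Made-up voice names, purely for illustration.
const TTS = createMockTTS({ voice: ["alice", "bob"] });

// Pick the preferred voice if the service offers it, else the first one listed.
function chooseVoice(tts, preferred) {
  const voices = tts.enumerateCustomParameter("voice");
  const choice = voices.includes(preferred) ? preferred : voices[0];
  if (choice !== undefined) {
    tts.setCustomParameter("voice", choice);
  }
  return choice;
}
```

Enumerating before setting lets a page degrade gracefully when a particular service does not offer the voice it would prefer.
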
>
> - maxNBest for SpeechRecognition
>    I agree.
>
> - serviceURI for SpeechRecognition and TTS
>    I agree.
>
> For this initial specification, we believe that a simplified API will
> accelerate implementation, interoperability testing, standardization and
> ultimately developer adoption.  For this reason, we believe that timeout
> parameters and confidenceThreshold should not be added to this initial spec
> because:
>
> - They are not required for the majority of use cases.
>
> - They can be confusing for web developers, particularly those with little
>   speech experience. Often it's best for developers to rely on the default
>   values set by the speech recognition service.
>
> - Their definition and implementation may vary between different speech
>   service implementations.
>
> - Confidence is returned in the recognition results, so sophisticated
>   developers can compare and process relative confidence levels, which is
>   often more useful than a threshold, particularly because confidence value
>   definitions vary by speech service.
>
> - setCustomParameter can be used to set these parameters.
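
The point about relative confidence could be sketched roughly as follows. The result shape (an array of alternatives with `transcript` and `confidence` fields), the `pickAlternative` name, and the `minMargin` value are assumptions made for illustration; they are not taken from the thread.

```javascript
// Assumed shape: each alternative is { transcript, confidence }, with
// confidence on whatever scale the speech service happens to use.
function pickAlternative(alternatives, minMargin) {
  if (alternatives.length === 0) {
    return null;
  }
  // Sort a copy so the caller's array is left untouched.
  const ranked = [...alternatives].sort((a, b) => b.confidence - a.confidence);
  const best = ranked[0];
  const runnerUp = ranked[1];
  // Compare the top hypothesis with the next one instead of with a fixed
  // threshold; a small gap means the application should ask for confirmation.
  const margin = runnerUp ? best.confidence - runnerUp.confidence : Infinity;
  return { transcript: best.transcript, needsConfirmation: margin < minMargin };
}
```

Because only the gap between hypotheses is used, this kind of check is less sensitive to how a given service defines its confidence scale than an absolute cutoff would be.
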
>
> /Glen Shires
>
> On Mon, Apr 23, 2012 at 10:56 AM, Young, Milan <Milan.Young@nuance.com> wrote:
>
> Being new to Community Groups, I'm not clear on the plan for resolving
> issues like the one I posted below.  Do I need to submit a concrete
> counter-proposal?  Should we have regular conference calls to discuss
> this and the other issues that have come up?  Perhaps a F2F to get us
> started?
>
> Thanks
>
>
> -----Original Message-----
> From: Young, Milan [mailto:Milan.Young@nuance.com]
> Sent: Friday, April 13, 2012 2:06 PM
> To: Hans Wennborg; public-speech-api@w3.org
> Cc: Satish S; Glen Shires
> Subject: RE: Speech API: first editor's draft posted
>
> Thank you for the draft, this looks like an excellent start.  A few
> comments/suggestions on the following:
>
> SpeechRecognition
>  - In addition to the three parameters you have listed, I see the
> following as necessary:
>        integer maxNBest;
>        float confidenceThreshold;
>        integer completeTimeout;
>        integer incompleteTimeout;
>        integer maxSpeechTimeout;
>        attribute DOMString serviceURI;
>
> - We'll also need an interface for setting non-standard parameters.  This
> will be critical to avoid rat-holing into a complete list of parameters.
>        SpeechParameterList parameters;
>        void setCustomParameter(in DOMString name, in DOMString value);
>
>    interface SpeechParameter {
>        attribute DOMString name;
>        attribute DOMString value;
>    };
>
>    interface SpeechParameterList {
>        readonly attribute unsigned long length;
>        getter SpeechParameter item(in unsigned long index);
>    };
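
One way the proposed SpeechParameter and SpeechParameterList interfaces might behave from script, as a rough sketch. The `MockSpeechParameterList` and `MockRecognizer` classes, the overwrite-on-duplicate behaviour, the null return for an out-of-range index, and the use of "completeTimeout" and "confidenceThreshold" as custom parameter names are all assumptions for illustration.

```javascript
// Hypothetical in-memory stand-in for the proposed SpeechParameterList.
class MockSpeechParameterList {
  constructor() {
    this._items = [];
  }
  // Mirrors `readonly attribute unsigned long length`.
  get length() {
    return this._items.length;
  }
  // Mirrors `getter SpeechParameter item(in unsigned long index)`.
  // Returning null when out of range is an assumption, not from the proposal.
  item(index) {
    return index >= 0 && index < this._items.length ? this._items[index] : null;
  }
}

// Hypothetical recognizer-like object tying the list to setCustomParameter.
class MockRecognizer {
  constructor() {
    this.parameters = new MockSpeechParameterList();
  }
  // Values are DOMStrings in the proposal, so everything is coerced to string.
  // Setting an existing name overwrites its value (an assumption).
  setCustomParameter(name, value) {
    const items = this.parameters._items;
    const existing = items.find((p) => p.name === String(name));
    if (existing) {
      existing.value = String(value);
    } else {
      items.push({ name: String(name), value: String(value) });
    }
  }
}

const recognizer = new MockRecognizer();
recognizer.setCustomParameter("completeTimeout", 800);
recognizer.setCustomParameter("confidenceThreshold", 0.5);
recognizer.setCustomParameter("completeTimeout", 1000); // overwrites 800
```

Keeping every value as a string matches the DOMString signature, at the cost of leaving numeric parsing and validation to the speech service.
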
>
> - I prefer a flatter structure for SpeechRecognition.  Part of doing that
> would involve splitting the error path out to its own event.  I suggest the
> following:
>
>    // A full response, which could be interim or final, part of a
>    // continuous response or not
>    interface SpeechRecognitionResult : RecognitionEvent {
>        readonly attribute unsigned long length;
>        getter SpeechRecognitionAlternative item(in unsigned long index);
>        readonly attribute boolean final;
>        readonly attribute short resultIndex;
>        readonly attribute SpeechRecognitionResultList resultHistory;
>    };
>
>    interface SpeechRecognitionError : RecognitionEvent {
>      // As before
>    };
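
A rough sketch of what consuming the flatter structure might look like, with results and errors arriving as separate events. The `makeResultEvent`, `makeErrorEvent`, and `dispatch` helpers, the `onresult`/`onerror` handler names, the `type` and `code` fields, and the "no-speech" error code are hypothetical; `resultHistory` is left out for brevity.

```javascript
// Hypothetical flat result event, loosely following the proposed interface:
// alternatives are indexed directly on the event, alongside `final`.
function makeResultEvent(alternatives, isFinal, resultIndex) {
  return {
    type: "result",
    length: alternatives.length,
    item: (i) => (i >= 0 && i < alternatives.length ? alternatives[i] : null),
    final: isFinal,
    resultIndex,
  };
}

// Hypothetical error event; the `code` field is an assumption.
function makeErrorEvent(code) {
  return { type: "error", code };
}

// Route each event to its own handler, so result handlers never have to
// check for an embedded error.
function dispatch(event, handlers) {
  const handler = event.type === "error" ? handlers.onerror : handlers.onresult;
  return handler(event);
}

const log = [];
const handlers = {
  onresult: (e) =>
    log.push(`${e.final ? "final" : "interim"}: ${e.item(0).transcript}`),
  onerror: (e) => log.push(`error: ${e.code}`),
};

dispatch(makeResultEvent([{ transcript: "hello", confidence: 0.8 }], false, 0), handlers);
dispatch(makeResultEvent([{ transcript: "hello world", confidence: 0.9 }], true, 0), handlers);
dispatch(makeErrorEvent("no-speech"), handlers);
```
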
>
>
>
> TTS
>  - At a minimum, we'll need the same serviceURI parameter and generic
> parameter interface as in SpeechRecognition.
>  - I'd also like to hear some discussion on the importance of "marking"
> the stream.  I personally feel this is common enough that it should be part
> of a v1.
>
>
> Thanks
>
>
> -----Original Message-----
> From: Hans Wennborg [mailto:hwennborg@google.com]
> Sent: Thursday, April 12, 2012 7:36 AM
> To: public-speech-api@w3.org
> Cc: Satish S; Glen Shires
> Subject: Speech API: first editor's draft posted
>
> In December, Google proposed [1] to public-webapps a Speech JavaScript API
> subset that supports the majority of the use-cases in the Speech Incubator
> Group's Final Report. This proposal provides a programmatic API that
> enables web-pages to synthesize speech output and to use speech recognition
> as an input for forms, continuous dictation and control.
>
> We have now posted a slightly updated proposal [2] in the Speech-API
> Community Group's repository. The differences include:
>
>  - Document is now self-contained, rather than having multiple references
> to the XG Final Report.
>  - Renamed SpeechReco interface to SpeechRecognition
>  - Renamed interfaces and attributes beginning SpeechInput* to
> SpeechRecognition*
>  - Moved EventTarget to constructor of SpeechRecognition
>  - Clarified that grammars and lang are attributes of SpeechRecognition
>  - Clarified that if index is greater than or equal to length, returns null
>
> We welcome discussion and feedback on this editor's draft. Please send
> your comments to the public-speech-api@w3.org mailing list.
>
> Glen Shires
> Hans Wennborg
>
> [1] http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/1696.html
> [2] http://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html
>
> --
> Thanks!
> Glen Shires



-- 
Thanks!
Glen Shires
Received on Monday, 23 April 2012 23:26:34 UTC
