Re: Speech API: first editor's draft posted from Glen Shires on 2012-04-23 (public-speech-api@w3.org from April 2012)

From: Glen Shires <gshires@google.com>
Date: Mon, 23 Apr 2012 12:34:06 -0700
To: "Young, Milan" <Milan.Young@nuance.com>
Cc: Hans Wennborg <hwennborg@google.com>, "public-speech-api@w3.org" <public-speech-api@w3.org>, Satish S <satish@google.com>
Message-ID: <CAEE5bchJ82w9w3puV-jbdF2US+68+mH2Qbto9rv1Vat2RAVjEA@mail.gmail.com>

Milan,
We're not planning on having regular conference calls, so I'd like to ask
the group to discuss this and other issues via email.

Also, I apologize for not responding earlier. My responses follow...


- SpeechParameterList parameters;
- void setCustomParameter(in DOMString name, in DOMString value);

I agree we need an attribute and method such as these for both
SpeechRecognition and TTS.

I also suggest another method...

- enumerateCustomParameter(DOMString name)
   - It returns a list of valid DOMString values
     for setCustomParameter(name, value)

   - We may also want to specify some way that a
     numeric range could be returned

   - (I'm open to suggestions for a better name,
      I don't particularly like "enumerate")

   - I presume that SpeechParameterList contains
     all the valid names, so we don't need a method
     to enumerate them.

One example where enumerateCustomParameter would be very useful would be to
select a TTS voice. For example, the following could return a list of
DOMStrings of all available voices:

   TTS.enumerateCustomParameter("voice")
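
To make the idea concrete, here is a rough sketch of how a page might use
the two methods together. Everything in it is an assumption for
illustration: createMockTts and getCustomParameter are hypothetical helpers
that exist only in this sketch, the voice names are made up, and returning
a plain array of strings is just one possible shape for the return value.

```javascript
// Hypothetical stand-in for a TTS object. Nothing here is defined by the
// draft; it only mimics the two methods discussed in this thread.
function createMockTts(supportedValues) {
  const current = new Map();
  return {
    // Proposed method: list the valid values for a named custom parameter.
    // Unknown names yield an empty list (an assumption of this sketch).
    enumerateCustomParameter(name) {
      const known = Object.prototype.hasOwnProperty.call(supportedValues, name);
      return known ? [...supportedValues[name]] : [];
    },
    // Method from Milan's proposal: set a custom parameter by name.
    setCustomParameter(name, value) {
      current.set(name, String(value));
    },
    // Helper for this sketch only, so the stored value can be inspected.
    getCustomParameter(name) {
      return current.has(name) ? current.get(name) : null;
    },
  };
}

// Voice names are invented for illustration.
const tts = createMockTts({
  voice: ["en-US-female-1", "en-US-male-1", "en-GB-female-1"],
});

// Pick the first British voice if one is offered, otherwise keep the default.
const voices = tts.enumerateCustomParameter("voice");
const chosen = voices.find((v) => v.startsWith("en-GB"));
if (chosen !== undefined) {
  tts.setCustomParameter("voice", chosen);
}
```

The point of the sketch is that the page never hard-codes a voice name that
a given speech service might not support; it only selects from what the
service reports.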


- maxNBest for SpeechRecognition
   I agree.

- serviceURI for SpeechRecognition and TTS
   I agree.


For this initial specification, we believe that a simplified API will
accelerate implementation, interoperability testing, standardization and
ultimately developer adoption.  For this reason, we believe that timeout
parameters and confidenceThreshold should not be added to this initial spec
because:

- They are not required for the majority of use cases.

- They can be confusing for web developers, particularly those with little
speech experience. Often it's best for developers to rely on the default
values set by the speech recognition service.

- Their definition and implementation may vary between different speech
service implementations.

- Confidence is returned in the recognition results, so sophisticated
developers can compare and process relative confidence levels, which is
often more useful than a threshold, particularly because confidence value
definitions vary between speech services.

- setCustomParameter can be used to set these parameters.
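
As a rough illustration of the point about relative confidence, the sketch
below accepts the top alternative only when it clearly beats the runner-up,
rather than comparing it against a fixed threshold. The function name
pickAlternative, the minMargin parameter, the transcript and confidence
field names, and the sample values are all assumptions made for this
example, not part of the draft.

```javascript
// Decide what to do with an n-best list by comparing alternatives to each
// other instead of to an absolute confidence threshold.
// `alternatives` is assumed to be an array of { transcript, confidence }.
function pickAlternative(alternatives, minMargin) {
  // Sort a copy so the caller's list is left untouched.
  const sorted = [...alternatives].sort((a, b) => b.confidence - a.confidence);
  if (sorted.length === 0) {
    return { status: "no-match", transcript: null };
  }
  const [best, runnerUp] = sorted;
  // With a single alternative there is nothing to confuse it with.
  const margin = runnerUp ? best.confidence - runnerUp.confidence : Infinity;
  return margin >= minMargin
    ? { status: "accept", transcript: best.transcript }
    : { status: "confirm", transcript: best.transcript };
}

// Sample values are invented. The absolute scores are low in both cases,
// but only the second list is genuinely ambiguous.
const clear = pickAlternative(
  [
    { transcript: "call mom", confidence: 0.62 },
    { transcript: "call tom", confidence: 0.21 },
  ],
  0.2
);
const close = pickAlternative(
  [
    { transcript: "call mom", confidence: 0.48 },
    { transcript: "call tom", confidence: 0.45 },
  ],
  0.2
);
```

Here the first list would be accepted outright and the second would prompt
the user to confirm, even though a single fixed threshold could easily
treat both the same way.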

/Glen Shires

On Mon, Apr 23, 2012 at 10:56 AM, Young, Milan <Milan.Young@nuance.com> wrote:

> Being new to Community Groups, I'm not clear on the plan for resolving
> issues like that which I have posted below.  Do I need to submit a concrete
> counter-proposal?    Should we have regular conference calls to discuss
> this and the other issues that have come up?  Perhaps a F2F to get us
> started?
>
> Thanks
>
> -----Original Message-----
> From: Young, Milan [mailto:Milan.Young@nuance.com]
> Sent: Friday, April 13, 2012 2:06 PM
> To: Hans Wennborg; public-speech-api@w3.org
> Cc: Satish S; Glen Shires
> Subject: RE: Speech API: first editor's draft posted
>
> Thank you for the draft, this looks like an excellent start.  A few
> comments/suggestions on the following:
>
> SpeechRecognition
>  - In addition to the three parameters you have listed, I see the
> following as necessary:
>        integer maxNBest;
>        float confidenceThreshold;
>        integer completeTimeout;
>        integer incompleteTimeout;
>        integer maxSpeechTimeout;
>        attribute DOMString serviceURI;
>
> - We'll also need an interface for setting non-standard parameters.  This
> will be critical to avoid rat-holing into a complete list of parameters.
>        SpeechParameterList parameters;
>        void setCustomParameter(in DOMString name, in DOMString value);
>
>    interface SpeechParameter {
>        attribute DOMString name;
>        attribute DOMString value;
>    };
>
>    interface SpeechParameterList {
>        readonly attribute unsigned long length;
>        getter SpeechParameter item(in unsigned long index);
>    };
>
> - I prefer a flatter structure for SpeechRecognition.  Part of doing that
> would involve splitting the error path out to its own event.  I suggest the
> following:
>
>    // A full response, which could be interim or final, part of a
>    // continuous response or not
>    interface SpeechRecognitionResult : RecognitionEvent {
>        readonly attribute unsigned long length;
>        getter SpeechRecognitionAlternative item(in unsigned long index);
>        readonly attribute boolean final;
>        readonly attribute short resultIndex;
>        readonly attribute SpeechRecognitionResultList resultHistory;
>    };
>
>    interface SpeechRecognitionError : RecognitionEvent {
>      // As before
>    };
>
>
>
> TTS
>  - At a minimum, we'll need the same serviceURI parameter and generic
> parameter interface as in SpeechRecognition.
>  - I'd also like to hear some discussion on the importance of "marking"
> the stream.  I personally feel this is common enough that it should be part
> of a v1.
>
>
> Thanks
>
>
> -----Original Message-----
> From: Hans Wennborg [mailto:hwennborg@google.com]
> Sent: Thursday, April 12, 2012 7:36 AM
> To: public-speech-api@w3.org
> Cc: Satish S; Glen Shires
> Subject: Speech API: first editor's draft posted
>
> In December, Google proposed [1] to public-webapps a Speech JavaScript API
> subset that supports the majority of the use cases in the Speech Incubator
> Group's Final Report. This proposal provides a programmatic API that
> enables web-pages to synthesize speech output and to use speech recognition
> as an input for forms, continuous dictation and control.
>
> We have now posted a slightly updated proposal [2] in the Speech-API
> Community Group's repository. The differences include:
>
>  - Document is now self-contained, rather than having multiple references
> to the XG Final Report.
>  - Renamed SpeechReco interface to SpeechRecognition
>  - Renamed interfaces and attributes beginning SpeechInput* to
> SpeechRecognition*
>  - Moved EventTarget to constructor of SpeechRecognition
>  - Clarified that grammars and lang are attributes of SpeechRecognition
>  - Clarified that if index is greater than or equal to length, returns null
>
> We welcome discussion and feedback on this editor's draft. Please send
> your comments to the public-speech-api@w3.org mailing list.
>
> Glen Shires
> Hans Wennborg
>
> [1]
> http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/1696.html
> [2] http://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html
>
>
>


-- 
Thanks!
Glen Shires
Received on Monday, 23 April 2012 19:35:17 UTC