Re: SpeechSynthesisUtterance volume, rate, pitch and voice selection from Dominic Mazzoni on 2012-10-03 (public-speech-api@w3.org from October 2012)

From: Dominic Mazzoni <dmazzoni@google.com>
Date: Wed, 3 Oct 2012 15:53:17 -0700
To: Glen Shires <gshires@google.com>
Cc: public-speech-api@w3.org
Message-ID: <CAFz-FYyzpo_8yygypO-Udd5o70qbSsTs3n1_1oUtpFXcSvx1wQ@mail.gmail.com>

Looks good, thanks for including this.

I think you included serviceURI where you meant voiceURI, but otherwise
fine.

- Dominic

On Wed, Oct 3, 2012 at 3:47 PM, Glen Shires <gshires@google.com> wrote:

> I propose the following additions to allow enumeration of available voices
> and to select one, and also to allow selection of volume, rate and pitch.
>  If there's no disagreement, I'll add these to the spec on Friday.
>
> interface SpeechSynthesisUtterance {
>   attribute DOMString text;
>   attribute DOMString lang;
>   attribute DOMString voiceURI;
>   attribute double volume;
>   attribute double rate;
>   attribute double pitch;
> };
>
> text attribute:
>   The text to be synthesized and spoken for this utterance. This may be
> either plain text or a complete, well-formed SSML document. For
> speech synthesis engines that do not support SSML, or only support certain
> tags, the user agent or speech engine must strip away the tags they do
> not support and speak the text. The text may be limited to a maximum length
> of 32,767 characters.
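
To illustrate the tag-stripping fallback described above, here is a minimal sketch for an engine with no SSML support at all. The names stripSsmlTags, exceedsMaxLength and MAX_TEXT_LENGTH are hypothetical and not part of the proposal. The regular expression is a rough approximation rather than an XML parser: it does not decode entities or handle comments or CDATA sections that contain ">". Counting UTF-16 code units as "characters" is also an assumption.

```typescript
// Hypothetical limit taken from the "32,767 characters" remark in the proposal.
const MAX_TEXT_LENGTH = 32767;

// Reduce an SSML document to plain text by dropping every tag.
// Rough approximation only: a regex, not a real XML parser.
function stripSsmlTags(input: string): string {
  return input
    .replace(/<[^>]*>/g, "") // remove anything that looks like a tag
    .replace(/\s+/g, " ")    // collapse whitespace left behind
    .trim();
}

// True when the text is longer than the (possible) maximum length.
// Assumption: length is measured in UTF-16 code units.
function exceedsMaxLength(text: string): boolean {
  return text.length > MAX_TEXT_LENGTH;
}
```

An engine that supports only some tags would need a real parser so it can keep the tags it understands; this sketch covers only the "strip everything" case.
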
>
> lang attribute:
> (no change to definition)
>
> // Note that serviceURI is renamed to voiceURI, with the same definition
> except that it also specifies the voice...
>
> voiceURI attribute:
>   The voiceURI attribute specifies the speech synthesis voice and
> the location of the speech synthesis service that the web application
> wishes to use. If this attribute is unset at the time of the play method
> call, then the user agent must use the user agent default speech service.
> Note that the serviceURI is a generic URI and can thus point to local
> services either through use of a URN with meaning to the user agent or by
> specifying a URL that the user agent recognizes as a local service.
> Additionally, the user agent default can be local or remote and can
> incorporate end user choices via interfaces provided by the user agent such
> as browser configuration parameters.
>
> volume attribute:
>   Speaking volume between 0 and 1 inclusive, with 0 being lowest and 1
> being highest, with a default of 1.0.  If SSML is used, this value will be
> overridden by prosody tags in the markup.
>
> rate attribute:
>   Speaking rate relative to the default rate for this voice. 1.0 is the
> default rate supported by the speech synthesis engine or specific
> voice (which should correspond to a normal speaking rate). 2.0 is twice as
> fast, and 0.5 is half as fast. Values below 0.1 or above 10.0 are strictly
> disallowed, but speech synthesis engines or specific voices may constrain
> the minimum and maximum rates further. For example, a particular voice may
> not actually speak faster than 3 times normal even if you specify a value
> larger than 3.0. If SSML is used, this value will be overridden by prosody
> tags in the markup.
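
The two-level limit on rate can be sketched as follows. The function effectiveRate and its engineMin/engineMax parameters are hypothetical. The proposal says out-of-range values are disallowed but does not say how a violation is reported, so throwing a RangeError here is an assumption.

```typescript
// Bounds stated in the proposal for the rate attribute.
const MIN_RATE = 0.1;
const MAX_RATE = 10.0;

// Returns the rate a voice would actually use.
// engineMin/engineMax are hypothetical per-voice limits.
function effectiveRate(requested: number, engineMin: number, engineMax: number): number {
  // Written as a negated conjunction so that NaN is rejected too.
  if (!(requested >= MIN_RATE && requested <= MAX_RATE)) {
    // Assumption: the proposal does not define the error behaviour.
    throw new RangeError("rate " + requested + " is outside [" + MIN_RATE + ", " + MAX_RATE + "]");
  }
  // A valid request is silently clamped to what the voice supports.
  return Math.min(Math.max(requested, engineMin), engineMax);
}
```

With a voice limited to 3 times normal speed, effectiveRate(5, 0.5, 3) returns 3, which matches the example in the rate definition.
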
>
> pitch attribute:
>   Speaking pitch between 0 and 2 inclusive, with 0 being lowest and 2
> being highest. 1.0 corresponds to the default pitch of the speech synthesis
> engine or specific voice.  Speech synthesis engines or voices may constrain
> the minimum and maximum pitch further. If SSML is used, this value will be
> overridden by prosody tags in the markup.
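
Volume, rate and pitch all defer to prosody tags when the text is SSML. As a rough sketch of that precedence, the hypothetical helper overriddenByMarkup below reports which of the three attributes are set on some SSML prosody element in the text. It is a regex heuristic rather than a parser, and it only says that an override exists somewhere in the markup, not which part of the text it covers.

```typescript
type ProsodyName = "volume" | "rate" | "pitch";

// Reports which utterance attributes are set by at least one
// <prosody ...> tag in the text (heuristic, not a real parser).
function overriddenByMarkup(text: string): Record<ProsodyName, boolean> {
  const result: Record<ProsodyName, boolean> = { volume: false, rate: false, pitch: false };
  // Collect the opening prosody tags, if any.
  const prosodyTags: string[] = text.match(/<prosody\b[^>]*>/g) || [];
  for (const tag of prosodyTags) {
    for (const name of Object.keys(result) as ProsodyName[]) {
      // Look for name= preceded by whitespace inside the tag.
      if (new RegExp("\\s" + name + "\\s*=").test(tag)) {
        result[name] = true;
      }
    }
  }
  return result;
}
```

For plain text the result is false for all three, so the attribute values on the utterance apply unchanged.
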
>
>
> interface SpeechSynthesisVoice {
>   readonly attribute DOMString voiceURI;
>   readonly attribute DOMString name;
>   readonly attribute DOMString lang;
>   readonly attribute boolean localService;
>   readonly attribute boolean default;
> };
>
> voiceURI attribute:
>   The voiceURI attribute specifies the speech synthesis voice and
> the location of the speech synthesis service that the web application
> wishes to use. If this attribute is unset at the time of the play method
> call, then the user agent must use the user agent default speech service.
> Note that the serviceURI is a generic URI and can thus point to local
> services either through use of a URN with meaning to the user agent or by
> specifying a URL that the user agent recognizes as a local service.
>
> name attribute:
>   A human-readable name that represents the voice. There is no guarantee
> that all names returned are unique.
>
> lang attribute:
>   This attribute is a valid BCP 47 language tag indicating the language of
> the voice.
>
> localService attribute:
>   This attribute is true for voices supplied by a local speech
> synthesizer, and is false for voices supplied by a remote speech
> synthesizer service.  (This may be useful for the developer because remote
> services may imply additional latency, bandwidth or cost, whereas local
> voices may imply lower quality; however, there is no guarantee that any of
> these implications are true.)
>
> default attribute:
>   This attribute is true for at most one voice per language. There may be
> a different default for each language. It is user agent dependent how
> default voices are determined.
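
A minimal sketch of how a page might look up the default voice for a language, given the "at most one per language" rule. VoiceInfo is a hypothetical plain-object stand-in for SpeechSynthesisVoice, and defaultVoiceFor is not part of the proposal. Matching the whole language tag case-insensitively is an assumption; a real implementation might prefer BCP 47 prefix matching.

```typescript
// Hypothetical plain-object mirror of the proposed SpeechSynthesisVoice.
interface VoiceInfo {
  voiceURI: string;
  name: string;
  lang: string;
  localService: boolean;
  default: boolean;
}

// Returns the default voice for a language tag, or undefined when the
// user agent marks no default for that language.
function defaultVoiceFor(voices: VoiceInfo[], lang: string): VoiceInfo | undefined {
  const wanted = lang.toLowerCase(); // BCP 47 tags are case-insensitive
  for (const voice of voices) {
    if (voice.default && voice.lang.toLowerCase() === wanted) {
      return voice;
    }
  }
  return undefined;
}
```

Because "at most one" allows zero, callers need to handle the undefined case.
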
>
>
> interface SpeechSynthesisVoiceList {
>   readonly attribute unsigned long length;
>   getter SpeechSynthesisVoice item(in unsigned long index);
> };
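
Since the proposed list exposes only length and item(), script would probably copy it into an array before filtering or sorting. A small sketch, where VoiceListLike and listToArray are hypothetical names:

```typescript
// Hypothetical structural type matching the proposed list interface.
interface VoiceListLike<T> {
  readonly length: number;
  item(index: number): T;
}

// Copies the list into an ordinary array using only length and item().
function listToArray<T>(list: VoiceListLike<T>): T[] {
  const out: T[] = [];
  for (let i = 0; i < list.length; i++) {
    out.push(list.item(i));
  }
  return out;
}
```
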
>
>
> interface SpeechSynthesis {
>   ...
>   static SpeechSynthesisVoiceList getVoices();
> };
>
> getVoices method:
>   The getVoices method returns the available voices.  It is user agent
> dependent which voices are available.
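
Putting enumeration and selection together, here is a sketch of how a page might pick a voiceURI from the voices that getVoices returns. VoiceInfo is a hypothetical plain-object stand-in for SpeechSynthesisVoice and chooseVoiceURI is not part of the proposal. Preferring local voices is just one possible policy, and, as the localService definition notes, it carries no quality or latency guarantee.

```typescript
// Hypothetical plain-object mirror of the proposed SpeechSynthesisVoice.
interface VoiceInfo {
  voiceURI: string;
  name: string;
  lang: string;
  localService: boolean;
  default: boolean;
}

// Picks a voiceURI for the language, preferring a local voice and
// falling back to the first remote match. Returns undefined when no
// voice matches, in which case voiceURI would simply be left unset.
function chooseVoiceURI(voices: VoiceInfo[], lang: string): string | undefined {
  const wanted = lang.toLowerCase();
  let fallback: string | undefined;
  for (const voice of voices) {
    if (voice.lang.toLowerCase() !== wanted) continue;
    if (voice.localService) return voice.voiceURI; // first local match wins
    if (fallback === undefined) fallback = voice.voiceURI;
  }
  return fallback;
}
```

Leaving voiceURI unset when nothing matches relies on the user agent default speech service, as described in the voiceURI definition.
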
>
> /Glen Shires
>
>
Received on Wednesday, 3 October 2012 22:53:45 UTC