Re: SpeechSynthesisUtterance volume, rate, pitch and voice selection from Glen Shires on 2012-10-03 (public-speech-api@w3.org from October 2012)

From: Glen Shires <gshires@google.com>
Date: Wed, 3 Oct 2012 16:09:58 -0700
To: Dominic Mazzoni <dmazzoni@google.com>
Cc: public-speech-api@w3.org
Message-ID: <CAEE5bchigsrTUesaNVCsSq-Ab7aQHMajnbTODYCogY=3btbxrA@mail.gmail.com>

Yes, in the definitions of voiceURI, please substitute:
  s/serviceURI/voiceURI/

On Wed, Oct 3, 2012 at 3:53 PM, Dominic Mazzoni <dmazzoni@google.com> wrote:

> Looks good, thanks for including this.
>
> I think you included serviceURI where you meant voiceURI, but otherwise
> fine.
>
> - Dominic
>
>
> On Wed, Oct 3, 2012 at 3:47 PM, Glen Shires <gshires@google.com> wrote:
>
>> I propose the following additions to allow enumeration of available
>> voices and to select one, and also to allow selection of volume, rate and
>> pitch.  If there's no disagreement, I'll add these to the spec on Friday.
>>
>> interface SpeechSynthesisUtterance {
>>   attribute DOMString text;
>>   attribute DOMString lang;
>>   attribute DOMString voiceURI;
>>   attribute double volume;
>>   attribute double rate;
>>   attribute double pitch;
>> };
>>
>> text attribute:
>>   The text to be synthesized and spoken for this utterance. This may be
>> either plain text or a complete, well-formed SSML document. For
>> speech synthesis engines that do not support SSML, or only support certain
>> tags, the user agent or speech engine must strip away the tags they do
>> not support and speak the text. The text may be limited to a maximum
>> length of 32,767 characters.
>>
>> lang attribute:
>> (no change to definition)
>>
>> // Note that serviceURI is renamed to voiceURI, with the same definition
>> except that it also specifies the voice...
>>
>> voiceURI attribute:
>>   The voiceURI attribute specifies the speech synthesis voice and
>> the location of the speech synthesis service that the web application
>> wishes to use. If this attribute is unset at the time of the play method
>> call, then the user agent must use the user agent default speech service.
>> Note that the serviceURI is a generic URI and can thus point to local
>> services either through use of a URN with meaning to the user agent or by
>> specifying a URL that the user agent recognizes as a local service.
>> Additionally, the user agent default can be local or remote and can
>> incorporate end user choices via interfaces provided by the user agent such
>> as browser configuration parameters.
>>
>> volume attribute:
>>   Speaking volume between 0 and 1 inclusive, with 0 being lowest and 1
>> being highest, with a default of 1.0.  If SSML is used, this value will be
>> overridden by prosody tags in the markup.
>>
>> rate attribute:
>>   Speaking rate relative to the default rate for this voice. 1.0 is the
>> default rate supported by the speech synthesis engine or specific
>> voice (which should correspond to a normal speaking rate). 2.0 is twice as
>> fast, and 0.5 is half as fast. Values below 0.1 or above 10.0 are strictly
>> disallowed, but speech synthesis engines or specific voices may constrain
>> the minimum and maximum rates further—for example a particular voice may
>> not actually speak faster than 3 times normal even if you specify a value
>> larger than 3.0. If SSML is used, this value will be overridden by prosody
>> tags in the markup.
>>
>> pitch attribute:
>>   Speaking pitch between 0 and 2 inclusive, with 0 being lowest and 2
>> being highest. 1.0 corresponds to the default pitch of the speech synthesis
>> engine or specific voice.  Speech synthesis engines or voices may constrain
>> the minimum and maximum pitch further. If SSML is used, this value will be
>> overridden by prosody tags in the markup.
>>
>>
>> interface SpeechSynthesisVoice {
>>   readonly attribute DOMString voiceURI;
>>   readonly attribute DOMString name;
>>   readonly attribute DOMString lang;
>>   readonly attribute boolean localService;
>>   readonly attribute boolean default;
>> };
>>
>> voiceURI attribute:
>>   The voiceURI attribute specifies the speech synthesis voice and
>> the location of the speech synthesis service that the web application
>> wishes to use. If this attribute is unset at the time of the play method
>> call, then the user agent must use the user agent default speech service.
>> Note that the serviceURI is a generic URI and can thus point to local
>> services either through use of a URN with meaning to the user agent or
>> by specifying a URL that the user agent recognizes as a local service.
>>
>> name attribute:
>>   A human-readable name that represents the voice. There is no guarantee
>> that all names returned are unique.
>>
>> lang attribute:
>>   This attribute is a valid BCP 47 language tag indicating the language
>> of the voice.
>>
>> localService attribute:
>>   This attribute is true for voices supplied by a local speech
>> synthesizer, and is false for voices supplied by a remote speech
>> synthesizer service.  (This may be useful for the developer because remote
>> services may imply additional latency, bandwidth or cost, whereas local
>> voices may imply lower quality; however, there is no guarantee that any of
>> these implications are true.)
>>
>> default attribute:
>>   This attribute is true for at most one voice per language. There may be
>> a different default for each language. It is user agent dependent how
>> default voices are determined.
>>
>>
>> interface SpeechSynthesisVoiceList {
>>   readonly attribute unsigned long length;
>>   getter SpeechSynthesisVoice item(in unsigned long index);
>> };
>>
>>
>> interface SpeechSynthesis {
>>   ...
>>   static SpeechSynthesisVoiceList getVoices();
>> };
>>
>> getVoices method:
>>   The getVoices method returns the available voices.  It is user agent dependent
>> which voices are available.
>>
>> /Glen Shires
>>
>>
>
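
For illustration only (not part of the proposal): a minimal TypeScript
sketch of how a page might apply the proposed defaults and ranges for
volume, rate and pitch, and pick a voice for a language from a
getVoices()-style list. The names VoiceInfo, ProsodySettings, clamp,
normalizeProsody and pickVoice, and the urn:example: URIs, are made up
for this sketch. Clamping out-of-range values is an assumption; the
proposal only says that rates below 0.1 or above 10.0 are disallowed.
Exact, case-insensitive matching of language tags is also an assumption.

```typescript
// Illustrative sketch only; none of these names come from the proposal.

// Mirrors the attributes described for SpeechSynthesisVoice in the quoted
// proposal, including the `lang` attribute defined in its prose.
interface VoiceInfo {
  voiceURI: string;
  name: string;
  lang: string;
  localService: boolean;
  default: boolean;
}

// Mirrors the proposed volume, rate and pitch attributes.
interface ProsodySettings {
  volume: number;
  rate: number;
  pitch: number;
}

// Hypothetical helper: restrict a number to the closed interval [min, max].
function clamp(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

// Hypothetical helper: fill in the proposed defaults (1.0 for each value)
// and clamp to the proposed ranges. Clamping is an assumption of this sketch.
function normalizeProsody(input: Partial<ProsodySettings>): ProsodySettings {
  return {
    volume: clamp(input.volume ?? 1.0, 0, 1),  // 0 lowest, 1 highest
    rate: clamp(input.rate ?? 1.0, 0.1, 10),   // 1.0 is the normal rate
    pitch: clamp(input.pitch ?? 1.0, 0, 2),    // 1.0 is the default pitch
  };
}

// Hypothetical helper: prefer the voice flagged as default for the language,
// otherwise fall back to the first voice with a matching language tag.
function pickVoice(voices: VoiceInfo[], lang: string): VoiceInfo | undefined {
  const wanted = lang.toLowerCase();
  const matches = voices.filter((v) => v.lang.toLowerCase() === wanted);
  return matches.find((v) => v.default) ?? matches[0];
}

// Made-up sample data standing in for the result of getVoices().
const sampleVoices: VoiceInfo[] = [
  { voiceURI: "urn:example:voice-a", name: "Voice A", lang: "en-US", localService: true, default: false },
  { voiceURI: "urn:example:voice-b", name: "Voice B", lang: "en-US", localService: false, default: true },
  { voiceURI: "urn:example:voice-c", name: "Voice C", lang: "fr-FR", localService: true, default: true },
];

const chosen = pickVoice(sampleVoices, "en-us");
console.log(chosen?.voiceURI); // prints "urn:example:voice-b"
console.log(normalizeProsody({ rate: 25 }));
```

With the interfaces as proposed, the selected voiceURI and the normalized
values would presumably then be assigned to the corresponding
SpeechSynthesisUtterance attributes.
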
Received on Wednesday, 3 October 2012 23:11:06 UTC