Re: SpeechSynthesisUtterance volume, rate, pitch and voice selection from Glen Shires on 2012-10-05 (public-speech-api@w3.org from October 2012)

From: Glen Shires <gshires@google.com>
Date: Fri, 5 Oct 2012 11:12:48 -0700
To: Dominic Mazzoni <dmazzoni@google.com>
Cc: public-speech-api@w3.org
Message-ID: <CAEE5bcj=YVJTBXr+6ZkwFLQ5JUvEAK9wxnrGsrpoY+h9V+pSnQ@mail.gmail.com>
I've updated the spec with these changes, adding the volume, rate and
pitch attributes and the getVoices method, which returns a
SpeechSynthesisVoiceList of SpeechSynthesisVoice objects.
https://dvcs.w3.org/hg/speech-api/rev/fdc26488164f

As always, the current draft spec is at:
http://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html

/Glen Shires

On Wed, Oct 3, 2012 at 4:09 PM, Glen Shires <gshires@google.com> wrote:

> Yes, in the definitions of voiceURI, please substitute:
>   s/serviceURI/voiceURI/
>
>
> On Wed, Oct 3, 2012 at 3:53 PM, Dominic Mazzoni <dmazzoni@google.com> wrote:
>
>> Looks good, thanks for including this.
>>
>> I think you included serviceURI where you meant voiceURI, but otherwise
>> fine.
>>
>> - Dominic
>>
>>
>> On Wed, Oct 3, 2012 at 3:47 PM, Glen Shires <gshires@google.com> wrote:
>>
>>> I propose the following additions to allow enumeration of available
>>> voices and to select one, and also to allow selection of volume, rate and
>>> pitch.  If there's no disagreement, I'll add these to the spec on Friday.
>>>
>>> interface SpeechSynthesisUtterance {
>>>   attribute DOMString text;
>>>   attribute DOMString lang;
>>>   attribute DOMString voiceURI;
>>>   attribute double volume;
>>>   attribute double rate;
>>>   attribute double pitch;
>>> };
>>>
>>> text attribute:
>>>   The text to be synthesized and spoken for this utterance. This may be
>>> either plain text or a complete, well-formed SSML document. For
>>> speech synthesis engines that do not support SSML, or only support certain
>>> tags, the user agent or speech engine must strip away the tags they do
>>> not support and speak the text. There may be a maximum length of the text
>>> of 32,767 characters.
>>>
>>> lang attribute:
>>> (no change to definition)
>>>
>>> // Note that serviceURI is renamed to voiceURI, with the same definition
>>> except that it also specifies the voice...
>>>
>>> voiceURI attribute:
>>>   The voiceURI attribute specifies the speech synthesis voice and
>>> the location of the speech synthesis service that the web application
>>> wishes to use. If this attribute is unset at the time of the play method
>>> call, then the user agent must use the user agent default speech service.
>>> Note that the serviceURI is a generic URI and can thus point to local
>>> services either through use of a URN with meaning to the user agent or by
>>> specifying a URL that the user agent recognizes as a local service.
>>> Additionally, the user agent default can be local or remote and can
>>> incorporate end user choices via interfaces provided by the user agent such
>>> as browser configuration parameters.
>>>
>>> volume attribute
>>>   Speaking volume between 0 and 1 inclusive, with 0 being lowest and 1
>>> being highest, with a default of 1.0.  If SSML is used, this value will be
>>> overridden by prosody tags in the markup.
>>>
>>> rate attribute
>>>   Speaking rate relative to the default rate for this voice. 1.0 is the
>>> default rate supported by the speech synthesis engine or specific
>>> voice (which should correspond to a normal speaking rate). 2.0 is twice as
>>> fast, and 0.5 is half as fast. Values below 0.1 or above 10.0 are strictly
>>> disallowed, but speech synthesis engines or specific voices may constrain
>>> the minimum and maximum rates further—for example a particular voice may
>>> not actually speak faster than 3 times normal even if you specify a value
>>> larger than 3.0. If SSML is used, this value will be overridden by prosody
>>> tags in the markup.
>>>
>>> pitch attribute
>>>   Speaking pitch between 0 and 2 inclusive, with 0 being lowest and 2
>>> being highest. 1.0 corresponds to the default pitch of the speech synthesis
>>> engine or specific voice.  Speech synthesis engines or voices may constrain
>>> the minimum and maximum pitch further. If SSML is used, this value will be
>>> overridden by prosody tags in the markup.
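
The ranges in the volume, rate and pitch definitions above can be sketched as a small normalization helper. This is only an illustration under stated assumptions: the proposal says out-of-range rates are disallowed but does not say how a user agent should react, so clamping is a guess, and the names ProsodySettings, clamp and normalizeProsody are hypothetical and not part of the proposal.

```typescript
// Hypothetical helper, not part of the proposed API.
// Ranges and defaults are taken from the attribute definitions above.
interface ProsodySettings {
  volume: number; // 0 to 1 inclusive, default 1.0
  rate: number;   // 0.1 to 10.0, default 1.0 (relative to the voice's normal rate)
  pitch: number;  // 0 to 2 inclusive, default 1.0
}

function clamp(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

// Assumption: out-of-range values are clamped rather than rejected.
function normalizeProsody(input: Partial<ProsodySettings>): ProsodySettings {
  return {
    volume: clamp(input.volume ?? 1.0, 0, 1),
    rate: clamp(input.rate ?? 1.0, 0.1, 10),
    pitch: clamp(input.pitch ?? 1.0, 0, 2),
  };
}
```

A speech engine or voice may still narrow these ranges further, as the rate definition notes, so the result is at best an upper bound on what will actually be honored.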
>>>
>>>
>>> interface SpeechSynthesisVoice {
>>>   readonly attribute DOMString voiceURI;
>>>   readonly attribute DOMString name;
>>>   readonly attribute DOMString lang;
>>>   readonly attribute boolean localService;
>>>   readonly attribute boolean default;
>>> };
>>>
>>> voiceURI attribute:
>>>   The voiceURI attribute specifies the speech synthesis voice and
>>> the location of the speech synthesis service that the web application
>>> wishes to use. If this attribute is unset at the time of the play method
>>> call, then the user agent must use the user agent default speech service.
>>> Note that the serviceURI is a generic URI and can thus point to local
>>> services either through use of a URN with meaning to the user agent or
>>> by specifying a URL that the user agent recognizes as a local service.
>>>
>>> name attribute:
>>>   A human-readable name that represents the voice. There is no guarantee
>>> that all names returned are unique.
>>>
>>> lang attribute:
>>>   This attribute is a valid BCP 47 language tag indicating the language
>>> of the voice.
>>>
>>> localService attribute:
>>>   This attribute is true for voices supplied by a local speech
>>> synthesizer, and is false for voices supplied by a remote speech
>>> synthesizer service.  (This may be useful for the developer because remote
>>> services may imply additional latency, bandwidth or cost, whereas local
>>> voices may imply lower quality, however there is no guarantee that any of
>>> these implications are true.)
>>>
>>> default attribute:
>>>   This attribute is true for at most one voice per language. There may
>>> be a different default for each language. It is user agent dependent
>>> how default voices are determined.
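
As a rough sketch of how a page might use the lang and default attributes described above to choose a voice: the VoiceInfo shape below mirrors the proposed SpeechSynthesisVoice attributes, but the pickVoice function and its case-insensitive exact-match rule are assumptions made for illustration. Real BCP 47 matching is more involved than this.

```typescript
// Plain-data mirror of the proposed SpeechSynthesisVoice attributes.
interface VoiceInfo {
  voiceURI: string;
  name: string;
  lang: string;          // BCP 47 language tag
  localService: boolean;
  default: boolean;
}

// Hypothetical helper: prefer the default voice for the requested language,
// otherwise fall back to the first voice with that language, else undefined.
// Assumption: tags are compared as whole strings, ignoring case.
function pickVoice(voices: VoiceInfo[], lang: string): VoiceInfo | undefined {
  const wanted = lang.toLowerCase();
  const matches = voices.filter((v) => v.lang.toLowerCase() === wanted);
  return matches.find((v) => v.default) ?? matches[0];
}
```

Because names are not guaranteed to be unique, the sketch returns the whole voice record so the caller can use its voiceURI rather than its name.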
>>>
>>>
>>> interface SpeechSynthesisVoiceList {
>>>   readonly attribute unsigned long length;
>>>   getter SpeechSynthesisVoice item(in unsigned long index);
>>> };
>>>
>>>
>>> interface SpeechSynthesis {
>>>   ...
>>>   static SpeechSynthesisVoiceList getVoices();
>>> };
>>>
>>> getVoices method
>>>   The getVoices method returns the available voices.  It is user agent dependent
>>> which voices are available.
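
For illustration, a list exposing only length and item(), like the proposed SpeechSynthesisVoiceList, could be walked as below. VoiceListLike and toArray are hypothetical names, and the sketch deliberately avoids calling getVoices itself so that it does not depend on any user agent.

```typescript
// Minimal structural type matching the proposed list interface.
interface VoiceListLike<T> {
  readonly length: number;
  item(index: number): T;
}

// Hypothetical helper: copy the entries of a length/item list into an array
// so that ordinary array methods (filter, map, find) can be used on it.
function toArray<T>(list: VoiceListLike<T>): T[] {
  const result: T[] = [];
  for (let i = 0; i < list.length; i++) {
    result.push(list.item(i));
  }
  return result;
}
```

Since the set of available voices is user agent dependent, callers should expect the resulting array to be empty on some systems.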
>>>
>>> /Glen Shires
>>>
>>>
>>
>
Received on Friday, 5 October 2012 18:13:59 UTC