SpeechSynthesisUtterance volume, rate, pitch and voice selection

I propose the following additions to allow enumeration of available voices
and to select one, and also to allow selection of volume, rate and pitch.
 If there's no disagreement, I'll add these to the spec on Friday.

interface SpeechSynthesisUtterance {
  attribute DOMString text;
  attribute DOMString lang;
  attribute DOMString voiceURI;
  attribute double volume;
  attribute double rate;
  attribute double pitch;
};

text attribute:
  The text to be synthesized and spoken for this utterance. This may be
either plain text or a complete, well-formed SSML document. For
speech synthesis engines that do not support SSML, or only support certain
tags, the user agent or speech engine must strip away the tags they do
not support and speak the text. The text may be limited to a maximum
length of 32,767 characters.
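
To illustrate the fallback for an engine with no SSML support at all, here
is a deliberately naive sketch. The names stripSsmlTags, fitsLengthLimit and
MAX_TEXT_LENGTH are hypothetical and not part of the proposal. It assumes
the input is either well-formed SSML or plain text without literal angle
brackets (plain text such as "a < b > c" would be mangled), and it does not
decode entities. A real user agent would use an XML parser.

```typescript
// Hypothetical constant taken from the 32,767-character figure in the text.
const MAX_TEXT_LENGTH = 32767;

// Naive sketch: remove everything that looks like an XML tag, keep the text.
function stripSsmlTags(input: string): string {
  return input
    .replace(/<[^>]*>/g, " ") // each tag (including <?xml ...?>) becomes a space
    .replace(/\s+/g, " ")     // collapse the whitespace left behind
    .trim();
}

// True if the text is within the (possible) maximum length.
function fitsLengthLimit(text: string): boolean {
  return text.length <= MAX_TEXT_LENGTH;
}
```

For example, stripSsmlTags('<speak>Hello <emphasis>world</emphasis></speak>')
yields "Hello world".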

lang attribute:
(no change to definition)

// Note that serviceURI is renamed to voiceURI, with the same definition
except that it also specifies the voice...

voiceURI attribute:
  The voiceURI attribute specifies the speech synthesis voice and
the location of the speech synthesis service that the web application
wishes to use. If this attribute is unset at the time of the play method
call, then the user agent must use the user agent default speech service.
Note that the voiceURI is a generic URI and can thus point to local
services either through use of a URN with meaning to the user agent or by
specifying a URL that the user agent recognizes as a local service.
Additionally, the user agent default can be local or remote and can
incorporate end user choices via interfaces provided by the user agent such
as browser configuration parameters.

volume attribute:
  Speaking volume between 0 and 1 inclusive, with 0 being lowest and 1
being highest, with a default of 1.0.  If SSML is used, this value will be
overridden by prosody tags in the markup.

rate attribute:
  Speaking rate relative to the default rate for this voice. 1.0 is the
default rate supported by the speech synthesis engine or specific
voice (which should correspond to a normal speaking rate). 2.0 is twice as
fast, and 0.5 is half as fast. Values below 0.1 or above 10.0 are strictly
disallowed, but speech synthesis engines or specific voices may constrain
the minimum and maximum rates further. For example, a particular voice may
not actually speak faster than 3 times normal even if you specify a value
larger than 3.0. If SSML is used, this value will be overridden by prosody
tags in the markup.
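
As a rough sketch of how the two levels of rate limits could interact: the
names effectiveRate and VoiceRateLimits below are hypothetical, and throwing
a RangeError for values outside 0.1 to 10.0 is my assumption. The text above
only says such values are disallowed, not how that is reported.

```typescript
const MIN_RATE = 0.1;
const MAX_RATE = 10.0;

// Hypothetical per-voice limits; the proposal only says a voice "may constrain" the range.
interface VoiceRateLimits {
  min: number;
  max: number;
}

function effectiveRate(requested: number, limits: VoiceRateLimits): number {
  // Assumption: disallowed values are rejected with an error (NaN fails this check too).
  if (!(requested >= MIN_RATE && requested <= MAX_RATE)) {
    throw new RangeError(`rate must be between ${MIN_RATE} and ${MAX_RATE}`);
  }
  // Allowed values are silently clamped to what this voice can actually do.
  return Math.min(limits.max, Math.max(limits.min, requested));
}
```

With limits of { min: 0.5, max: 3 }, a requested rate of 5 is accepted but
spoken at 3, which matches the "3 times normal" example above.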

pitch attribute:
  Speaking pitch between 0 and 2 inclusive, with 0 being lowest and 2 being
highest. 1.0 corresponds to the default pitch of the speech synthesis
engine or specific voice.  Speech synthesis engines or voices may constrain
the minimum and maximum pitch further. If SSML is used, this value will be
overridden by prosody tags in the markup.
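
The volume and pitch ranges could be checked along these lines. This is only
a sketch: VolumeAndPitch, requireInRange and resolveVolumeAndPitch are
hypothetical names, rejecting out-of-range values with a RangeError is an
assumption, and so is treating an unset pitch as 1.0 (the text says 1.0
corresponds to the default pitch, and gives 1.0 as the volume default).

```typescript
interface VolumeAndPitch {
  volume: number;
  pitch: number;
}

// Returns value if it lies in [min, max]; throws otherwise (NaN also fails).
function requireInRange(name: string, value: number, min: number, max: number): number {
  if (!(value >= min && value <= max)) {
    throw new RangeError(`${name} must be between ${min} and ${max}, got ${value}`);
  }
  return value;
}

// Fills in 1.0 for anything left unset, then validates both ranges.
function resolveVolumeAndPitch(requested: Partial<VolumeAndPitch>): VolumeAndPitch {
  return {
    volume: requireInRange("volume", requested.volume ?? 1.0, 0, 1),
    pitch: requireInRange("pitch", requested.pitch ?? 1.0, 0, 2),
  };
}
```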


interface SpeechSynthesisVoice {
  readonly attribute DOMString voiceURI;
  readonly attribute DOMString name;
  readonly attribute DOMString lang;
  readonly attribute boolean localService;
  readonly attribute boolean default;
};

voiceURI attribute:
  The voiceURI attribute specifies the speech synthesis voice and
the location of the speech synthesis service that the web application
wishes to use. If this attribute is unset at the time of the play method
call, then the user agent must use the user agent default speech service.
Note that the voiceURI is a generic URI and can thus point to local
services either through use of a URN with meaning to the user agent or by
specifying a URL that the user agent recognizes as a local service.

name attribute:
  A human-readable name that represents the voice. There is no guarantee
that all names returned are unique.

lang attribute:
  This attribute is a valid BCP 47 language tag indicating the language of
the voice.

localService attribute:
  This attribute is true for voices supplied by a local speech synthesizer,
and is false for voices supplied by a remote speech synthesizer service.
(This may be useful for the developer because remote services may imply
additional latency, bandwidth or cost, whereas local voices may imply lower
quality. However, there is no guarantee that any of these implications are
true.)

default attribute:
  This attribute is true for at most one voice per language. There may be a
different default for each language. How default voices are determined is
user agent dependent.
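
As an illustration of how a web application might use the default flag, here
is a small sketch over plain objects shaped like SpeechSynthesisVoice. The
names VoiceInfo and defaultVoiceFor are hypothetical. Falling back to the
first voice of the language when none is flagged, and matching language tags
exactly (case-insensitively, with no prefix matching), are my assumptions.

```typescript
// Plain-object stand-in for SpeechSynthesisVoice (hypothetical name).
interface VoiceInfo {
  voiceURI: string;
  name: string;
  lang: string;
  localService: boolean;
  default: boolean;
}

function defaultVoiceFor(voices: VoiceInfo[], lang: string): VoiceInfo | undefined {
  // BCP 47 tags compare case-insensitively.
  const wanted = lang.toLowerCase();
  const sameLang = voices.filter((v) => v.lang.toLowerCase() === wanted);
  // Prefer the flagged default; otherwise fall back to the first match, if any.
  return sameLang.find((v) => v.default) ?? sameLang[0];
}
```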


interface SpeechSynthesisVoiceList {
  readonly attribute unsigned long length;
  getter SpeechSynthesisVoice item(in unsigned long index);
};


interface SpeechSynthesis {
  ...
  static SpeechSynthesisVoiceList getVoices();
};

getVoices method:
  The getVoices method returns the available voices.  It is user agent
dependent which voices are available.
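
To show how the length/item shape of SpeechSynthesisVoiceList would be
walked, here is a sketch against a stand-in list rather than a real user
agent. VoiceEntry, VoiceListLike, makeFakeList, toArray and localVoiceNames
are all hypothetical names; makeFakeList merely plays the role of the value
getVoices() would return.

```typescript
// Minimal stand-ins for SpeechSynthesisVoice and SpeechSynthesisVoiceList.
interface VoiceEntry {
  voiceURI: string;
  name: string;
  localService: boolean;
}

interface VoiceListLike {
  readonly length: number;
  item(index: number): VoiceEntry;
}

// Test double standing in for what getVoices() would return.
function makeFakeList(entries: VoiceEntry[]): VoiceListLike {
  return { length: entries.length, item: (i) => entries[i] as VoiceEntry };
}

// Copies the list into an array using only length and item().
function toArray(list: VoiceListLike): VoiceEntry[] {
  const out: VoiceEntry[] = [];
  for (let i = 0; i < list.length; i++) {
    out.push(list.item(i));
  }
  return out;
}

// Example query: names of the voices that report localService === true.
function localVoiceNames(list: VoiceListLike): string[] {
  return toArray(list)
    .filter((v) => v.localService)
    .map((v) => v.name);
}
```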

/Glen Shires

Received on Wednesday, 3 October 2012 22:48:31 UTC