- From: Glen Shires <gshires@google.com>
- Date: Wed, 3 Oct 2012 15:47:23 -0700
- To: public-speech-api@w3.org
- Message-ID: <CAEE5bchj+FfCEvX00HN1jk2EvL7PWMc8TDCt-M1d2yrdamrEaA@mail.gmail.com>
I propose the following additions to allow enumeration of the available voices and selection of one of them, and also to allow selection of volume, rate and pitch. If there's no disagreement, I'll add these to the spec on Friday.

interface SpeechSynthesisUtterance {
    attribute DOMString text;
    attribute DOMString lang;
    attribute DOMString voiceURI;
    attribute double volume;
    attribute double rate;
    attribute double pitch;
};

text attribute: The text to be synthesized and spoken for this utterance. This may be either plain text or a complete, well-formed SSML document. For speech synthesis engines that do not support SSML, or that support only certain tags, the user agent or speech engine must strip away the tags it does not support and speak the text. The text may be limited to a maximum length of 32,767 characters.

lang attribute: (no change to definition)

// Note that serviceURI is renamed to voiceURI, with the same definition except that it also specifies the voice...

voiceURI attribute: The voiceURI attribute specifies the speech synthesis voice and the location of the speech synthesis service that the web application wishes to use. If this attribute is unset at the time of the play method call, then the user agent must use the user agent default speech service. Note that the voiceURI is a generic URI and can thus point to local services either through use of a URN with meaning to the user agent or by specifying a URL that the user agent recognizes as a local service. Additionally, the user agent default can be local or remote and can incorporate end user choices via interfaces provided by the user agent such as browser configuration parameters.

volume attribute: Speaking volume between 0 and 1 inclusive, with 0 being the lowest and 1 the highest; the default is 1.0. If SSML is used, this value will be overridden by prosody tags in the markup.

rate attribute: Speaking rate relative to the default rate for this voice.
1.0 is the default rate supported by the speech synthesis engine or specific voice (which should correspond to a normal speaking rate). 2.0 is twice as fast, and 0.5 is half as fast. Values below 0.1 or above 10.0 are strictly disallowed, but speech synthesis engines or specific voices may constrain the minimum and maximum rates further; for example, a particular voice may not actually speak faster than 3 times normal even if you specify a value larger than 3.0. If SSML is used, this value will be overridden by prosody tags in the markup.

pitch attribute: Speaking pitch between 0 and 2 inclusive, with 0 being the lowest and 2 the highest. 1.0 corresponds to the default pitch of the speech synthesis engine or specific voice. Speech synthesis engines or voices may constrain the minimum and maximum pitch further. If SSML is used, this value will be overridden by prosody tags in the markup.

interface SpeechSynthesisVoice {
    readonly attribute DOMString voiceURI;
    readonly attribute DOMString name;
    readonly attribute DOMString lang;
    readonly attribute boolean localService;
    readonly attribute boolean default;
};

voiceURI attribute: The voiceURI attribute specifies the speech synthesis voice and the location of the speech synthesis service that the web application wishes to use. If this attribute is unset at the time of the play method call, then the user agent must use the user agent default speech service. Note that the voiceURI is a generic URI and can thus point to local services either through use of a URN with meaning to the user agent or by specifying a URL that the user agent recognizes as a local service.

name attribute: A human-readable name that represents the voice. There is no guarantee that all names returned are unique.

lang attribute: This attribute is a valid BCP 47 language tag indicating the language of the voice.

localService attribute: This attribute is true for voices supplied by a local speech synthesizer, and is false for voices supplied by a remote speech synthesizer service.
(This may be useful for the developer because remote services may imply additional latency, bandwidth or cost, whereas local voices may imply lower quality; however, there is no guarantee that any of these implications are true.)

default attribute: This attribute is true for at most one voice per language. There may be a different default for each language. It is user agent dependent how default voices are determined.

interface SpeechSynthesisVoiceList {
    readonly attribute unsigned long length;
    getter SpeechSynthesisVoice item(in unsigned long index);
};

interface SpeechSynthesis {
    ...
    static SpeechSynthesisVoiceList getVoices();
};

getVoices method: The getVoices method returns the available voices. It is user agent dependent which voices are available.

/Glen Shires
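The voice-selection rules and the volume, rate and pitch ranges in this proposal can be exercised without a browser. The following is a minimal TypeScript sketch that models the proposed attributes as plain data. `VoiceInfo`, `VoiceListLike`, `voicesToArray`, `pickVoice`, `checkProsody` and the `urn:example:` voice URIs are hypothetical names for illustration only, not part of the proposal or of any browser API. The sketch assumes that "strictly disallowed" means out-of-range values should be rejected rather than clamped, and that two language tags match only when they are equal ignoring case.

```typescript
// Hypothetical plain-data mirror of the proposed SpeechSynthesisVoice.
interface VoiceInfo {
  voiceURI: string;
  name: string;
  lang: string; // BCP 47 language tag
  localService: boolean;
  default: boolean;
}

// Hypothetical minimal shape of the proposed SpeechSynthesisVoiceList.
interface VoiceListLike {
  readonly length: number;
  item(index: number): VoiceInfo;
}

// The three prosody attributes proposed for SpeechSynthesisUtterance.
interface Prosody {
  volume: number;
  rate: number;
  pitch: number;
}

// Inclusive ranges taken from the proposal. Engines or voices may narrow
// these further, so passing this check does not guarantee the value is honored.
const PROSODY_RANGES: Record<keyof Prosody, [number, number]> = {
  volume: [0, 1],
  rate: [0.1, 10],
  pitch: [0, 2],
};

// Copy an indexed voice list (length + item()) into an ordinary array.
function voicesToArray(list: VoiceListLike): VoiceInfo[] {
  const out: VoiceInfo[] = [];
  for (let i = 0; i < list.length; i++) out.push(list.item(i));
  return out;
}

// Prefer the voice flagged `default` for the language (at most one per
// language), else the first voice with that language, else undefined so the
// caller can leave voiceURI unset and get the user agent default.
function pickVoice(voices: VoiceInfo[], lang: string): VoiceInfo | undefined {
  const wanted = lang.toLowerCase(); // BCP 47 tags compare case-insensitively
  let firstMatch: VoiceInfo | undefined;
  for (const v of voices) {
    if (v.lang.toLowerCase() !== wanted) continue;
    if (v.default) return v;
    if (firstMatch === undefined) firstMatch = v;
  }
  return firstMatch;
}

// Throw a RangeError for any value outside the proposed range (NaN included).
function checkProsody(p: Prosody): Prosody {
  for (const key of Object.keys(PROSODY_RANGES) as (keyof Prosody)[]) {
    const [min, max] = PROSODY_RANGES[key];
    const value = p[key];
    if (!(value >= min && value <= max)) {
      throw new RangeError(`${key}=${value} is outside [${min}, ${max}]`);
    }
  }
  return p;
}

// Usage with made-up voices.
const sample: VoiceInfo[] = [
  { voiceURI: "urn:example:en-1", name: "English A", lang: "en-US", localService: true, default: false },
  { voiceURI: "urn:example:en-2", name: "English B", lang: "en-US", localService: false, default: true },
  { voiceURI: "urn:example:fr-1", name: "French A", lang: "fr-FR", localService: true, default: false },
];
const chosen = pickVoice(sample, "en-US");
console.log(chosen ? chosen.name : "(user agent default)"); // prints "English B"
checkProsody({ volume: 0.8, rate: 1.5, pitch: 1.0 });
```

Returning undefined instead of an arbitrary voice when nothing matches keeps the fallback consistent with the voiceURI definition: the attribute stays unset and the user agent default speech service is used.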
Received on Wednesday, 3 October 2012 22:48:31 UTC