- From: Dominic Mazzoni <dmazzoni@google.com>
- Date: Wed, 3 Oct 2012 15:53:17 -0700
- To: Glen Shires <gshires@google.com>
- Cc: public-speech-api@w3.org
- Message-ID: <CAFz-FYyzpo_8yygypO-Udd5o70qbSsTs3n1_1oUtpFXcSvx1wQ@mail.gmail.com>
Looks good, thanks for including this. I think you included serviceURI
where you meant voiceURI, but otherwise fine.

- Dominic

On Wed, Oct 3, 2012 at 3:47 PM, Glen Shires <gshires@google.com> wrote:
> I propose the following additions to allow enumeration of available voices
> and to select one, and also to allow selection of volume, rate and pitch.
> If there's no disagreement, I'll add these to the spec on Friday.
>
> interface SpeechSynthesisUtterance {
>     attribute DOMString text;
>     attribute DOMString lang;
>     attribute DOMString voiceURI;
>     attribute double volume;
>     attribute double rate;
>     attribute double pitch;
> };
>
> text attribute:
> The text to be synthesized and spoken for this utterance. This may be
> either plain text or a complete, well-formed SSML document. For
> speech synthesis engines that do not support SSML, or only support certain
> tags, the user agent or speech engine must strip away the tags they do
> not support and speak the text. There may be a maximum length of the text
> of 32,767 characters.
>
> lang attribute:
> (no change to definition)
>
> // Note that serviceURI is renamed to voiceURI, with the same definition
> except that it also specifies the voice...
>
> voiceURI attribute:
> The voiceURI attribute specifies the speech synthesis voice and
> the location of the speech synthesis service that the web application
> wishes to use. If this attribute is unset at the time of the play method
> call, then the user agent must use the user agent default speech service.
> Note that the serviceURI is a generic URI and can thus point to local
> services either through use of a URN with meaning to the user agent or by
> specifying a URL that the user agent recognizes as a local service.
> Additionally, the user agent default can be local or remote and can
> incorporate end user choices via interfaces provided by the user agent such
> as browser configuration parameters.
>
> volume attribute
> Speaking volume between 0 and 1 inclusive, with 0 being lowest and 1
> being highest, with a default of 1.0. If SSML is used, this value will be
> overridden by prosody tags in the markup.
>
> rate attribute
> Speaking rate relative to the default rate for this voice. 1.0 is the
> default rate supported by the speech synthesis engine or specific
> voice (which should correspond to a normal speaking rate). 2.0 is twice as
> fast, and 0.5 is half as fast. Values below 0.1 or above 10.0 are strictly
> disallowed, but speech synthesis engines or specific voices may constrain
> the minimum and maximum rates further; for example, a particular voice may
> not actually speak faster than 3 times normal even if you specify a value
> larger than 3.0. If SSML is used, this value will be overridden by prosody
> tags in the markup.
>
> pitch attribute
> Speaking pitch between 0 and 2 inclusive, with 0 being lowest and 2
> being highest. 1.0 corresponds to the default pitch of the speech synthesis
> engine or specific voice. Speech synthesis engines or voices may constrain
> the minimum and maximum rates further. If SSML is used, this value will be
> overridden by prosody tags in the markup.
>
>
> interface SpeechSynthesisVoice {
>     readonly attribute DOMString voiceURI;
>     readonly attribute DOMString name;
>     readonly attribute boolean localService;
>     readonly attribute boolean default;
> };
>
> voiceURI attribute:
> The voiceURI attribute specifies the speech synthesis voice and
> the location of the speech synthesis service that the web application
> wishes to use. If this attribute is unset at the time of the play method
> call, then the user agent must use the user agent default speech service.
> Note that the serviceURI is a generic URI and can thus point to local
> services either through use of a URN with meaning to the user agent or by
> specifying a URL that the user agent recognizes as a local service.
>
> name attribute:
> A human-readable name that represents the voice. There is no guarantee
> that all names returned are unique.
>
> lang attribute:
> This attribute is a valid BCP 47 language tag indicating the language of
> the voice.
>
> localService attribute:
> This attribute is true for voices supplied by a local speech
> synthesizer, and is false for voices supplied by a remote speech
> synthesizer service. (This may be useful for the developer because remote
> services may imply additional latency, bandwidth or cost, whereas local
> voices may imply lower quality, however there is no guarantee that any of
> these implications are true.)
>
> default attribute:
> This attribute is true for at most one voice per language. There may be
> a different default for each language. It is user agent dependent how
> default voices are determined.
>
>
> interface SpeechSynthesisVoiceList {
>     readonly attribute unsigned long length;
>     getter SpeechSynthesisVoice item(in unsigned long index);
> };
>
>
> interface SpeechSynthesis {
>     ...
>     static SpeechSynthesisVoiceList getVoices();
> };
>
> getVoices method
> The getVoices method returns the available voices. It is user agent dependent
> which voices are available.
>
> /Glen Shires
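
A minimal sketch of how a page might use the attributes proposed above: choose a voice for a language, then keep volume, rate and pitch inside the stated ranges. It works on plain objects shaped like the proposed interfaces rather than on any browser API. The names VoiceInfo, UtteranceSettings, pickVoice and buildUtterance are hypothetical. Clamping out-of-range values is also an assumption; the proposal only says such values are disallowed and does not say what a user agent does with them. The voice preference order (default, then local, then any match) and the case-insensitive exact match on the language tag are likewise illustrative choices.

```typescript
// Hypothetical plain-object mirror of the proposed SpeechSynthesisVoice.
// `lang` is described in the proposal text although it is absent from the IDL block.
interface VoiceInfo {
  voiceURI: string;
  name: string;
  lang: string;
  localService: boolean;
  default: boolean;
}

// Hypothetical plain-object mirror of the proposed SpeechSynthesisUtterance.
interface UtteranceSettings {
  text: string;
  lang: string;
  voiceURI?: string; // left unset => user agent default speech service
  volume: number;    // 0 to 1 inclusive, default 1.0
  rate: number;      // 0.1 to 10.0, default 1.0
  pitch: number;     // 0 to 2 inclusive, default 1.0
}

// Limit a value to the inclusive range [min, max].
function clamp(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

// Assumed preference order: the language's default voice, then a local
// voice, then any voice whose tag matches.
function pickVoice(voices: VoiceInfo[], lang: string): VoiceInfo | undefined {
  const wanted = lang.toLowerCase();
  const matches = voices.filter((v) => v.lang.toLowerCase() === wanted);
  return (
    matches.find((v) => v.default) ??
    matches.find((v) => v.localService) ??
    matches[0]
  );
}

// Build utterance settings, applying the proposal's defaults and ranges.
function buildUtterance(
  text: string,
  lang: string,
  voices: VoiceInfo[],
  opts: Partial<Pick<UtteranceSettings, "volume" | "rate" | "pitch">> = {}
): UtteranceSettings {
  const voice = pickVoice(voices, lang);
  return {
    text,
    lang,
    voiceURI: voice?.voiceURI,
    volume: clamp(opts.volume ?? 1.0, 0, 1),
    rate: clamp(opts.rate ?? 1.0, 0.1, 10),
    pitch: clamp(opts.pitch ?? 1.0, 0, 2),
  };
}
```

If no voice matches the requested language, voiceURI stays unset, which under the proposal means the user agent falls back to its default speech service.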
Received on Wednesday, 3 October 2012 22:53:45 UTC