- From: Andrew Thompson <lordpixel@mac.com>
- Date: Tue, 21 Jan 2003 20:49:28 -0500
- To: www-voice@w3.org
On Tuesday, Jan 21, 2003, at 04:26 America/New_York, Marc Schroeder wrote:

> Hi,
>
> this is a minor comment regarding the SSML <break> element
> (http://www.w3.org/TR/2002/WD-speech-synthesis-20021202/#S2.2.3), more
> specifically regarding the meaning of the attribute value "none" for
> the time attribute.

Which reminds me to send my comments in!

On the off chance anyone is aware that I'm part of the working group for JSR 113 (Java Speech API 2.0), I should make clear that these are my personal comments, not those of that working group as a whole.

2.1.6 Sub Element

Does the table presented in this section have unintentional duplicates? If not, it would be helpful to explain the difference between:

"interpret-as: number format: ordinal"

and the later

"interpret-as: ordinal"

These seem to be two ways of specifying the same functionality.

2.2.1 Voice Element

name attribute:
Disallowing whitespace in the name seems overly restrictive. Why not just comma-separate the list of names, as with font-face in CSS? The voice names are implementation dependent, so if whitespace is not allowed the SSML implementor will potentially have to map native voice names to SSML voice names, which seems to make SSML harder to use for developers (and possibly users).

variant attribute:
Variant is defined as an integer. The spec states "eg, the second or next male child voice" but it does not specify how to express "next" as an integer. Would this be "+1" for next and "-1" for previous, or something else?

Relating to this point, in general I have found it useful to be able to ask for voices like this: "give me an adult male voice, which must not be the same as the current voice". This can be used to implement "barge-in" type functionality.
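The kind of request described above ("an adult male voice that is not the current voice") can be sketched in a few lines. This is only an illustration under stated assumptions: the Voice record, the pick_voice helper, the voice names and the age threshold of 18 are all hypothetical, and none of them come from SSML or from any speech API.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Sequence


@dataclass(frozen=True)
class Voice:
    # Hypothetical description of an installed voice.
    name: str
    gender: str
    age: int


def pick_voice(voices: Sequence[Voice], gender: str, min_age: int,
               excluded: Iterable[str] = ()) -> Optional[Voice]:
    """Return the first voice with the given gender and at least min_age
    whose name is not in excluded, or None if there is no such voice."""
    banned = set(excluded)
    for voice in voices:
        if (voice.gender == gender and voice.age >= min_age
                and voice.name not in banned):
            return voice
    return None


# Hypothetical catalogue; real voice names are implementation dependent.
VOICES = [
    Voice("bruce", "male", 35),
    Voice("fred", "male", 30),
    Voice("vicki", "female", 28),
]

current = VOICES[0]
# Any adult male voice, as long as it is not the one currently speaking.
other = pick_voice(VOICES, "male", 18, excluded=[current.name])
```

When every matching voice is excluded the helper returns None, so a caller would need to decide whether to fall back to the current voice.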
It might be worthwhile considering adding another attribute, "exclude", in this fashion:

<voice gender="male" age="30" exclude="bruce, fred">

"current" could then be a special voice name:

<voice gender="male" age="30" exclude="current">

That is, give me any adult male voice so long as it's not the same as the current voice.

This allows one to specify a similar voice in a more natural way than relying on the proposed "variant" attribute. The value of "variant" is a simple integer index and would be vendor specific anyway. "Exclude" would also make sense if a future SSML spec defines some standard voice names with well known characteristics.

2.2.3 Break element

time attribute:
The value of "none" seems troublesome to me. If I read <break time="none"> in a document, I would assume it meant "do not place a break between these elements" (a break of length 0 seconds). The spec says:

'The value "none" indicates that a normal break boundary should be used. The other five values indicate increasingly large break boundaries between words.'

I'd prefer <break time="default"> for this functionality. It seems more natural, and is more consistent with usage in 'section 2.2.4 prosody'. "none" could be retained, and mean "a short (ideally zero length) break", if the group feels engines can support that.

SEE ALSO: my comment on Appendix A below.

3.3 Pronunciation Lexicon

On the question of element specific lexicons raised in the document, I note one could use say-as as a limited way of having element specific pronunciation, eg,

<say-as interpret-as="lexiconKey" lexicon="british.file">tomato</say-as>

Of course, this is really just another way of achieving what the <phoneme> element does.

My general concern about element specific lexicons is the processing cost. eg, assume the document as a whole has a lexicon in use (A), and a sub element specifies a new lexicon (B).
Presumably the synthesis engine must perform lookups as if (A) and (B) are merged, with pronunciations in (B) overriding those which also occur in (A). It then needs to unload (B) when the element is exited. This sounds like it could prove too costly for a handheld device (PDA, cellphone), and indeed even a desktop system might struggle to change lexicon every other word. At the very least I think this feature would have to be implemented with no finer granularity than per <paragraph> element. <sentence> seems too fine grained.

Appendix A: Example SSML

The first example has:

<sentence>The step is from Stephanie Williams and arrived at <break/>3:45</sentence>

The time attribute is optional on <break>, but section 2.2.3 does not specify what the default value of the "time" attribute is when it is omitted. If the default value is "none", then the break used is the normal word break length, which is not what the example above implies; it implies something longer than a normal break.

SEE ALSO: my comment on <break> above.

Thanks!

AndyT (lordpixel - the cat who walks through walls)
A little bigger on the inside

(see you later space cowboy ...)
Received on Tuesday, 21 January 2003 20:49:29 UTC