- From: Andrew Thompson <lordpixel@mac.com>
- Date: Tue, 21 Jan 2003 20:49:28 -0500
- To: www-voice@w3.org
On Tuesday, Jan 21, 2003, at 04:26 America/New_York, Marc Schroeder
wrote:
>
> Hi,
>
> this is a minor comment regarding the SSML <break> element
> (http://www.w3.org/TR/2002/WD-speech-synthesis-20021202/#S2.2.3), more
> specifically regarding the meaning of the attribute value "none" for
> the time attribute.
Which reminds me to send my comments in!
Since some readers may know that I am part of the working group
for JSR 113 (Java Speech API 2.0), I should make clear that these
are my personal comments, not those of that working group as a whole.
2.1.6 Sub Element
Does the table presented in this section have unintentional duplicates?
If not, it would be helpful to explain the difference between:
"interpret-as: number format: ordinal" and the later
"interpret-as: ordinal"
These appear to be two ways of specifying the same functionality.
2.2.1 Voice Element
name attribute: Disallowing whitespace in the name seems overly restrictive.
Why not just comma-separate the list of names, as with font-family in CSS?
The voice names are implementation dependent, therefore if whitespace
is not allowed the SSML implementor will potentially have to map native
voice names to SSML voice names, which seems to make SSML harder to use
for developers (and possibly users).
variant attribute: Variant is defined as an integer. The spec states
"eg, the second or next male child voice" but it does not specify how
to express "next" as an integer. Would this be "+1" for next and "-1"
for previous, or something else?
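To make the ambiguity concrete, here is a minimal sketch of two possible readings: an absolute 1-based index into the voices that match the other attributes ("2" = the second male child voice), and a signed offset relative to the current voice ("+1" = next, "-1" = previous). Neither reading comes from the draft; the Voice class, the matching rule, and the wrap-around at the end of the list are assumptions for illustration only.

```python
# Hypothetical sketch: two possible readings of the integer "variant"
# attribute. Neither is defined by the SSML draft; Voice, matching(),
# and the wrap-around behaviour are illustrative assumptions.
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Voice:
    name: str
    gender: str    # e.g. "male", "female"
    category: str  # e.g. "child", "adult"


def matching(voices: List[Voice], gender: str, category: str) -> List[Voice]:
    """Voices matching the other attributes, in the engine's own order."""
    return [v for v in voices if v.gender == gender and v.category == category]


def absolute_variant(candidates: List[Voice], n: int) -> Optional[Voice]:
    """Reading 1: variant="2" means the second matching voice (1-based)."""
    if 1 <= n <= len(candidates):
        return candidates[n - 1]
    return None


def relative_variant(candidates: List[Voice], current: Voice,
                     offset: int) -> Optional[Voice]:
    """Reading 2: variant="+1" means the next matching voice after the
    current one, "-1" the previous one (wrapping around, by assumption)."""
    if current not in candidates:
        return None  # "next" is undefined if the current voice doesn't match
    i = candidates.index(current)
    return candidates[(i + offset) % len(candidates)]
```

Both readings depend on the order in which an engine happens to list its voices, which is part of why an integer index looks vendor-specific.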
Relating to this point, in general I have found it useful to be able
to ask for voices like this: "give me an adult male voice, which must
not be the same as the current voice". This can be used to implement
"barge-in" type functionality. It might be worthwhile considering
adding another attribute, "exclude", along these lines:
<voice gender="male" age="30" exclude="bruce, fred">
"current" could then be a special voice name:
<voice gender="male" age="30" exclude="current"> means "give me any adult
male voice, so long as it's not the same as the current voice". This
allows one to request a similar voice in a more natural way than
relying on the proposed "variant" attribute, whose value is
a simple integer index and would be vendor-specific anyway.
would also make sense if a future SSML spec defines some standard voice
names with well known characteristics.
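A minimal sketch of how an engine might honour the proposed attribute, assuming voices are described by name, gender and a numeric age. "exclude" is only a proposal here; the Voice class, select_voice(), and the "closest age wins" tie-break are hypothetical choices made for illustration.

```python
# Hypothetical sketch of the proposed "exclude" attribute. "exclude" is a
# proposal, not part of the SSML draft; Voice, select_voice() and the
# closest-age tie-break are illustrative assumptions.
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Voice:
    name: str
    gender: str
    age: int


def parse_exclude(value: str) -> List[str]:
    """Split a comma-separated list such as "bruce, fred" into names."""
    return [part.strip() for part in value.split(",") if part.strip()]


def select_voice(voices: List[Voice], gender: str, age: int,
                 exclude: str = "", current: Optional[Voice] = None
                 ) -> Optional[Voice]:
    excluded: Set[str] = set()
    for name in parse_exclude(exclude):
        if name == "current":
            # The reserved name "current" stands for whichever voice is active.
            if current is not None:
                excluded.add(current.name)
        else:
            excluded.add(name)
    candidates = [v for v in voices
                  if v.gender == gender and v.name not in excluded]
    if not candidates:
        return None
    # Among the remaining voices, prefer the one closest to the requested age.
    return min(candidates, key=lambda v: abs(v.age - age))
```

A comma-separated value also sits naturally with comma-separated name lists; the one cost is that a real voice literally named "current" would clash with the reserved name.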
2.2.3 Break element
time attribute: The value "none" seems troublesome to me. If I read
<break time="none">
in a document, I would assume it meant "do not place a break between
these elements" (a break of length 0 seconds).
The spec defines 'The value "none" indicates that a normal break
boundary should be used. The other five values indicate increasingly
large break boundaries between words.'
I'd prefer <break time="default"> for this functionality. It seems more
natural, and is more consistent with usage in 'section 2.2.4 prosody'.
"none" could be retained, and mean "a short (ideally zero length)
break", if the group feels engines can support that.
SEE ALSO: my comment on Appendix A below.
3.3 Pronunciation Lexicon
On the question of element-specific lexicons raised in the document, I
note that one could use say-as as a limited way of achieving element-specific
pronunciation, e.g.,
<say-as interpret-as="lexiconKey" lexicon="british.file">tomato</say-as>
Of course, this is really just another way of achieving what the
<phoneme> element does.
My general concern about element-specific lexicons is the processing
cost. For example, assume the document as a whole has a lexicon in use (A),
and a sub-element specifies a new lexicon (B). Presumably the synthesis
engine must perform lookups as if (A) and (B) were merged, with
pronunciations in (B) overriding those in (A). It then needs to
unload (B) when the element is exited. This sounds like it could prove
too costly for a handheld device (PDA, cellphone), and indeed, even a
desktop system might struggle to change lexicons every other word.
At the very least I think this feature would have to be implemented
with no finer granularity than the <paragraph> element; <sentence> seems
too fine-grained.
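A minimal sketch of the lookup semantics described above, assuming lexicons can be modelled as in-memory word-to-pronunciation dictionaries (the class name and the placeholder pronunciations are hypothetical). Python's collections.ChainMap gives the "(B) overrides (A)" behaviour without physically merging the two, and unloading (B) on exit is just discarding one layer. It does not remove the cost the concern is really about: loading and parsing a lexicon file such as british.file, and the fact that every miss still walks every layer.

```python
# Hypothetical sketch: element-scoped lexicons layered over a document
# lexicon. Lexicons are modelled as plain dicts; real engines differ.
from collections import ChainMap
from typing import Dict, Optional


class LexiconStack:
    def __init__(self, document_lexicon: Dict[str, str]):
        # Layer (A): the lexicon in force for the whole document.
        self._chain = ChainMap(document_lexicon)

    def enter_element(self, element_lexicon: Dict[str, str]) -> None:
        # Push layer (B) in front, so its entries shadow those in (A).
        self._chain = self._chain.new_child(element_lexicon)

    def exit_element(self) -> None:
        # Drop the innermost layer when the element is closed.
        if len(self._chain.maps) == 1:
            raise RuntimeError("no element lexicon to remove")
        self._chain = self._chain.parents

    def lookup(self, word: str) -> Optional[str]:
        # Searches the layers innermost-first; None if no lexicon has the word.
        return self._chain.get(word)
```

For example, with "tomato" pronounced "tuh-MAY-toh" in (A) and "tuh-MAH-toh" in (B), lookups inside the element return the (B) form, and the (A) form returns once the element is exited.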
Appendix A: Example SSML
The first example has:
<sentence>The step is from Stephanie Williams and arrived at
<break/>3:45</sentence>
The time attribute is optional on <break>, but section 2.2.3 does not
specify what default value applies when "time" is omitted. If the default
is "none", then the break used is the normal word break length, which is
not what the example above implies; it implies something longer than a
normal break. SEE ALSO my comment on <break> above.
Thanks!
AndyT (lordpixel - the cat who walks through walls)
A little bigger on the inside
(see you later space cowboy ...)
Received on Tuesday, 21 January 2003 20:49:29 UTC