- From: Andrew Thompson <lordpixel@mac.com>
- Date: Tue, 10 Aug 2004 00:17:14 -0400
- To: www style <www-style@w3.org>
- Cc: Dave Raggett <dsr@w3.org>
Hi,
I've reviewed the 2004 Draft of the CSS3 Speech Module. I previously
submitted comments on the 2003 draft here:
http://lists.w3.org/Archives/Public/www-style/2003Jun/0137.html
These comments never received any formal response from the working
group, but I see the 2004 draft has addressed around 50% of my issues
with the previous draft, so I'm pleased with the direction being taken.
Here are my comments on the 2004 Draft, split into comments on style or
grammar and comments on the substance of the spec.
Grammatical & Style Comments
----------------------------
1.
Section: Abstract
Problem: typo
The sentence "CSS define aural properties that ..." should be
"CSS defines aural properties that ..."
2.
Section: Definition of property 'speak'
Problem: English usage
In the definitions of 'literal-punctuation' and 'no-punctuation' the
sentence
"Similar as 'normal' value but..." should be
"Similar to 'normal' value but..."
3.
Section: Definition of property 'speak'
Problem: English usage
The sentence:
"Speech synthesizers are knowledgeable about what is a number and what
isn't."
should be
"Speech synthesizers are knowledgeable about what is and is not a
number."
Contractions such as 'isn't' should not be used in formal written English.
4.
Section: Definition of the property 'voice-duration'
This sentence is poor:
"This allows authors to specify how long they want a given element to
be rendered."
("how long they want" seems like it is plural purely to avoid the
he/she problem, and the phrasing is basically slang)
Perhaps something like
"Allows authors to specify how long it should take to render the given
element."
Substantive Comments
--------------------
1.
Section: Definition of the property 'speak'
This draft of the spec -
http://www.w3.org/TR/2002/WD-speech-synthesis-20021202/ - defined two
additional properties, 'date' and 'words'. The latter is probably only
marginally useful (in theory it was supposed to force 'ASCII' to be
rendered as "as-key" rather than "a s c i i") but I'm really surprised
at the removal of "date" which would seem to be really useful.
2.
Section: Definition of the properties 'cue-before' and 'cue-after'
None of the current examples make it clear that this is legal:
cue-before: url('bell.aiff') loud;
While the grammar shows this is possible, an example would help the less
technical reader understand how this property works.
(I was going to make a comment about "cue-during" and mixing, but the
recent discussion of a CSS audio module on www-style indicates this
possibility is already being considered.)
3.
Section: Definitions of the properties, 'mark-before' and 'mark-after'
In both cases the definition reads:
Value: <string>
but it should be
Value: <string> | attr(attribute-name)
to match the description below it.
4.
Section: Definition of the property 'voice-family'
4.1. CSS3 is still using 'child', 'young' and 'old' but SSML has
shifted to requiring age to be expressed in years.
(see http://www.w3.org/TR/speech-synthesis/#S3.2.1)
One suspects the reason SSML did this was to avoid the political
correctness issue of having to define an age which is "old". 'child',
'young' and 'old' are more useful than absolute numbers. Should CSS
harmonize with SSML and only use numbers, or at least allow age to be
defined in numbers in addition to child/young/old for compatibility?
4.2. In addition to 'male' and 'female' the <generic-voice> families
should include 'natural' and 'artificial' as many synthesizers have a
robot-like voice that is neither male nor female. Note that SSML
defines 'neutral' so as a minimum this should be added for
compatibility.
4.3. As per my 2003 comments, although I like the fact there is a
facility for selecting variations, using <number> for specifying them
is not a satisfactory solution.
* Firstly, using absolute numbers is not very portable. If I write
body { voice-family: male 1 }
.foo { voice-family: male 2 }
.bar { voice-family: male 3 }
Then what happens if the synthesizer only has two male voices? When
something of class 'bar' is rendered, does the synthesizer round-robin
back to "male 1" or does it stay with the current voice because it
doesn't have enough male voices? At the very least the specification
should specify what "best effort" strategy the synthesizer should
apply. This allows document authors to at least predict whether the
voice will change or not (assuming the synthesizer has at least 2
voices).
* The definition for <number> says: "e.g. the second or next male
voice", but no way to indicate "next" and "previous" is given. Clearly
'1', '2', '3' work for specifying variants absolutely, but how does one ask
for the next voice? Perhaps something like this
/* select the next male voice, relative to the inherited voice */
.foo { voice-family: male +1 }
However this would be easier:
Value: [[<specific-voice> | [<relative-voice-specifier>] [<age>]
<generic-voice>],]*
[<specific-voice> | [<relative-voice-specifier>] [<age>]
<generic-voice>] | inherit
<relative-voice-specifier>
Possible values are 'previous' and 'next'
.foo { voice-family: next old male }
This would require vendors to order their voices, but it would allow
document authors to reliably control whether the voice changes.
E.g., assume a synthesizer has 3 male voices: "Fred", "Bruce" and "Ralph"
<ul>
<li>one</li>
<li><ul><li>foo</li>
<li>bar</li>
</ul>
</li>
</ul>
ul { voice-family: male; } --> Fred
ul ul { voice-family: next male; } --> Bruce
ul ul ul { voice-family: previous male; } --> Fred
* Along similar lines, another value would be useful:
<relative-voice-specifier>
Possible values are 'previous', 'next' and 'different'
ul { voice-family: young female; }
/* slightly silly example, you probably wouldn't change the voice for
'em' */
em { voice-family: different female; }
'different' is similar to 'previous' and 'next' but rather than cycling
through the voices in a set order it asks the synthesizer to change the
voice. The new voice should be as close to the inherited value as
possible, within the constraints of the available voices. Thus the
above 'em' declaration should first try to use a different 'young
female' voice, then a different 'female' voice, then a 'neutral' and
finally a 'male' voice if the synthesizer only has one female voice.
Naturally all of these voices must speak the same language first and
foremost!
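To make the intended behaviour of 'previous', 'next' and 'different' concrete, here is a rough Python sketch of how a synthesizer might resolve them. Everything in it is hypothetical: the Voice record, the step() and different() helpers, and the choice to wrap around at the ends of the list are my own assumptions, not anything taken from the draft. It also assumes the list of voices has already been filtered down to the right language.

```python
# Hypothetical sketch: none of these names come from the CSS3 Speech draft.
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Voice:
    name: str
    gender: str          # 'male', 'female' or 'neutral'
    age: Optional[str]   # 'child', 'young', 'old' or None


def step(voices: List[Voice], current: Voice, gender: str, delta: int) -> Voice:
    """Resolve 'next' (delta=+1) or 'previous' (delta=-1) within one gender.

    Assumes the vendor exposes voices in a fixed order and that stepping
    wraps around at either end of the list (an assumption of this sketch).
    """
    pool = [v for v in voices if v.gender == gender]
    if not pool:
        return current               # nothing to switch to
    if current not in pool:
        return pool[0]               # inherited voice has another gender
    return pool[(pool.index(current) + delta) % len(pool)]


def different(voices: List[Voice], current: Voice, gender: str) -> Voice:
    """Resolve 'different': pick the closest voice that is not the current one.

    Preference order: same gender and same age as the inherited voice,
    then same gender, then 'neutral', then any remaining voice.
    """
    others = [v for v in voices if v != current]
    tiers = [
        lambda v: v.gender == gender and v.age == current.age,
        lambda v: v.gender == gender,
        lambda v: v.gender == "neutral",
        lambda v: True,
    ]
    for accept in tiers:
        for v in others:
            if accept(v):
                return v
    return current                   # only one voice available
```

With the three male voices "Fred", "Bruce" and "Ralph" in that order, step(voices, fred, "male", +1) gives Bruce, and stepping back with -1 gives Fred again.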
Overall I believe something like 'previous', 'next' and 'different'
would be more useful, more intuitive and more portable than absolute
integer indices.
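For comparison, the portability problem with absolute indices can be sketched in a few lines of Python. The two helpers below are hypothetical names for the two "best effort" strategies I mentioned (wrap around, or stay with the current voice). The sketch assumes indices are 1-based and that the synthesizer has at least one voice of the requested kind.

```python
from typing import List


def pick_wrap(pool: List[str], index: int) -> str:
    """Round-robin: an out-of-range index wraps back to the start."""
    return pool[(index - 1) % len(pool)]


def pick_keep(pool: List[str], index: int, current: str) -> str:
    """Keep the current voice when the requested index is out of range."""
    return pool[index - 1] if 1 <= index <= len(pool) else current


male = ["Fred", "Bruce"]   # hypothetical synthesizer with two male voices
# '.bar { voice-family: male 3 }' nested inside '.foo' (currently "Bruce"):
print(pick_wrap(male, 3))            # Fred
print(pick_keep(male, 3, "Bruce"))   # Bruce
```

The same style sheet produces a voice change under one strategy and none under the other, which is why the spec needs to pick one.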
5.
Section: Definition of 'voice-pitch'
Regarding semitone changes: I think CSS should be harmonized with SSML
unless adding the new unit to CSS is undesirable for some reason?
Thanks for your time. I'd be interested in hearing any feedback.
AndyT (lordpixel - the cat who walks through walls)
A little bigger on the inside
(see you later space cowboy ...)
Received on Tuesday, 10 August 2004 04:17:18 UTC