- From: Andrew Thompson <lordpixel@mac.com>
- Date: Tue, 10 Aug 2004 00:17:14 -0400
- To: www style <www-style@w3.org>
- Cc: Dave Raggett <dsr@w3.org>
Hi,

I've reviewed the 2004 Draft of the CSS3 Speech Module. I previously submitted comments on the 2003 draft here:

http://lists.w3.org/Archives/Public/www-style/2003Jun/0137.html

These comments never received any formal response from the working group, but I see the 2004 draft has addressed around 50% of my issues with the previous draft, so I'm pleased with the direction being taken.

Here are my comments on the 2004 Draft, split into comments on style or grammar and comments on the substance of the spec.

Grammatical & Style Comments
----------------------------

1. Section: Abstract
Problem: typo
The sentence
    "CSS define aural properties that ..."
should be
    "CSS defines aural properties that ..."

2. Section: Definition of property 'speak'
Problem: English usage
In the definitions of 'literal-punctuation' and 'no-punctuation' the sentence
    "Similar as 'normal' value but..."
should be
    "Similar to 'normal' value but..."

3. Section: Definition of property 'speak'
Problem: English usage
The sentence
    "Speech synthesizers are knowledgeable about what is a number and what isn't."
should be
    "Speech synthesizers are knowledgeable about what is and is not a number."
The contraction 'isn't' should not be used in formal written English.

4. Section: Definition of the property 'voice-duration'
Problem: English usage
This sentence is poor:
    "This allows authors to specify how long they want a given element to be rendered."
("how long they want" seems to be plural purely to avoid the he/she problem, and the phrasing is basically slang.) Perhaps something like
    "Allows authors to specify how long it should take to render the given element."

Substantive Comments
--------------------

1. Section: Definition of the property 'speak'
This draft of the spec - http://www.w3.org/TR/2002/WD-speech-synthesis-20021202/ - defined two additional properties, 'date' and 'words'.
The latter is probably only marginally useful (in theory it was supposed to force 'ASCII' to be rendered as "as-key" rather than "a s c i i"), but I'm really surprised at the removal of 'date', which would seem to be really useful.

2. Section: Definition of the properties 'cue-before' and 'cue-after'
None of the current examples make it clear that this is legal:

    cue-before: url('bell.aiff') loud;

While the grammar shows this is possible, an example would help the less technical reader understand how this property works.

(I was going to make a comment about "cue-during" and mixing, but the recent discussion of a CSS audio module on www-style indicates this possibility is already being considered.)

3. Section: Definitions of the properties 'mark-before' and 'mark-after'
In both cases the definition reads:
    Value: <string>
but it should be
    Value: <string> | attr(attribute-name)
to match the description below it.

4. Section: Definition of the property 'voice-family'

4.1. CSS3 is still using 'child', 'young' and 'old', but SSML has shifted to requiring age to be expressed in years (see http://www.w3.org/TR/speech-synthesis/#S3.2.1). One suspects the reason SSML did this was to avoid the political correctness issue of having to define an age which is "old". 'child', 'young' and 'old' are more useful than absolute numbers. Should CSS harmonize with SSML and only use numbers, or at least allow age to be defined in numbers in addition to child/young/old for compatibility?

4.2. In addition to 'male' and 'female', the <generic-voice> families should include 'natural' and 'artificial', as many synthesizers have a robot-like voice that is neither male nor female. Note that SSML defines 'neutral', so as a minimum this should be added for compatibility.

4.3. As per my 2003 comments, although I like the fact that there is a facility for selecting variations, using <number> for specifying them is not a satisfactory solution.

* Firstly, using absolute numbers is not very portable.
If I write

    body { voice-family: male 1 }
    .foo { voice-family: male 2 }
    .bar { voice-family: male 3 }

then what happens if the synthesizer only has two male voices? When something of class 'bar' is rendered, does the synthesizer round-robin back to "male 1", or does it stay with the current voice because it doesn't have enough male voices? At the very least the specification should specify what "best effort" strategy the synthesizer should apply. This allows document authors to at least predict whether the voice will change or not (assuming the synthesizer has at least 2 voices).

* The definition for <number> says: "e.g. the second or next male voice", but no way to indicate "next" and "previous" is given. Clearly '1', '2', '3' work for specifying variants absolutely, but how do you ask for the next voice?

Perhaps something like this:

    .foo { voice-family: male +1 /* select the next male voice, relative to the inherited voice */ }

However, this would be easier:

    Value: [[<specific-voice> | [<relative-voice-specifier>] [<age>] <generic-voice>],]* [<specific-voice> | [<relative-voice-specifier>] [<age>] <generic-voice>] | inherit

    <relative-voice-specifier>
        Possible values are 'previous' and 'next'

    .foo { voice-family: next old male }

This would require vendors to order their voices, but it would allow document authors to reliably control whether the voice changes.
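The two "best effort" strategies for an out-of-range index, and the proposed 'next'/'previous' specifiers, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the strategy labels "wrap" and "keep", and the choice to wrap around at either end of the voice list are hypothetical and do not come from the draft.

```python
# Hypothetical sketch: none of these names come from the CSS3 Speech draft.
MALE_VOICES = ["Fred", "Bruce", "Ralph"]  # vendor-defined order (assumed)


def pick_absolute(voices, index, strategy, current=None):
    """Resolve a 1-based index such as 'male 3' against the installed voices.

    "wrap" cycles round-robin through the list; "keep" leaves the inherited
    voice unchanged when the index is out of range.
    """
    if not voices:
        return current
    if 1 <= index <= len(voices):
        return voices[index - 1]
    if strategy == "wrap":
        return voices[(index - 1) % len(voices)]
    if strategy == "keep":
        return current
    raise ValueError(f"unknown strategy: {strategy}")


def pick_relative(voices, current, specifier):
    """Resolve 'next' or 'previous' relative to the inherited voice.

    Wrapping around at the ends of the list is an assumption of this sketch.
    """
    if not voices:
        return current
    if current not in voices:
        return voices[0]
    step = {"next": 1, "previous": -1}[specifier]
    return voices[(voices.index(current) + step) % len(voices)]
```

With only two male voices installed, 'male 3' resolves to the first voice under "wrap" and to the inherited voice under "keep"; whichever rule the specification picks, an author can then predict the result.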
For example, assume a synthesizer has 3 male voices, "Fred", "Bruce" and "Ralph":

    <ul>
      <li>one</li>
      <li><ul><li>foo</li>
              <li>bar</li>
          </ul>
      </li>
    </ul>

    ul { voice-family: male; }                --> Fred
    ul ul { voice-family: next male; }        --> Bruce
    ul ul ul { voice-family: previous male; } --> Fred

* Along similar lines, another value would be useful:

    <relative-voice-specifier>
        Possible values are 'previous', 'next' and 'different'

    ul { voice-family: young female; }
    /* slightly silly example, you probably wouldn't change the voice for 'em' */
    em { voice-family: different female; }

'different' is similar to 'previous' and 'next', but rather than cycling through the voices in a set order it asks the synthesizer to change the voice. The new voice should be as close to the inherited value as possible, within the constraints of the available voices. Thus the above 'em' declaration should first try to use a different 'young female' voice, then a different 'female' voice, then a 'neutral' voice, and finally a 'male' voice if the synthesizer only has one female voice. Naturally all of these voices must speak the same language first and foremost!

Overall I believe something like 'previous', 'next' and 'different' would be more useful, more intuitive and more portable than absolute integer indices.

5. Section: Definition of 'voice-pitch'
Regarding semitone changes: I think CSS should be harmonized with SSML, unless adding the new unit to CSS is undesirable for some reason?

Thanks for your time. I'd be interested in hearing any feedback.

AndyT (lordpixel - the cat who walks through walls)
A little bigger on the inside

(see you later space cowboy ...)
Received on Tuesday, 10 August 2004 04:17:18 UTC