
Re: [css3-speech] Editorial Comments

From: Daniel Weck <daniel.weck@gmail.com>
Date: Tue, 24 May 2011 12:04:02 +0100
Message-Id: <9AE35EE0-2D6A-4CEA-994B-D0787FA916E6@gmail.com>
To: www-style list <www-style@w3.org>, timeless <timeless@gmail.com>

Thank you for your review!
Reply inline:

On 18 May 2011, at 11:34, timeless wrote:

> http://dev.w3.org/csswg/css3-speech/
>> (e.g. TTS voice, pitch, rate, volume levels, etc.)
> drop 'etc.' it's incompatible w/ 'e.g.' (and add 'and' before  
> 'volume levels')


>> These style sheet properties can be used together with visual  
>> properties (mixed media), or as a complete aural alternative to  
>> visual presentation.
> perhaps 'to a/the visual presentation'?

Good suggestion.

>> This Module describes the CSS properties that apply to the "speech"  
>> media type, and defines a new "box" model specifically for the  
>> aural dimension.
> s/Module/module/


>> Note that content creators can conditionally include CSS properties  
>> dedicated to user-agents with text to speech synthesis
> should this be in   <p class=note> ? as is, for some reason you don't
> seem to have margins between <p>'s which makes it look like you just
> have a <br>

Local CSS stylesheet improved.

>> When doing so, the styles authored within the scope of such  
>> conditional statements are ignored by user-agents that do not  
>> support speech synthesis.
> s/speech synthesis/css3-speech/ (or "this Module")

Used "this module".

>> linear
>>   When present, this keyword indicates that the associated value  
>> represents a point on a linear volume amplitude scale, from  
>> ‘0’ (silent) to ‘100’ (full volume).
>> x-soft
>>   The value ‘x-soft’ maps to 0
>> The interpretation of the corresponding numerical values depends on  
>> whether the ‘linear’ keyword is used
> That x-soft might map to silent seems odd.
> Initially I wrote:
> I understand the goal is an even distribution, but it seems that a
> value that might represent silent shouldn't be labeled as 'soft', i
> think 10, 30, 50, 70, 90 would be better, either including 'none' and
> 'loudest' for 0/100 or just leaving those values to be written out by
> hand.
> I think that you should probably include the explanation you included
> in <non-negative number> about designing for compatibility with SSML.
>> <non-negative number>
>>   An integer or floating point positive number in the range ‘0’ to  
>> ‘100’.
> It seems better to call this a <something-percentage>. I don't think
> defining non negative to be bounded above by 100 makes sense.
> Of note, you use 'non-negative' here.
>> When the ‘linear’ keyword not used
> s/not/is not/

The whole volume level issue has been reworked based on SSML 1.1
(this is one of the breaking changes since v1.0). I had made some
editorial mistakes by combining aspects of SSML 1.0 and details from
the CSS 2.1 Aural Stylesheets appendix.

> Could you please do something to the style so that two normal <p>'s
> when placed adjacent to each-other have margins? your primary audience
> might be css3-speech users, but..

Already fixed, as per your comment earlier in this email. :)

>> All 3 values are configured by the user
> s/configured/potentially configurable/
>> so this allows authors to write a single style sheet that works in  
>> a variety of listening environments.
> s/so//
>> because it is independent from the user-configured volume levels.
> ? s/from/of/
> -- I'm not sure on this point, my suggestion is because to me you're
> saying that while they could be mathematically related, they aren't
> (thus "of").
> I think "not directly related to" is probably a better solution
>> (where ‘x-soft’ always means "silent", etc.).
> drop ", etc." ?

Thanks (to all 4 points above).

>> <percentage>
>>   Only positive percentage values are allowed.
> I think you want 'non-negative' not 'positive', as '0' is allowed.

Thanks, I checked the entire document for this error.

>> so the computed value equals the inherited value times 0.5 (divided  
>> by 2),
> s/divided/i.e. divided/


>> (the volume corresponding to ‘0’ is nearer the value of ‘100’)
>> (the gap between ‘0’ and ‘100’ is wider).
> i don't think 'nearer' / 'wider' are good choices for this description

"closer to"? This prose/specification has been completely revamped.

>> normal
>> Punctuation is not to be spoken, but instead rendered naturally as  
>> various pauses.
> shouldn't punctuation also affect tone, volume, stress, etc.?

Proposed replacement:
"For example, punctuation is not spoken as-is, but instead rendered  
naturally as appropriate pauses."

>> <time>
>> Only positive values are allowed.
> s/positive/non-negative/ ?

Fixed everywhere.

>> none
>> Equivalent to 0ms (no prosodic break in the speech output).
>> The ‘cue-before’ and ‘cue-after’ properties specify auditory icons  
>> (i.e. prerecorded audio clips) to be played before (or after) the  
>> selected element within the audio "box" model. When a user agent is  
>> not able to render the specified auditory icon, it is recommended  
>> to produce an alternative cue (e.g., popping up a warning, emitting  
>> a warning sound, etc.)
> You're missing a period at the end of this paragraph

Yes, and removed "etc." too :)

>>    The URI must designate an auditory icon resource. If the URI  
>> resolves to something other than an audio file, such as an image,  
>> the resource is ignored and the property treated as if it had the  
>> value ‘none’.
> must sounds like an rfc term, which is probably not proper in this  
> context.

I see.

>> The loudness of prerecorded audio cues can be adjusted relatively  
>> to the volume level of synthetic speech.
> s/relatively/relative/


> synthetic or synthesized?
> (possibly "speech synthesis")

This looks alright:


>> Only positive percentage values are allowed.
> non-negative?

Fixed everywhere.

>> The ‘voice-family’ property specifies a comma-separated,  
>> prioritized list of values that designate speech synthesis voices.
> s/voices./voices/ -- otherwise you have a random stray period after
> the parenthetical:

Well spotted.

>> (analog to ‘font-family’ in visual style sheets).
> s/analog/analogous/


>> <name>
>> For compatibility with SSML, whitespace characters are not  
>> permitted within voice names.
> This should probably be listed earlier in the paragraph. And it's
> probably better as "voice names must not contain whitespace
> characters".

Good suggestion.

>> <age>
>>   Possible values are ‘child’, ‘young’ and ‘old’.
> to me, 'age' is numeric, i'd suggest you use some other thing to
> describe the textual concepts. you're also missing something for
> 'normal'.

To be honest, I am not aware of the historical motivations to use a  
keyword enumeration rather than a non-negative number like in SSML:


So far I haven't seen any implementation of ‘child’, ‘young’ and  
‘old’, so I am totally in favor of aligning with SSML. Latest editor's  
draft updated accordingly.

>> Possible values are positive numbers restricted to integers, and  
>> excluding zero (i.e. starting from 1).
> This is rather convoluted. You defined Positive numbers to include 0
> reference that definition and then actively exclude zero.

Actually we refer to

I fixed the erroneous prose that was pointing to "positive" numbers  
when it should have referred to "non-negative" numbers.

>> (e.g. name, gender, age, etc.).
> drop "etc."

Yep, as per your earlier recommendation.

>> in order to cater for dialectic variants): .
> s/for/to/

Several dictionaries (e.g. the Collins) allow "for" after "cater" to  
express the following meaning: "take into account, consider, bear in  
mind, make allowance for, to supply what is needed, etc."

> s/: ./:/

Keyboard slippage :)

>> If no voice is available for the language of the selected content,  
>> user-agent should raise a warning to let the user know about the  
>> lack of appropriate TTS voice.
> While this is a should instead of a must, I'm not certain it's a
> wonderful suggestion. UI design via specification especially in the
> area of warnings is generally poor. I'd suggest 'may'.


>> The speech synthesizer voice must be re-evaluated (i.e. the  
>> selection process must take place once again) whenever either of  
>> the CSS voice characteristics change within the content flow.
> s/either/any/


> I'm concerned by 're-evaluated' + 'when*' -- This document talks about
> a single directed flow, and I'd want UAs to have the option of
> applying the selection process at "layout" instead of at "rendering".
> Otherwise you risk asking a UA to compute something while it's
> reading, creating an unexpectedly long pause between potential voice
> transitions.

I added a note to clarify this point.

>> The voice must also be re-calculated whenever the content language  
>> changes, unless the ‘preserve’ keyword is used
> It'd be nice if a css selector based example was provided instead of a
> forced rule on the node.

I added another "span" in the example.

>>  The french text below will be spoken with an english voice:
> s/french/French/; s/english/English speaker's/


>> 8.3. The ‘voice-pitch’ property
>> Value: 	<frequency> | <percentage> | <relative-change> | x-low |  
>> low | medium | high | x-high | inherit
>> <relative-change>
>>   Specifies a relative change (decrement or increment) to the  
>> inherited value. The syntax of allowed values is a <number> (the  
>> "+" sign is optional for positive numbers), followed by either of  
>> "Hz" (for Hertz) or "kHz" (for kiloHertz) or "st" (for semitones),  
>> and followed by a space character and the "relative" keyword.
> It seems like:
> | <relative-value> relative |
> would be much easier to understand than an extra sentence hidden at
> the end of the text.

Yes, this was already on my todo list. Actually:
| <relative-value> && relative |
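
To illustrate what the "&&" combinator implies (both components are
required, but they may appear in either order), here is a minimal Python
sketch. The function name and the regular expression are hypothetical:
the pattern for <relative-value> is only an assumption derived from the
prose quoted above (a <number> with an optional sign, followed by "Hz",
"kHz" or "st"), and it is not taken from the draft's formal grammar.

```python
import re

# Assumed shape of a <relative-value>: optional sign, a number, then one
# of the units named in the quoted prose (matched case-insensitively).
RELATIVE_VALUE = re.compile(r"^[+-]?(\d+(\.\d+)?|\.\d+)(hz|khz|st)$", re.IGNORECASE)


def is_relative_pitch(value: str) -> bool:
    """Check a value against '<relative-value> && relative'.

    '&&' means both components must be present, in any order.
    """
    tokens = value.split()
    if len(tokens) != 2:
        return False
    # Separate the 'relative' keyword from the other component.
    keywords = [t for t in tokens if t.lower() == "relative"]
    others = [t for t in tokens if t.lower() != "relative"]
    if len(keywords) != 1 or len(others) != 1:
        return False
    return RELATIVE_VALUE.match(others[0]) is not None
```

With this sketch, both "+2st relative" and "relative -50Hz" are accepted,
whereas "2st" alone (keyword missing) and "2 st relative" (space between
number and unit) are rejected.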

Many thanks!
Received on Tuesday, 24 May 2011 11:04:36 UTC
