Re: [css3-speech] Editorial Comments from Daniel Weck on 2011-05-24 (www-style@w3.org from May 2011)

From: Daniel Weck <daniel.weck@gmail.com>
Date: Tue, 24 May 2011 12:04:02 +0100
To: www-style list <www-style@w3.org>, timeless <timeless@gmail.com>
Message-Id: <9AE35EE0-2D6A-4CEA-994B-D0787FA916E6@gmail.com>

Thank you for your review!
Reply inline:

On 18 May 2011, at 11:34, timeless wrote:

> http://dev.w3.org/csswg/css3-speech/
>
>> (e.g. TTS voice, pitch, rate, volume levels, etc.)
>
> drop 'etc.' it's incompatible w/ 'e.g.' (and add 'and' before  
> 'volume levels')

Fixed.

>> These style sheet properties can be used together with visual  
>> properties (mixed media), or as a complete aural alternative to  
>> visual presentation.
>
> perhaps 'to a/the visual presentation'?

Good suggestion.

>> This Module describes the CSS properties that apply to the "speech"  
>> media type, and defines a new "box" model specifically for the  
>> aural dimension.
>
> s/Module/module/

Done.

>> Note that content creators can conditionally include CSS properties  
>> dedicated to user-agents with text to speech synthesis
>
> should this be in   <p class=note> ? as is, for some reason you don't
> seem to have margins between <p>'s which makes it look like you just
> have a <br>

Local CSS stylesheet improved.

>> When doing so, the styles authored within the scope of such  
>> conditional statements are ignored by user-agents that do not  
>> support speech synthesis.
>
> s/speech synthesis/css3-speech/ (or "this Module")

Used "this module".

>> linear
>>   When present, this keyword indicates that the associated value  
>> represents a point on a linear volume amplitude scale, from  
>> ‘0’ (silent) to ‘100’ (full volume).
>
>> x-soft
>>   The value ‘x-soft’ maps to 0
>> The interpretation of the corresponding numerical values depends on  
>> whether the ‘linear’ keyword is used
>
> That x-soft might map to silent seems odd.
>
> Initially I wrote:
> I understand the goal is an even distribution, but it seems that a
> value that might represent silent shouldn't be labeled as 'soft', i
> think 10, 30, 50, 70, 90 would be better, either including 'none' and
> 'loudest' for 0/100 or just leaving those values to be written out by
> hand.
>
> I think that you should probably include the explanation you included
> in <non-negative number> about designing for compatibility with SSML.
>> <non-negative number>
>>   An integer or floating point positive number in the range ‘0’ to  
>> ‘100’.
>
> It seems better to call this a <something-percentage>. I don't think
> defining non negative to be bounded above by 100 makes sense.
>
> Of note, you use 'non-negative' here.
>
>> When the ‘linear’ keyword not used
>
> s/not/is not/

The whole volume level issue has been reworked based on SSML 1.1
(this is one of the breaking changes since v1.0). I had made some
editorial mistakes by combining aspects of SSML 1.0 and details from
the CSS 2.1 Aural Stylesheets appendix.

> Could you please do something to the style so that two normal <p>'s
> when placed adjacent to each-other have margins? your primary audience
> might be css3-speech users, but..

Already fixed, as per your comment earlier in this email. :)

>> All 3 values are configured by the user
> s/configured/potentially configurable/
>> so this allows authors to write a single style sheet that works in  
>> a variety of listening environments.
> s/so//
>> because it is independent from the user-configured volume levels.
> ? s/from/of/
> -- I'm not sure on this point, my suggestion is because to me you're
> saying that while they could be mathematically related, they aren't
> (thus "of").
> I think "not directly related to" is probably a better solution
>
>> (where ‘x-soft’ always means "silent", etc.).
>
> drop ", etc." ?

Thanks (to all 4 points above).

>> <percentage>
>>   Only positive percentage values are allowed.
>
> I think you want 'non-negative' not 'positive', as '0' is allowed.

Thanks, I checked the entire document for this error.

>> so the computed value equals the inherited value times 0.5 (divided  
>> by 2),
>
> s/divided/i.e. divided/

Ok.

>> (the volume corresponding to ‘0’ is nearer the value of ‘100’)
>> (the gap between ‘0’ and ‘100’ is wider).
>
> i don't think 'nearer' / 'wider' are good choices for this description

Perhaps "closer to"? In any case, this part of the specification has
been completely revamped.

>> normal
>> Punctuation is not to be spoken, but instead rendered naturally as  
>> various pauses.
>
> shouldn't punctuation also affect tone, volume, stress, etc.?

Proposed replacement:
"For example, punctuation is not spoken as-is, but instead rendered  
naturally as appropriate pauses."

>> <time>
>> Only positive values are allowed.
>
> s/positive/non-negative/ ?

Fixed everywhere.

>> none
>> Equivalent to 0ms (no prosodic break in the speech output).
>
>> The ‘cue-before’ and ‘cue-after’ properties specify auditory icons  
>> (i.e. prerecorded audio clips) to be played before (or after) the  
>> selected element within the audio "box" model. When a user agent is  
>> not able to render the specified auditory icon, it is recommended  
>> to produce an alternative cue (e.g., popping up a warning, emitting  
>> a warning sound, etc.)
>
> You're missing a period at the end of this paragraph

Yes, and removed "etc." too :)

>>    The URI must designate an auditory icon resource. If the URI  
>> resolves to something other than an audio file, such as an image,  
>> the resource is ignored and the property treated as if it had the  
>> value ‘none’.
>
> must sounds like an rfc term, which is probably not proper in this  
> context.

I see.

>> The loudness of prerecorded audio cues can be adjusted relatively  
>> to the volume level of synthetic speech.
>
> s/relatively/relative/

Yep.

> synthetic or synthesized?
> (possibly "speech synthesis")

This looks alright:

http://www.google.co.uk/search?q=synthetic+speech

>> Only positive percentage values are allowed.
>
> non-negative?

Fixed everywhere.

>> The ‘voice-family’ property specifies a comma-separated,  
>> prioritized list of values that designate speech synthesis voices.
>
> s/voices./voices/ -- otherwise you have a random stray period after
> the parenthetical:

Well spotted.

>> (analog to ‘font-family’ in visual style sheets).
>
> s/analog/analogous/

Done.

>> <name>
>> For compatibility with SSML, whitespace characters are not  
>> permitted within voice names.
>
> This should probably be listed earlier in the paragraph. And it's
> probably better as "voice names must not contain whitespace
> characters".

Good suggestion.

>> <age>
>>   Possible values are ‘child’, ‘young’ and ‘old’.
>
> to me, 'age' is numeric, i'd suggest you use some other thing to
> describe the textual concepts. you're also missing something for
> 'normal'.

To be honest, I am not aware of the historical motivation for using a
keyword enumeration rather than a non-negative number, as in SSML:

http://www.w3.org/TR/speech-synthesis11/#edef_voice

So far I haven't seen any implementation of ‘child’, ‘young’ and
‘old’, so I am totally in favor of aligning with SSML. The latest
editor's draft has been updated accordingly.

>> Possible values are positive numbers restricted to integers, and  
>> excluding zero (i.e. starting from 1).
>
> This is rather convoluted. You defined Positive numbers to include 0
> reference that definition and then actively exclude zero.

Actually we refer to
http://www.w3.org/TR/css3-values/#non-negative

I fixed the erroneous prose that was pointing to "positive" numbers  
when it should have referred to "non-negative" numbers.
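
To make the distinction concrete: "positive" excludes zero, whereas
"non-negative" includes it. Here is a rough Python sketch of the two
checks discussed in this thread. The helper names are hypothetical and
are not taken from the draft or from css3-values.

```python
def is_non_negative(value):
    """True for zero and for anything above it (so 0 is allowed)."""
    return value >= 0


def is_positive_integer(value):
    """True only for integers starting from 1 (zero is excluded)."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        return False
    return value >= 1
```

With these, a value of 0 passes the first check but fails the second,
which is the difference the prose needs to express.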

>> (e.g. name, gender, age, etc.).
>
> drop "etc."

Yep, as per your earlier recommendation.

>> in order to cater for dialectic variants): .
>
> s/for/to/

Several dictionaries (e.g. the Collins) allow "for" after "cater" to  
express the following meaning: "take into account, consider, bear in  
mind, make allowance for, to supply what is needed, etc."

> s/: ./:/

Keyboard slippage :)

>> If no voice is available for the language of the selected content,  
>> user-agent should raise a warning to let the user know about the  
>> lack of appropriate TTS voice.
>
> While this is a should instead of a must, I'm not certain it's a
> wonderful suggestion. UI design via specification especially in the
> area of warnings is generally poor. I'd suggest 'may'.

Reworded.

>> The speech synthesizer voice must be re-evaluated (i.e. the  
>> selection process must take place once again) whenever either of  
>> the CSS voice characteristics change within the content flow.
>
> s/either/any/

Right.

> I'm concerned by 're-evaluated' + 'when*' -- This document talks about
> a single directed flow, and I'd want UAs to have the option of
> applying the selection process at "layout" instead of at "rendering".
> Otherwise you risk asking a UA to compute something while it's
> reading, creating an unexpectedly long pause between potential voice
> transitions.

I added a note to clarify this point.

>> The voice must also be re-calculated whenever the content language  
>> changes, unless the ‘preserve’ keyword is used
>
> It'd be nice if a css selector based example was provided instead of a
> forced rule on the node.

I added another "span" in the example.

>>  The french text below will be spoken with an english voice:
>
> s/french/French/; s/english/English speaker's/

Fixed.

>> 8.3. The ‘voice-pitch’ property
>> Value: 	<frequency> | <percentage> | <relative-change> | x-low |  
>> low | medium | high | x-high | inherit
>> <relative-change>
>>   Specifies a relative change (decrement or increment) to the  
>> inherited value. The syntax of allowed values is a <number> (the  
>> "+" sign is optional for positive numbers), followed by either of  
>> "Hz" (for Hertz) or "kHz" (for kiloHertz) or "st" (for semitones),  
>> and followed by a space character and the "relative" keyword.
>
> It seems like:
>
> | <relative-value> relative |
>
> would be much easier to understand than an extra sentence hidden at
> the end of the text.

Yes, this was already on my todo list. Actually:
| <relative-value> && relative |
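
For readers less familiar with the value definition syntax: "&&" means
that both components must occur, in either order. Below is a minimal
Python sketch of how a parser might accept such a value. The function
name parse_relative_change is an assumption for illustration, and the
unit set (Hz, kHz, st) is taken from the prose quoted above; the
matching is case-sensitive to keep the sketch short.

```python
import re

# A signed number immediately followed by one of the units named in the
# quoted prose (illustrative; not the draft's normative grammar).
_VALUE = re.compile(r"^([+-]?(?:\d+\.?\d*|\.\d+))(Hz|kHz|st)$")


def parse_relative_change(text):
    """Return (number, unit) if text matches '<relative-value> && relative'.

    '&&' requires both components, in either order.
    Returns None when the input does not match.
    """
    tokens = text.split()
    if len(tokens) != 2 or "relative" not in tokens:
        return None
    tokens.remove("relative")  # drop one occurrence of the keyword
    match = _VALUE.match(tokens[0])
    if match is None:
        return None
    return float(match.group(1)), match.group(2)
```

Under these assumptions, both "+2st relative" and "relative +2st" are
accepted, while "+2st" on its own is rejected.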

Many thanks !!
Dan
Received on Tuesday, 24 May 2011 11:04:36 UTC