[css3-speech] Editorial Comments from timeless on 2011-05-18 (www-style@w3.org from May 2011)

From: timeless <timeless@gmail.com>
Date: Wed, 18 May 2011 13:34:00 +0300
To: www-style list <www-style@w3.org>
Message-ID: <BANLkTini7VzwiDt7qtzoYhSoFe6qiwZvvA@mail.gmail.com>
http://dev.w3.org/csswg/css3-speech/

> (e.g. TTS voice, pitch, rate, volume levels, etc.)

Drop 'etc.'; it's incompatible with 'e.g.' (and add 'and' before 'volume levels').

> These style sheet properties can be used together with visual properties (mixed media), or as a complete aural alternative to visual presentation.

perhaps 'to a/the visual presentation'?

> This Module describes the CSS properties that apply to the "speech" media type, and defines a new "box" model specifically for the aural dimension.

s/Module/module/

> Note that content creators can conditionally include CSS properties dedicated to user-agents with text to speech synthesis

Should this be in <p class=note>? As is, for some reason you don't
seem to have margins between <p>s, which makes it look like you just
have a <br>.

> When doing so, the styles authored within the scope of such conditional statements are ignored by user-agents that do not support speech synthesis.

s/speech synthesis/css3-speech/ (or "this Module")

> linear
>    When present, this keyword indicates that the associated value represents a point on a linear volume amplitude scale, from ‘0’ (silent) to ‘100’ (full volume).

> x-soft
>    The value ‘x-soft’ maps to 0
> The interpretation of the corresponding numerical values depends on whether the ‘linear’ keyword is used

That x-soft might map to silent seems odd.

Initially I wrote:
I understand the goal is an even distribution, but it seems that a
value that might represent silent shouldn't be labeled as 'soft'. I
think 10, 30, 50, 70, 90 would be better, either including 'none' and
'loudest' for 0/100 or just leaving those values to be written out by
hand.
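
For illustration only, the renumbering suggested above could be written down as a lookup table. This is a minimal Python sketch, not anything from the draft: the keyword names other than 'x-soft' are assumed, and `PROPOSED_VOLUME` and `keyword_to_volume` are hypothetical names.

```python
# Hypothetical table for the mapping proposed above: no keyword lands on
# 0 (silent) or 100 (full volume); those stay as hand-written numbers.
# Keyword names other than 'x-soft' are assumed, not quoted in this message.
PROPOSED_VOLUME = {
    "x-soft": 10,
    "soft": 30,
    "medium": 50,
    "loud": 70,
    "x-loud": 90,
}


def keyword_to_volume(keyword: str) -> int:
    """Return the proposed volume level for a keyword on the 0-100 scale."""
    try:
        return PROPOSED_VOLUME[keyword]
    except KeyError:
        raise ValueError(f"unknown volume keyword: {keyword!r}") from None
```

With this table the steps stay evenly spaced (20 apart), but 'x-soft' no longer means silent.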

I think that you should probably include the explanation you included
in <non-negative number> about designing for compatibility with SSML.

> <non-negative number>
>    An integer or floating point positive number in the range ‘0’ to ‘100’.

It seems better to call this a <something-percentage>. I don't think
defining non-negative to be bounded above by 100 makes sense.

Of note, you use 'non-negative' here.

> When the ‘linear’ keyword not used

s/not/is not/

Could you please do something to the style so that two normal <p>s,
when placed adjacent to each other, have margins? Your primary audience
might be css3-speech users, but...

> All 3 values are configured by the user

s/configured/potentially configurable/

> so this allows authors to write a single style sheet that works in a variety of listening environments.

s/so//

> because it is independent from the user-configured volume levels.

? s/from/of/
-- I'm not sure on this point. My suggestion is because, to me, you're
saying that while they could be mathematically related, they aren't
(thus "of").

I think "not directly related to" is probably a better solution.

> (where ‘x-soft’ always means "silent", etc.).

drop ", etc." ?

> <percentage>
>    Only positive percentage values are allowed.

I think you want 'non-negative', not 'positive', as '0' is allowed.

> so the computed value equals the inherited value times 0.5 (divided by 2),

s/divided/i.e. divided/
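
A worked instance of the quoted sentence, as a rough Python sketch. `apply_percentage` is a hypothetical helper; it assumes the percentage simply scales the inherited value and that negative percentages are rejected, as suggested above for <percentage>.

```python
def apply_percentage(inherited: float, percentage: float) -> float:
    """Scale an inherited value by a non-negative percentage.

    Assumption: '50%' means inherited * 0.5, i.e. divided by 2.
    """
    if percentage < 0:
        raise ValueError("only non-negative percentages are allowed")
    return inherited * (percentage / 100.0)
```

Note that 0% is accepted here, which is why 'non-negative' is the more accurate word.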

> (the volume corresponding to ‘0’ is nearer the value of ‘100’)
> (the gap between ‘0’ and ‘100’ is wider).

I don't think 'nearer' / 'wider' are good choices for this description.

> normal
> Punctuation is not to be spoken, but instead rendered naturally as various pauses.

Shouldn't punctuation also affect tone, volume, stress, etc.?

> <time>
>  Only positive values are allowed.

s/positive/non-negative/ ?

> none
> Equivalent to 0ms (no prosodic break in the speech output).

> The ‘cue-before’ and ‘cue-after’ properties specify auditory icons (i.e. prerecorded audio clips) to be played before (or after) the selected element within the audio "box" model. When a user agent is not able to render the specified auditory icon, it is recommended to produce an alternative cue (e.g., popping up a warning, emitting a warning sound, etc.)

You're missing a period at the end of this paragraph

>     The URI must designate an auditory icon resource. If the URI resolves to something other than an audio file, such as an image, the resource is ignored and the property treated as if it had the value ‘none’.

'must' sounds like an RFC term, which is probably not proper in this context.

> The loudness of prerecorded audio cues can be adjusted relatively to the volume level of synthetic speech.

s/relatively/relative/

Synthetic or synthesized?
(possibly "speech synthesis")

> Only positive percentage values are allowed.

non-negative?

> The ‘voice-family’ property specifies a comma-separated, prioritized list of values that designate speech synthesis voices.

s/voices./voices/ -- otherwise you have a random stray period after
the parenthetical:

> (analog to ‘font-family’ in visual style sheets).

s/analog/analogous/

> <name>
> For compatibility with SSML, whitespace characters are not permitted within voice names.

This should probably be listed earlier in the paragraph. And it's
probably better as "voice names must not contain whitespace
characters".

> <age>
>    Possible values are ‘child’, ‘young’ and ‘old’.

To me, 'age' is numeric; I'd suggest you use some other term to
describe the textual concepts. You're also missing something for
'normal'.

> Possible values are positive numbers restricted to integers, and excluding zero (i.e. starting from 1).

This is rather convoluted. You defined positive numbers to include 0,
reference that definition, and then actively exclude zero.

> (e.g. name, gender, age, etc.).

drop "etc."

> in order to cater for dialectic variants): .

s/for/to/
s/: ./:/

> If no voice is available for the language of the selected content, user-agent should raise a warning to let the user know about the lack of appropriate TTS voice.

While this is a 'should' instead of a 'must', I'm not certain it's a
wonderful suggestion. UI design via specification, especially in the
area of warnings, is generally poor. I'd suggest 'may'.

> The speech synthesizer voice must be re-evaluated (i.e. the selection process must take place once again) whenever either of the CSS voice characteristics change within the content flow.

s/either/any/

I'm concerned by 're-evaluated' + 'when*' -- This document talks about
a single directed flow, and I'd want UAs to have the option of
applying the selection process at "layout" instead of at "rendering".
Otherwise you risk asking a UA to compute something while it's
reading, creating an unexpectedly long pause between potential voice
transitions.

> The voice must also be re-calculated whenever the content language changes, unless the ‘preserve’ keyword is used

It'd be nice if a CSS selector-based example were provided instead of a
forced rule on the node.

>   The french text below will be spoken with an english voice:

s/french/French/; s/english/English speaker's/

> 8.3. The ‘voice-pitch’ property
> Value:  <frequency> | <percentage> | <relative-change> | x-low | low | medium | high | x-high | inherit
> <relative-change>
>    Specifies a relative change (decrement or increment) to the inherited value. The syntax of allowed values is a <number> (the "+" sign is optional for positive numbers), followed by either of "Hz" (for Hertz) or "kHz" (for kiloHertz) or "st" (for semitones), and followed by a space character and the "relative" keyword.

It seems like:

| <relative-value> relative |

would be much easier to understand than an extra sentence hidden at
the end of the text.
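
To make the point concrete, here is a minimal Python sketch of the syntax as the quoted prose describes it, with the "relative" keyword as part of the pattern rather than an afterthought. `parse_relative_change` is a hypothetical name, and matching units case-sensitively is a simplification for this sketch.

```python
import re

# Assumed reading of the quoted prose: <number><unit> relative, where the
# unit is Hz, kHz, or st and the "+" sign is optional. Matching units
# case-sensitively is a simplification made for this sketch.
_RELATIVE_CHANGE = re.compile(
    r"([+-]?(?:\d+(?:\.\d+)?|\.\d+))(kHz|Hz|st) relative"
)


def parse_relative_change(text: str):
    """Return (amount, unit) for a relative change, or None if it does not match."""
    match = _RELATIVE_CHANGE.fullmatch(text)
    if match is None:
        return None
    return float(match.group(1)), match.group(2)
```

A value such as "2st" without the trailing keyword is rejected, which is the part that is easy to miss when reading the current description.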
Received on Wednesday, 18 May 2011 10:34:28 UTC