[css3-speech] Editorial Comments from fantasai on 2011-04-28 (www-style@w3.org from April 2011)

From: fantasai <fantasai.lists@inkedblade.net>
Date: Thu, 28 Apr 2011 00:49:45 -0700
To: "www-style@w3.org" <www-style@w3.org>
Message-ID: <4DB91C19.9010404@inkedblade.net>

Overall, this module is missing the "Computed value" line from
all of the property definitions. That needs to be fixed.

1. Dependencies

I suggest removing this section. The references section is good
enough for this purpose.

What's missing is a discussion of how speech connects with the
CSS2.1 spec to create a definition of aural CSS rendering.

2. Introduction

s/may be used/can be used/

s/When using voice properties, the canvas/The aural canvas/

s/temporal space (you can/temporal space. For example, you can/

s/The CSS properties/CSS properties/

The statement about the 'aural' media type being deprecated
should be a separate note, if it's needed at all.

3. The aural "box" model

The note about speakability and display: none should be moved
to the section on speakability. It is totally out-of-context
here.

4. voice-balance

+100 does not need to be called out separately from 100. This
is all handled at the syntactic level; you don't need to address
it here. (If you want to discuss 100 vs +100, then you might as
well also discuss 100 vs 0100, the comparison of which operates
at the same level.)
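
To illustrate the point, here is a minimal sketch. It uses Python's
built-in int() purely as a stand-in for a CSS number parser (that
substitution is an assumption, not how a UA tokenizes CSS). All three
spellings collapse to the same value before any property-level logic
could see them:

```python
# int() is only a stand-in for a CSS number parser (an assumption).
spellings = ("100", "+100", "0100")

# Each spelling is resolved to a number at parse time.
values = [int(text) for text in spellings]

# Property-level code only ever sees the parsed value, so the
# three spellings are indistinguishable at that point.
all_equal = len(set(values)) == 1
```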

# Many speech synthesizers only support a single channel. The
# ‘voice-balance’ property can then be treated as part of a
# post synthesis mixing step. This is where speech is mixed
# with other audio sources.

I think this point could use some clarification, or maybe an
example.
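
Something along these lines might work as the basis for an example. It
is only a sketch: the linear pan law, the sample representation (plain
floats), and the names balance_gains and mix_mono_speech are all
assumptions made for illustration, not anything the draft defines.

```python
def balance_gains(balance):
    """Map a voice-balance number (-100..100) to (left, right) gains.

    Uses a simple linear pan law, which is an assumption; the draft
    does not say which law a mixer should use.
    """
    b = max(-100.0, min(100.0, float(balance)))  # clamp to the valid range
    right = (b + 100.0) / 200.0
    left = 1.0 - right
    return left, right


def mix_mono_speech(speech, background, balance):
    """Pan mono speech samples and add them to stereo background samples.

    speech:     list of mono samples from a single-channel synthesizer
    background: list of (left, right) samples from other audio sources
    """
    left_gain, right_gain = balance_gains(balance)
    return [
        (bg_left + sample * left_gain, bg_right + sample * right_gain)
        for sample, (bg_left, bg_right) in zip(speech, background)
    ]
```

The synthesizer itself stays mono here; 'voice-balance' only affects
the gains applied when its output is mixed into the stereo stream.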

6. Pause

# The synthesis processor may insert a rest as part of its implementation
# of the prosodic break.

This sentence seems weird and potentially confusing. The sentence before
it is poorly worded as well. I suggest replacing both with

| Expresses the pause by the strength of the prosodic break in speech
| output. The exact time is implementation-dependent.

Probably 'none' should be called out in a separate definition and defined
as equal to 0ms.

# and can be used to inhibit a prosodic break which the processor
# would otherwise produce

I suggest removing this phrase since it implies that prosodic
breaks introduced by punctuation here might also be removed.
I don't think that's the intention.

What might be useful is some discussion of the UA style sheet
and how the author can override it, e.g. removing the breaks
between paragraphs by specifying
p { pause: none; }

# "x-weak" and "x-strong" are mnemonics for "extra weak" and
# "extra strong", respectively.

If this note needs to be kept, it should be in a class="note".
(I don't think it's really necessary to mention, though.)

# The stronger boundaries are typically accompanied by pauses.
# The breaks between paragraphs are typically much stronger than
# the breaks between words within a sentence.

This is UA stylesheet advice, and does not belong in the definition
of the values.

6.1 collapsing pauses

s/Adjacent/Adjoining/ to be consistent with the collapsing terminology.

s/should be merged/are merged/ (this is not merely a recommendation)

The "combination of a named break and time duration" sentence is placed
awkwardly... Maybe merge it in like this:

| Adjoining pauses are merged by selecting the strongest named break and
| the longest absolute time interval. Thus "strong" is selected when
| comparing "strong" and "weak", "1s" is selected when comparing "1s"
| and "250ms", and "strong" and "250ms" take effect additively when
| comparing "strong" and "250ms".
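
The merge rule in the proposed wording can be sketched as follows. This
is a rough illustration only: the helper names, the tuple return shape,
and the placement of "none" as the weakest named value are assumptions,
and pause values are passed in as plain strings.

```python
# Named breaks, weakest first. Treating "none" as the weakest entry
# is an assumption made for this sketch.
STRENGTHS = ["none", "x-weak", "weak", "medium", "strong", "x-strong"]


def to_ms(value):
    """Convert a CSS time such as '250ms' or '1s' to milliseconds."""
    if value.endswith("ms"):
        return float(value[:-2])
    if value.endswith("s"):
        return float(value[:-1]) * 1000.0
    raise ValueError("not a time value: " + value)


def merge_pauses(pauses):
    """Merge adjoining pauses.

    Returns (strongest named break or None, longest time in ms or None).
    When both parts are present they take effect additively, so both
    are kept rather than one replacing the other.
    """
    named = [p for p in pauses if p in STRENGTHS]
    times = [to_ms(p) for p in pauses if p not in STRENGTHS]
    strongest = max(named, key=STRENGTHS.index) if named else None
    longest = max(times) if times else None
    return strongest, longest
```

The three cases named in the wording correspond to
merge_pauses(["strong", "weak"]), merge_pauses(["1s", "250ms"]), and
merge_pauses(["strong", "250ms"]).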

s/collapse:/are adjoining:/ seems like a good idea...

Also toss in

| A collapsed pause is considered adjoining to another pause if any
| of its component pauses is adjoining to that pause.

(Taken from CSS2.1 8.3.1 Collapsing margins.)

# if the the "box" has a ‘voice-duration’ of "0ms" ... and no content.

I think what's intended here is a voice-duration of 0ms *or* no content.
No? Also, s/no content/no rendered content/, since it may have content
hidden by display: none.

The sentences about pauses being adjoining seem redundant with the
sentences about pauses collapsing. Probably the latter should be
removed?

7. Rests

See comments for 6. Pauses

s/additively/additively and do not collapse/ (just to be extra clear)

9. voice-pitch-range

# and typically has a value of 120Hz for a male voice and 210Hz for a
# female voice

This is already mentioned under voice-pitch, and really belongs there
and not here, so I'd cut it out.

# one semitone is approximately 1.05946 (the actual arithmetics involved
# are beyond the scope of this specification, please refer to existing
# literature on that subject).

Question from the minutes <http://lists.w3.org/Archives/Public/www-style/2011Feb/0029.html>:

<dbaron> Is it possible to replace "(the actual arithmetics involved are
beyond the scope of this specification, please refer to existing
literature on that subject)" with "(the twelfth root of two)"? :-)
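
For reference, the arithmetic behind the 1.05946 figure is short enough
to show. This is a sketch; the name shift_semitones is made up for
illustration, and the 120Hz input is just the figure quoted above.

```python
import math

# One equal-tempered semitone is the twelfth root of two.
SEMITONE = 2 ** (1 / 12)


def shift_semitones(frequency_hz, semitones):
    """Return frequency_hz raised (or lowered) by a number of semitones."""
    return frequency_hz * SEMITONE ** semitones


# Twelve semitones make one octave, i.e. a doubling of frequency.
octave_above_120 = shift_semitones(120.0, 12)
```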

9. voice-stress

# For example, when the phrase "going to" is reduced it may be spoken
# as "gonna".

This example is probably better attached to the sentence "Emphasis is
indicated using a combination ... that varies from one language to the
next." But I'm dubious about this example. An articulate person could
de-stress "going to" without reducing it to "gonna", no? This seems more
like a dialectal difference than a stress difference.

~fantasai

Received on Thursday, 28 April 2011 07:50:16 UTC