[css3-speech] Editorial Comments

1.1. Design goals, motivations

Great intro. :)

1.2. Relationship with CSS2.1

   # Content creators can conditionally include [...]

I don't think this paragraph quite belongs under this section.
Maybe in the previous section, or in its own section, or after
the 1.3 CSS Speech Example section, I'm not sure. But it's not
really about the relationship of CSS Speech and CSS2.1. :)

2. The aural "box" model

Shift the <dfn> from the heading to the end of the first sentence.

Also, I would move the sentence from the intro that defines the
aural canvas to this section (and throw in another <dfn> for that).
This way the entire introduction is informative, and this definition,
which is important for understanding the aural rendering model, is
together with the rest of the definition of the model.

And then rename this section to "The aural formatting model", to
parallel the "visual formatting model" in CSS2.1.

3.1. The ‘voice-volume’ property

  # <decibel>
  #
  #    An integer or floating point number immediately followed by
  #    "dB" (decibel unit).

s/An integer or floating point number/A <number>/

(We should get this moved into the CSS3 Values module, but since it's
not there yet, it's fine to leave here.)

4.1. The ‘speak’ property

  # Note that ‘display’ is the only property defined externally to this
  # CSS3 module that affects behavior within the aural "box" model.

This isn't actually true anymore, as 'list-style-type' also has an effect.
I'd remove this sentence and instead add a reference to [[!CSS21]], which
defines the 'display' property, to the normative definition.

4.2. The ‘speak-as’ property

   # Note that the functionality provided by this property is related
   # to the say-as element from the SSML markup language [SSML]. Also
   # note that possible values are described in a W3C Note ([SSML-SAYAS])
   # separate from the SSML specification, whereas the CSS Speech module
   # explicitly defines a list of possible values.

I think we can collapse this note to just

   | Note that the functionality provided by this property is related
   | to the say-as element from the SSML markup language [SSML], whose
   | values are described in [SSML-SAYAS].


   # Uses language-dependent pronunciation rules for rendering an
   # element and its children.

It doesn't actually control the children, since they're controlled by
their own 'speak-as' property, so this should be

   | Uses language-dependent pronunciation rules for rendering the
   | element's content.

   # literal-punctuation
   #    Similar to ‘normal’ value, but punctuation such as semicolons,
   #    braces, and so on are to be spoken literally.
   # no-punctuation
   #    Similar to ‘normal’ value but punctuation is not to be spoken
   #    nor rendered as various pauses.

Since these values can be combined with 'spell-out' and 'digits', which
would not be the same as 'normal', I suggest recasting the definitions
as


   | literal-punctuation
   |    Punctuation such as semicolons, braces, and so on is named aloud
   |    rather than rendered naturally as appropriate pauses.
   | no-punctuation
   |    Punctuation is not rendered: neither spoken nor rendered as pauses.


5.1. The ‘pause-before’ and ‘pause-after’ properties

   # <time>
   #    Expresses the pause in absolute time units (seconds and milliseconds,
   #    e.g. "+3s", "250ms") as per the syntax of time values defined in [CSS3VAL].

Drop the "as per... [CSS3VAL]" portion of the sentence. Instead copy
   http://dev.w3.org/csswg/css-module/#values
into the spec, replacing
   CSS Level 2 Revision 1 [CSS21]
with
   CSS Values and Units Module Level 3 [CSS3VAL]
if needed. That way all the value definitions for the whole spec live in one place.

5.3. Collapsing pauses

   # For example, "strong" is selected

Examples are class="example". :) Since it's just one sentence and marked with the
phrase "For example", you can also just leave it inline in the spec per
   http://dev.w3.org/csswg/css-module/#conventions

6.1. The ‘rest-before’ and ‘rest-after’ properties

Same comment wrt <time> and [CSS3VAL] as for 'pause-before' and 'pause-after'.

   # This value can be used to inhibit a prosodic break which the processor
   # would otherwise produce.

I think this sentence should be dropped. "none" should mean that there is no
rest, not that, e.g. the comma in

   This, <span>phrase</span>

is ignored.

8.1. The ‘voice-family’ property

   # Note that as a result, most punctuation characters, or digits at the start
   # of each token, must be escaped in unquoted voice names. For example, the
   # following declarations are invalid: [...]

I suggest moving the invalid example somewhere further down: although it is
useful, it breaks the flow of trying to understand the property.

Also, use
   <div class="example">
     <p>...
     <pre>...</pre>
   </div>
instead of
   <p class="note">
   <div class="example"><pre>...</pre></div>

   # Note that to avoid mistakes in escaping, it is recommended to quote voice
   # names that contain white space, digits, or punctuation characters other
   # than hyphens. For example: [...]

Again, I'd use

   <div class="example">
     <p>...
     <pre>...</pre>
   </div>

   # voice-family: "john doe", "Henry the-8th";

Given that both of those are valid if unquoted, how about:

   voice-family: "Edward O'Connor", "Henry the 8th";

which are not? :)

I'll send a separate email on other voice-family issues...

8.1.1. Voice selection, content language

I'd rename the anchor "voice-selection", which avoids so many
abbreviations...

Item #4 in the list doesn't really belong in the list and should be
a paragraph after it.

8.2. The ‘voice-rate’ property

   # Note that a leading "+" sign does not denote an increment, for
   # example +50% is equivalent to 50%

I don't think we need this note. This is standard behavior in CSS. :)

   # Note that typical values are (in words per minute) x-slow = 80,
   # slow = 120, medium = between 180 and 200, fast = 500.

I assume this is for English? Might want to mention that. I imagine
the value would be different for, e.g. Chinese vs. Hawaiian.

8.3. The ‘voice-pitch’ property

   # <frequency>
   # Specifies the average pitch of the speaking voice using an absolute
   #  value in frequency units (Hertz and kiloHertz, e.g. "100Hz",
   # "+2kHz") as per the syntax of frequency values defined in [CSS3VAL].

Same comment as for 'pause-before' wrt "as per ... [CSS3VAL]".

   # Note that a leading "+" sign does not denote an increment.

If you really need this note about the plus sign, move it into the
comment about the pitch attribute in SSML, since in CSS this is
standard behavior, and the confusion only arises if you're expecting
SSML syntax.

   # For example, +50% is equivalent to 50%, so the computed value
   # equals the inherited value times 0.5 (i.e. divided by 2), which
   # is half the inherited average pitch of the voice.

Now that we've covered the leading plus elsewhere, just convert this
into a sentence tacked onto the definition:
   | ... Computed values are calculated relative to the inherited
   | value. For example, 50% equals the inherited value times 0.5,
   | which is half the inherited average pitch of the voice.
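To make the arithmetic concrete, here's a minimal Python sketch of that
rule, assuming the inherited value has already been resolved to an
absolute frequency in Hz. The helper name computed_pitch_from_percentage
is hypothetical, not something from the spec or any library.

```python
def computed_pitch_from_percentage(inherited_hz: float, percentage: float) -> float:
    """Resolve a <percentage> 'voice-pitch' value against the inherited pitch.

    Hypothetical helper for illustration. Assumes inherited_hz is an
    absolute average pitch in Hz. A leading "+" has no special meaning
    in CSS, so "+50%" and "50%" both arrive here as percentage=50.0.
    """
    return inherited_hz * (percentage / 100.0)


# 50% of an inherited 200Hz average pitch is half of it.
print(computed_pitch_from_percentage(200.0, 50.0))  # 100.0
```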

   # <relative-value>
   #   Specifies a relative change (decrement or increment) to the
   #   inherited value. The syntax of allowed values is a <number>,
   #   followed immediately by either of "Hz" (for Hertz) or "kHz"
   #   (for kiloHertz) or "st" (for semitones).
   # relative
   #   This keyword specifies that the provided value is expressed
   #   relatively to another base value. This is in order to
   #   disambiguate from absolute <frequency> values.

I would drop the Hz definition from <relative-value> and only use
semitones, and have the definition of the 'relative' keyword carry
the relativeness:

   voice-pitch: <frequency> && relative? | <relative-value> | <percentage>

   # relative
   #   This keyword specifies that the provided <frequency> is expressed
   #   as a relative change from the inherited value.
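For what it's worth, here's a rough Python sketch of how the proposed
'<frequency> && relative?' branch would resolve. It assumes a simplified
token syntax (no exponents), treats Hz/kHz case-insensitively, and allows
a signed frequency so that 'relative' can express both increments and
decrements. parse_frequency and resolve_voice_pitch are hypothetical
names; '&&' means the two components may appear in either order.

```python
import re

# Matches a <frequency> token such as "100Hz", "+2kHz" or "-20hz".
# Simplified: full CSS <number> syntax (e.g. exponents) is not covered.
_FREQ_RE = re.compile(r"^([+-]?(?:\d+\.?\d*|\.\d+))(hz|khz)$", re.IGNORECASE)


def parse_frequency(token: str) -> float:
    """Convert a <frequency> token to Hertz (hypothetical helper)."""
    m = _FREQ_RE.match(token)
    if m is None:
        raise ValueError(f"not a <frequency>: {token!r}")
    number, unit = float(m.group(1)), m.group(2).lower()
    return number * 1000 if unit == "khz" else number


def resolve_voice_pitch(inherited_hz: float, value: str) -> float:
    """Resolve '<frequency> && relative?' against the inherited pitch.

    Without 'relative', the frequency is the absolute average pitch.
    With 'relative', it is added to the inherited value.
    """
    parts = value.split()
    keywords = [p for p in parts if p.lower() == "relative"]
    freqs = [p for p in parts if p.lower() != "relative"]
    if len(freqs) != 1 or len(keywords) > 1:
        raise ValueError(f"invalid 'voice-pitch' value: {value!r}")
    hz = parse_frequency(freqs[0])
    return inherited_hz + hz if keywords else hz


print(resolve_voice_pitch(200.0, "100Hz"))           # 100.0
print(resolve_voice_pitch(200.0, "-20Hz relative"))  # 180.0
```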

8.4. The ‘voice-pitch-range’ property

Same comment as above for 'voice-pitch'.

   # Note that a semitone is half of a tone (a half step) on the standard
   # diatonic scale. A semitone doesn't correspond to a fixed value in
   # Hertz: instead, the ratio between two consecutive frequencies separated
   # by exactly one semitone is approximately 1.05946 (the twelfth root of two).

This shouldn't be a note. It should be a definition somewhere. Maybe
your spec should have a Units section where it can define decibels
and semitones and anything else it needs that's not in CSS2.1.

Also, unless "the twelfth root of two" is an approximation, change
   approximately 1.05946 (the twelfth root of two)
to
   the twelfth root of two (approximately 1.05946)
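A quick Python check of the numbers (shift_by_semitones is a
hypothetical helper, just for illustration): one semitone is a factor of
exactly 2 ** (1/12), so twelve of them make an octave, i.e. double the
frequency, and 1.05946 is only the rounded value.

```python
# Frequency ratio between two pitches exactly one semitone apart.
SEMITONE_RATIO = 2 ** (1 / 12)


def shift_by_semitones(base_hz: float, semitones: float) -> float:
    """Raise (or, if negative, lower) base_hz by the given number of semitones.

    Uses 2 ** (semitones / 12) directly rather than SEMITONE_RATIO ** semitones
    to avoid accumulating rounding error over many semitones.
    """
    return base_hz * 2 ** (semitones / 12)


print(round(SEMITONE_RATIO, 5))        # 1.05946
print(shift_by_semitones(220.0, 12))   # 440.0
```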

9.1. The ‘voice-duration’ property

Same comment as for 'pause-before' and 'pause-after' wrt "as per ... [CSS3VAL]".

10. List items and counters styles

   # the ‘list-style-type’ is used (if present).

Drop "(if present)". A value for 'list-style-type' is always present.
   http://www.w3.org/TR/CSS21/cascade.html#value-stages

   # Note that the working draft of the CSS Lists module [CSS3LIST]
   # contains new features which are not yet supported in this version
   # of the CSS Speech module. Support for these features will be added
   # later, when the CSS Lists draft stabilizes.

This note is very time-sensitive. Just say that the speech rendering of
new features from the CSS Lists and Counters Module Level 3 is not
covered in this level of CSS Speech, but may be defined in a future
specification. (Or remove the note.)

11. Pronunciation, phonemes

   # The W3C PLS (Pronunciation Lexicon Specification) recommendation
   # ([PRONUNCIATION-LEXICON]) is one potential format to use with the
   # "pronunciation" rel value, which allows importing pronunciation
   # lexicons in HTML documents using the link element (similarly to
   # how CSS stylesheets can be included).

I think this should be split into two sentences, maybe something like
this:

   | The "pronunciation" rel value allows importing pronunciation lexicons
   | in HTML documents using the link element (similar to how CSS stylesheets
   | can be included). The W3C PLS (Pronunciation Lexicon Specification)
   | [PRONUNCIATION-LEXICON] is one format that can be used to describe such
   | a lexicon.

Also, since this section's purpose is to explain a design decision, I'd
shift it after section 12, "Inserted and replaced content", so that it
can be removed at a future date without triggering a renumbering of other
sections.

12. Inserted and replaced content

This entire section should be marked non-normative, except for one thing:
the location of ::before and ::after wrt content and 'rest' needs to be
normative -- so put it in the section defining the aural box model.

~fantasai

Received on Thursday, 30 June 2011 02:35:49 UTC