Re: Proposed minimal requirements for audio/speech checkpoints.

At 2000-06-12 15:13-0400, Ian Jacobs wrote:
>Hello,
>
>I propose that minimal requirements for the following checkpoints
>from the 10 June draft [1] be established on the basis of
>property values in the CSS2 Recommendation [2]:
>
>   4.8 Allow the user to configure and control the audio volume.
>   4.9 Allow the user to configure and control synthesized speech
>       playback rate.
>   4.11 Allow the user to configure synthesized speech pitch, gender,
>        and other articulation characteristics.
>
>The relevant CSS2 properties are 'volume' (in CSS, for speech
>only, but we can generalize its values here) and the voice
>characteristics properties of section 19.8 [3]:
>
>This is not a requirement for user agents to implement CSS,
>but to allow the same range of abstract values as specified in CSS.
>CSS also allows numbers and percentages, but I don't want to
>make those requirements.
>
>For instance, for volume, the following range would be mapped to
>six "real" levels by the user agent: silent, x-soft, soft,
>medium, loud, x-loud. Similarly, for speech-rate: x-slow,
>slow, medium, fast, x-fast. The relative rates "faster" and
>"slower" are relative values specific to CSS inheritance,
>so would not be required.
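
A minimal Python sketch of the proposed keyword ranges, for illustration only. The numeric equivalents are the ones CSS2's aural chapter gives for these keywords: volume 0 to 100, where 0 is the quietest audible level and 'silent' means no sound at all, and speech-rate in words per minute. CSS2 gives 'medium' speech rate as 180 to 200 wpm, so 190 is an arbitrary pick. The function volume_gain and its min_gain/max_gain bounds are hypothetical stand-ins for a user agent's own audio levels, not anything CSS defines.

```python
# CSS2 'volume' keywords on its 0-100 scale; 0 is the minimum *audible*
# level, while 'silent' means no sound at all (so it maps to None here).
CSS2_VOLUME = {
    "silent": None,
    "x-soft": 0,
    "soft": 25,
    "medium": 50,
    "loud": 75,
    "x-loud": 100,
}

# CSS2 'speech-rate' keywords in words per minute. CSS2 gives 'medium' as
# 180-200 wpm; 190 is an arbitrary value chosen from that range.
CSS2_SPEECH_RATE_WPM = {
    "x-slow": 80,
    "slow": 120,
    "medium": 190,
    "fast": 300,
    "x-fast": 500,
}


def volume_gain(keyword: str, min_gain: float = 0.05, max_gain: float = 1.0) -> float:
    """Map a volume keyword to a linear gain for a hypothetical audio output.

    min_gain and max_gain are assumptions standing in for the user agent's
    quietest audible and loudest comfortable levels.
    """
    level = CSS2_VOLUME[keyword]  # raises KeyError for unknown keywords
    if level is None:
        return 0.0  # 'silent': no sound, distinct from the quietest level
    return min_gain + (max_gain - min_gain) * level / 100
```

The point of the sketch is that only the ordered set of named levels is required; how each level becomes an actual gain or synthesizer rate is left to the user agent.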

I appreciate your finding a way to include speech-rate speed-up,
by whatever means, despite your recently expressed view that it
was a change of scope and so could not be included.

By analogy with Netscape Navigator's relative font "larger" (Ctrl-])
and "smaller" (Ctrl-[) commands, which I find much more useful than the
five fixed font-size choices that Microsoft IE5 offers, I would encourage
making the "faster" and "slower" relative values available to the user.
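
A rough sketch of what relative "faster"/"slower" stepping could look like, in the spirit of Navigator's repeated Ctrl-]/Ctrl-[ presses. The 40 wpm step follows CSS2's definition of 'faster' and 'slower'. The class name, the 190 wpm starting rate, and clamping to the x-slow/x-fast rates (80 and 500 wpm) are assumptions made for illustration.

```python
class SpeechRateControl:
    """Hypothetical relative speech-rate control.

    Each call steps the rate by 40 wpm (CSS2's 'faster'/'slower' step).
    Clamping to 80-500 wpm is an assumption, borrowed from CSS2's
    'x-slow' and 'x-fast' values.
    """

    STEP_WPM = 40
    MIN_WPM = 80
    MAX_WPM = 500

    def __init__(self, wpm: int = 190) -> None:
        self.wpm = wpm

    def faster(self) -> int:
        # Step up, but never past the assumed maximum.
        self.wpm = min(self.MAX_WPM, self.wpm + self.STEP_WPM)
        return self.wpm

    def slower(self) -> int:
        # Step down, but never below the assumed minimum.
        self.wpm = max(self.MIN_WPM, self.wpm - self.STEP_WPM)
        return self.wpm
```

A user agent could bind faster() and slower() to a pair of keys, so that repeated presses reach any rate in the range instead of only the five fixed levels.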

I also note that speed changes should be made without pitch shift,
by stretching or shrinking the silences between words and, more
generally, the vowel sounds within words.
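
A minimal sketch of the pause-scaling half of this technique, assuming mono audio as a list of float samples. Runs of near-silent samples are shortened or lengthened, and voiced samples are copied unchanged, so their pitch is untouched. The threshold and min_run values are hypothetical. Stretching vowels without pitch shift is usually done with an overlap-add time-scale method, which this sketch does not attempt.

```python
from typing import List


def scale_silences(samples: List[float], factor: float,
                   threshold: float = 0.01, min_run: int = 3) -> List[float]:
    """Return a copy of `samples` with each silent run scaled by `factor`.

    factor < 1 shortens pauses (faster overall delivery); factor > 1
    lengthens them (slower). Samples at or above `threshold` are kept
    as-is, so the pitch of the speech itself does not change.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    out: List[float] = []
    i, n = 0, len(samples)
    while i < n:
        if abs(samples[i]) >= threshold:
            out.append(samples[i])  # voiced sample: copy unchanged
            i += 1
            continue
        # Find the end of this silent run.
        j = i
        while j < n and abs(samples[j]) < threshold:
            j += 1
        run = samples[i:j]
        if len(run) >= min_run:
            new_len = max(1, round(len(run) * factor))
            # Nearest-neighbour resampling: drops or repeats silent samples.
            out.extend(run[k * len(run) // new_len] for k in range(new_len))
        else:
            out.extend(run)  # too short to count as a pause; keep as-is
        i = j
    return out
```

Only the pauses change length, which is why this alone gives a limited range of speed-up; larger changes also need the vowel stretching mentioned above.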

>I will write out the specific values for checkpoints 4.8, 4.9,
>and 4.11, but for now I want to get feedback as to whether
>people think that this is a reasonable approach.
>
>IMPORTANT: I propose that we delete "and other articulation
>characteristics" from checkpoint 4.11 since that makes it
>much harder to specify minimal requirements.

4.11 That leaves pitch and gender, and I question whether they are
independent. Most listeners cannot distinguish a countertenor from a
soprano, or a tenor from a low alto. I am uncertain which of the other
articulation characteristics help make such distinctions, so I agree
that minimal requirements for them are hard to specify. I would go so
far as to assert that only pitch is appropriate.
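
To illustrate a pitch-only requirement, here is a sketch that maps CSS2's five 'pitch' keywords (x-low, low, medium, high, x-high) onto the chosen voice's own average pitch. As CSS2 itself notes, these keywords do not correspond to absolute frequencies because they depend on the voice family. The semitone offsets and the function pitch_hz are invented for illustration and are not taken from any specification.

```python
# Hypothetical semitone offsets for CSS2's 'pitch' keywords, applied
# relative to the selected voice's own average pitch.
PITCH_SEMITONES = {
    "x-low": -6,
    "low": -3,
    "medium": 0,
    "high": 3,
    "x-high": 6,
}


def pitch_hz(keyword: str, voice_base_hz: float) -> float:
    """Return a target average pitch for `keyword` on a voice whose
    average pitch is `voice_base_hz` (one semitone = a factor of 2**(1/12))."""
    return voice_base_hz * 2 ** (PITCH_SEMITONES[keyword] / 12)
```

Under this scheme the user sets a single ordered pitch level, and whatever voice is selected supplies the base frequency, so no separate "gender" requirement is needed.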

See also my other comment on using SMIL for narrated speech, for which
speech speed-up or slow-down (without pitch shift) is appropriate.

Regards/Harvey Bingham

Received on Tuesday, 13 June 2000 14:26:34 UTC