Re: Checkpoint 4.13 (speech parameters): Seeking clarification on requirements

At 2001-02-23 17:53, Ian Jacobs wrote:
>Hello,
>
>Checkpoint 4.13 of the 26 Jan 2001 Guidelines [1] and the
>note that follows reads:
>
>     4.13 Allow the user to configure synthesized voice
>          gender, pitch, pitch range, stress, richness,
>          speech dictionary, and handling of spelling,
>          punctuation, and number processing according
>          to the full range of values offered by the speech
>          synthesizer. [Priority 2]

I'd previously pointed out that gender is a surrogate for the
fundamental frequency (pitch) of the voice. The richness of a
voice of a particular fundamental frequency may be affected by
the size of the voicing mechanism, so a child's voice may be
distinguishable from an adult's voice of the same fundamental
frequency.

[Aside: I'd like recommendations on software TTS that allows me
independent configuration of those parameters. I want to become
a test case!]

>       Note: Many speech synthesizers allow users to choose
>       from among preset options that control different voice
>       parameters (gender, pitch range, stress, richness, etc.)
>       as a group. When using these synthesizers, allow the user to
>       choose from among the full range of preset options (e.g.,
>       "adult male voice", "female child voice", "robot voice",
>       etc.). Ranges of values for these characteristics may vary
>       among speech synthesizers.
>
>This checkpoint involves three parts:
>
>   a) A conforming user agent for "Speech" must implement
>      these 9 parameters.

The Note above says that these parameters are often not individually
available to the user, so not all of them should need to be under user
control. I think the requirement to handle all 9 parameters is too extreme.

>   b) A conforming user agent for Speech must
>      allow configuration according to the full range
>      of values of the speech synthesizer

Which may well differ from the set of 9.
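To make that gap concrete, here is a rough sketch (Python, purely
illustrative) that compares the checkpoint's nine parameters with
whatever a given engine actually exposes. The function name `coverage`
and the engine's parameter list are hypothetical, not taken from any
real synthesizer.

```python
# The nine configurable items named in checkpoint 4.13.
CHECKPOINT_PARAMS = {
    "gender", "pitch", "pitch range", "stress", "richness",
    "speech dictionary", "spelling", "punctuation", "number processing",
}


def coverage(engine_params):
    """Compare the checkpoint's list with what one engine offers.

    Returns (supported, missing, extra): parameters shared by both,
    checkpoint parameters the engine lacks, and engine parameters the
    checkpoint does not mention.
    """
    engine = set(engine_params)
    supported = CHECKPOINT_PARAMS & engine
    missing = CHECKPOINT_PARAMS - engine
    extra = engine - CHECKPOINT_PARAMS
    return supported, missing, extra


# Hypothetical engine: exposes rate and volume, but no stress or richness.
hypothetical_engine = ["gender", "pitch", "pitch range", "rate", "volume", "punctuation"]
supported, missing, extra = coverage(hypothetical_engine)
print(sorted(missing))  # ['number processing', 'richness', 'speech dictionary', 'spelling', 'stress']
print(sorted(extra))    # ['rate', 'volume']
```

Under a literal reading of (a), a user agent built on such an engine could
not conform, however well it exposed the engine's own range of values.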

[Aside: I'd like to be able to allow my effective audiogram (possibly
after correction by hearing aid) to affect the generation, possible
augmentation, and speedup of the delivery of phonemes. Some phonemes
can be augmented to reduce their high-frequency components. For example,
"breadth" could become "bareadatha", adapting the technique that the
conductor Robert Shaw used in training choral singers.]

>   c) The Note says that it's ok for a speech synthesizer
>      to allow configuration of these parameters as a group
>     (namely through preset voices).
>
>Points (b) and (c) seem to be incompatible.

I agree.

>The checkpoint
>suggests that the user must be able to configure each parameter
>independently and fully, but the note seems to override that
>requirement. The Note suggests that access to the full range
>of values of each parameter is required in the case where
>the user interface offers the user preset voices.
>
>Here's the question: Is full configuration of all parameters a P2
>requirement, or is some configuration (e.g., through preset
>voices) a P2 requirement and full configuration of all parameters
>a P3 requirement?

Yes. For my hearing, selection of a male voice is more understandable
than a child's or female voice. I'm not sufficiently familiar with various
TTS generation systems to know the effects of grouped vs individual
controls of the 9 parameters.
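For what it's worth, the difference between grouped and individual
controls can be sketched roughly as follows (Python, illustrative only).
The preset names come from the Note; the `Voice` fields, the numeric
values, the engine limits, and the `adjust` helper are all invented for
the example and do not describe any real engine. A preset fixes several
parameters at once, while individual control lets each one vary within
the engine's own range.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Voice:
    # Hypothetical voice parameters; units and ranges are assumptions.
    pitch_hz: float     # average fundamental frequency
    pitch_range: float  # 0.0 (monotone) to 1.0 (very expressive)
    richness: float     # 0.0 (thin) to 1.0 (full)


# Hypothetical engine limits for each independently adjustable parameter.
ENGINE_RANGES = {
    "pitch_hz": (60.0, 400.0),
    "pitch_range": (0.0, 1.0),
    "richness": (0.0, 1.0),
}

# Grouped control: each preset bundles values for several parameters at once.
PRESETS = {
    "adult male voice": Voice(pitch_hz=110.0, pitch_range=0.4, richness=0.8),
    "female child voice": Voice(pitch_hz=280.0, pitch_range=0.6, richness=0.3),
    "robot voice": Voice(pitch_hz=100.0, pitch_range=0.0, richness=0.5),
}


def adjust(voice, **changes):
    """Individual control: change any parameter, clamped to the engine's range."""
    clamped = {}
    for name, value in changes.items():
        low, high = ENGINE_RANGES[name]
        clamped[name] = min(max(value, low), high)
    return replace(voice, **clamped)


# Preset-only access offers three fixed points. Independent access lets a
# user keep the low pitch of the "adult male voice" preset while thinning
# its richness, a combination no preset above provides.
start = PRESETS["adult male voice"]
custom = adjust(start, richness=0.2)
print(custom)  # Voice(pitch_hz=110.0, pitch_range=0.4, richness=0.2)
```

With presets alone, a listener who finds lower voices easier to follow can
only pick whichever low-pitched preset exists; independent controls let
pitch and richness be tuned separately.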


>I don't have a proposal for addressing this, but I think
>that 4.13 needs to be clearer. Can the UA satisfy the
>checkpoint by providing limited access (through preset voices)
>to the engine's full capabilities?

Regards/Harvey Bingham

Received on Monday, 26 February 2001 15:59:36 UTC