Re: Checkpoint 4.13 (speech parameters): Seeking clarification on requirements

I think the main issue here is that the user should gain as much control 
over the synthesizer as possible.  Since synthesizer technology uses many 
different standards and APIs to control and generate speech, it is difficult 
to define an exact set of elements that translates well across many 
different operating systems and computing hardware.  I think in general we 
want to have as much control as the speech API allows, but with enough 
flexibility to make sure that the technology is readily available to 
developers.  I have noticed that probably the most popular speech API, 
Microsoft SAPI, is now starting to support W3C specs related to voice 
technologies and Aural Style Sheets.

The minimum set that I think the group can be confident of is:
1. Average pitch
2. Gender
3. Punctuation and spelling
4. Voice labels related to age and accent (kid, adult, older, robotic).  
These are preset values that guarantee stable voice characteristics such 
as richness and pitch range.

I think we need to revisit the Voice browser specifications and Aural CSS.

Jon


At 05:53 PM 2/23/2001 -0500, you wrote:
>Hello,
>
>Checkpoint 4.13 of the 26 Jan 2001 Guidelines [1] and the
>note that follows reads:
>
>     4.13 Allow the user to configure synthesized voice
>          gender, pitch, pitch range, stress, richness,
>          speech dictionary, and handling of spelling,
>          punctuation, and number processing according
>          to the full range of values offered by the speech
>          synthesizer. [Priority 2]
>
>       Note: Many speech synthesizers allow users to choose
>       from among preset options that control different voice
>       parameters (gender, pitch range, stress, richness, etc.)
>       as a group. When using these synthesizers, allow the user to
>       choose from among the full range of preset options (e.g.,
>       "adult male voice", "female child voice", "robot voice",
>       etc.). Ranges of values for these characteristics may vary
>       among speech synthesizers.
>
>This checkpoint involves three parts:
>
>   a) A conforming user agent for "Speech" must implement
>      these 9 parameters.
>   b) A conforming user agent for "Speech" must
>      allow configuration according to the full range
>      of values of the speech synthesizer.
>   c) The Note says that it's ok for a speech synthesizer
>      to allow configuration of these parameters as a group
>      (namely through preset voices).
>
>Points (b) and (c) seem to be incompatible. The checkpoint
>suggests that the user must be able to configure each parameter
>independently and fully, but the note seems to override that
>requirement. The Note suggests that access to the full range
>of values of each parameter is required in the case where
>the user interface offers the user preset voices.
>
>Here's the question: Is full configuration of all parameters a P2
>requirement, or is some configuration (e.g., through preset
>voices) a P2 requirement and full configuration of all parameters
>a P3 requirement?
>
>I don't have a proposal for addressing this, but I think
>that 4.13 needs to be clearer. Can the UA satisfy the
>checkpoint by providing limited access (through preset voices)
>to the engine's full capabilities?
>
>  - Ian
>
>[1] http://www.w3.org/WAI/UA/WD-UAAG10-20010126/
>--
>Ian Jacobs (jacobs@w3.org)   http://www.w3.org/People/Jacobs
>Tel:                         +1 831 457-2842
>Cell:                        +1 917 450-8783

Jon Gunderson, Ph.D., ATP
Coordinator of Assistive Communication and Information Technology
Division of Rehabilitation - Education Services
MC-574
College of Applied Life Studies
University of Illinois at Urbana/Champaign
1207 S. Oak Street, Champaign, IL  61820

Voice: (217) 244-5870
Fax: (217) 333-0248

E-mail: jongund@uiuc.edu

WWW: http://www.staff.uiuc.edu/~jongund
WWW: http://www.w3.org/wai/ua

Received on Monday, 26 February 2001 10:16:30 UTC