- From: Jon Gunderson <jongund@uiuc.edu>
- Date: Mon, 26 Feb 2001 09:18:55 -0600
- To: Ian Jacobs <ij@w3.org>
- Cc: w3c-wai-ua@w3.org
I think the main issue here is that the user should gain as much control over the synthesizer as possible. Since synthesizer technologies use many different standards and APIs to control and generate speech, it is difficult to define an exact set of elements that translates well across many different operating systems and computing hardware. In general we want as much control as the speech API allows, with enough flexibility to make sure that the technology is readily available to developers. I have noticed that probably the most popular speech API, Microsoft SAPI, is now starting to support W3C specifications related to voice technologies and Aural Style Sheets.

The minimum set that I think the group can be confident of is:

1. Average pitch
2. Gender
3. Punctuation and spelling
4. Voice labels related to age and accent (kid, adult, older, robotic)

These encompass preset values that guarantee stability for voice characteristics such as richness and pitch range. I think we need to revisit the Voice Browser specifications and Aural CSS (a short illustrative Aural CSS sketch appears at the end of this message).

Jon

At 05:53 PM 2/23/2001 -0500, you wrote:
>Hello,
>
>Checkpoint 4.13 of the 26 Jan 2001 Guidelines [1] and the
>note that follows read:
>
>  4.13 Allow the user to configure synthesized voice
>  gender, pitch, pitch range, stress, richness,
>  speech dictionary, and handling of spelling,
>  punctuation, and number processing according
>  to the full range of values offered by the speech
>  synthesizer. [Priority 2]
>
>  Note: Many speech synthesizers allow users to choose
>  from among preset options that control different voice
>  parameters (gender, pitch range, stress, richness, etc.)
>  as a group. When using these synthesizers, allow the user to
>  choose from among the full range of preset options (e.g.,
>  "adult male voice", "female child voice", "robot voice",
>  etc.). Ranges of values for these characteristics may vary
>  among speech synthesizers.
>
>This checkpoint involves three parts:
>
>  a) A conforming user agent for "Speech" must implement
>     these 9 parameters.
>  b) A conforming user agent for Speech must allow
>     configuration according to the full range of values
>     of the speech synthesizer.
>  c) The Note says that it's OK for a speech synthesizer
>     to allow configuration of these parameters as a group
>     (namely through preset voices).
>
>Points (b) and (c) seem to be incompatible. The checkpoint
>suggests that the user must be able to configure each parameter
>independently and fully, but the Note seems to override that
>requirement. The Note suggests that access to the full range
>of values of each parameter is not required in the case where
>the user interface offers the user preset voices.
>
>Here's the question: Is full configuration of all parameters a P2
>requirement, or is some configuration (e.g., through preset
>voices) a P2 requirement and full configuration of all parameters
>a P3 requirement?
>
>I don't have a proposal for addressing this, but I think
>that 4.13 needs to be clearer. Can the UA satisfy the
>checkpoint by providing limited access (through preset voices)
>to the engine's full capabilities?
>
> - Ian
>
>[1] http://www.w3.org/WAI/UA/WD-UAAG10-20010126/
>--
>Ian Jacobs (jacobs@w3.org) http://www.w3.org/People/Jacobs
>Tel: +1 831 457-2842
>Cell: +1 917 450-8783

Jon Gunderson, Ph.D., ATP
Coordinator of Assistive Communication and Information Technology
Division of Rehabilitation - Education Services, MC-574
College of Applied Life Studies
University of Illinois at Urbana/Champaign
1207 S. Oak Street, Champaign, IL 61820
Voice: (217) 244-5870
Fax: (217) 333-0248
E-mail: jongund@uiuc.edu
WWW: http://www.staff.uiuc.edu/~jongund
WWW: http://www.w3.org/wai/ua
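[Illustrative sketch referenced above: a rough, non-normative example of how most of the checkpoint 4.13 parameters map onto the CSS2 aural properties. The property names come from the CSS2 specification; the selectors and values are arbitrary examples, and the speech-dictionary parameter has no aural CSS counterpart.]

  /* Illustrative only: CSS2 aural properties roughly corresponding
     to the checkpoint 4.13 parameters; values are arbitrary. */
  p {
    voice-family: female, child;   /* gender / preset voice label */
    pitch: high;                   /* average pitch (keyword or frequency) */
    pitch-range: 60;               /* pitch range (0-100) */
    stress: 50;                    /* stress (0-100) */
    richness: 40;                  /* richness (0-100) */
    speak-punctuation: code;       /* speak punctuation literally */
    speak-numeral: digits;         /* number processing */
  }
  abbr {
    speak: spell-out;              /* spelling handling */
  }
  /* No CSS2 aural property corresponds to a user speech dictionary. */

Whether a user agent exposes such properties as independent controls or only through preset voices is exactly the distinction the checkpoint and its Note leave unclear.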
Received on Monday, 26 February 2001 10:16:30 UTC