ACTION-268 - Craft request for input on synthesized speech inclusion in the document from Jim Allan on 2010-02-03 (w3c-wai-ua@w3.org from January to March 2010)

From: Jim Allan <jimallan@tsbvi.edu>
Date: Wed, 3 Feb 2010 12:35:06 -0600
To: "'UAWG list'" <w3c-wai-ua@w3.org>
Message-ID: <007e01caa4ff$9bba8b30$d32fa190$@edu>
My thoughts and musings...

I reviewed the comment at
http://lists.w3.org/Archives/Public/public-uaag2-comments/2009Sep/0000.html
and restructured the text below into topics for discussion.

The original comment was a response to a request for comments on our working
draft.

"Are the synthesized speech configuration success criteria in Guideline 3.8
clear and provide adequate instruction to user agent developers?"

Respondents proposed the following additions to UAAG: 
1. the ability for the User Agent to switch the speech synthesizer language:
   - automatically, based on the lang attribute in the content being read, or
   - manually, by providing controls in the User Agent (as opposed to
     externally, in the OS speech synthesizer)

Response: Switching the speech synthesizer language based on @lang is the
current behavior of commercially available screen readers. The switching
occurs in the assistive technology. Where the screen reader is part of the OS
(e.g. VoiceOver on the Macintosh), the language switch occurs in VoiceOver.
When tested with a transcoding site [1] (a site that transforms the current
webpage or browsing session to meet the needs of the user without the use of
assistive technology), language switching did not occur. Transcoding sites
are still relatively new and not fully formed.

Switching language is not currently in UAAG2. It should be added to GL 3.8.

<proposed>
3.8.a The user can set the default language of the speech synthesizer. (A)
3.8.b The speech synthesizer must switch languages as appropriate when
encountering an author indication that content being read is a different
language. (A)
</proposed>
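A minimal sketch of what the proposed 3.8.a/3.8.b behaviour could look like inside a user agent, assuming the content has already been split into segments tagged with an author-declared language (from @lang or an equivalent in other formats) and that the user agent knows which synthesizer languages are installed. The names `Segment`, `choose_language` and `plan_speech` are hypothetical, not part of any real synthesizer API. Falling back to the user's default covers the case of a synthesizer that has only one installed language.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Segment:
    text: str
    lang: Optional[str]  # author-declared language (e.g. from @lang); None if absent


def choose_language(segment_lang: Optional[str], default_lang: str,
                    installed: Sequence[str]) -> str:
    """Pick the synthesizer language for one segment of content."""
    if segment_lang is None:
        return default_lang  # no author indication: use the user's default (3.8.a)
    wanted = segment_lang.lower()
    # Prefer an exact match, e.g. "fr-CA" -> "fr-CA".
    for lang in installed:
        if lang.lower() == wanted:
            return lang
    # Otherwise match on the primary subtag, e.g. "fr-CA" -> "fr-FR".
    primary = wanted.split("-")[0]
    for lang in installed:
        if lang.lower().split("-")[0] == primary:
            return lang
    # No installed voice for that language: fall back to the default.
    return default_lang


def plan_speech(segments: List[Segment], default_lang: str,
                installed: Sequence[str]) -> List[Tuple[str, str]]:
    """Return (language, text) pairs to hand to the synthesizer, in reading order."""
    return [(choose_language(s.lang, default_lang, installed), s.text)
            for s in segments]
```

With `installed = ["en-US", "fr-FR"]` and a default of `"en-US"`, a segment marked `"fr-CA"` is spoken with `"fr-FR"`, while a segment marked `"de"` falls back to `"en-US"` because no German voice is available.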

The above is a bit wordy; I was trying to avoid the HTML-specific @lang.
There is also the problem of a speech synthesizer that has only one language
and cannot switch, as is the case with VoiceOver, which comes with only
English built in. Also, Mac OS does not automatically switch to third-party
voices (other languages installed by the user).

I also found:
5.3.x Appropriate Language. If characteristics of your user agent involve
producing an end user experience such as speech, you need to react
appropriately to language changes.
This seems to be misplaced. It would be a good substitute for 3.8.a.

2. the ability to change the synthesizer voice (when a choice of voices is
available)

Response: Currently, voice switching happens in the screen reader. The user
can select the default voice for all text spoken. Additionally, voice
changes can occur dynamically depending on the type of element or attribute
on the webpage (e.g. a different voice or pitch for heading levels, bold,
links, etc.). This feature was also available in now-obsolete self-voicing
browsers (e.g. pwWebSpeak, IBM Home Page Reader).

This is covered by 3.8.2 and 3.8.3.
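A rough sketch of per-element voice changes as a user setting, assuming the user picks one default voice and then overrides only some properties for particular element types. `VoiceSettings`, `settings_for` and the voice names `"VoiceA"`/`"VoiceB"` are hypothetical placeholders, not taken from any screen reader.

```python
from dataclasses import dataclass, replace
from typing import Dict


@dataclass(frozen=True)
class VoiceSettings:
    voice: str    # synthesizer voice name (placeholder values below)
    pitch: float  # relative pitch, 1.0 = the user's normal pitch
    rate: int     # speaking rate in words per minute


def settings_for(element: str, default: VoiceSettings,
                 overrides: Dict[str, dict]) -> VoiceSettings:
    """Return the voice settings for an element type: the user's default
    voice, with any per-element changes the user configured applied on top."""
    changes = overrides.get(element.lower(), {})
    return replace(default, **changes)


# Example user configuration (voice names are placeholders).
default = VoiceSettings(voice="VoiceA", pitch=1.0, rate=180)
overrides = {
    "h1": {"pitch": 1.3},
    "a": {"voice": "VoiceB"},
    "strong": {"pitch": 1.1, "rate": 160},
}
```

Keeping overrides as partial changes means that changing the default voice automatically carries over to every element type the user has not customized.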

3. the reading mode (words, spelling) that is used by the User Agent in
different places. The User Agent can set up specific triggers (class, id) to
switch between various voices and modes of speech.

Response: This is partially covered by 3.8.3 and 3.8.5.
On consideration, many of these seem to be screen reader behaviours, not
synthesizer behaviours. The screen reader (or self-voicing browser) tells
the synthesizer what to say. When the user moves the caret by character, the
character is spoken; when the user moves by word, the word is spoken. The same
is true for line, paragraph, etc. The screen reader sends the appropriate
information to the synthesizer for sound production. It is also the screen
reader/browser that determines the voicing of <abbr>, <acronym>, etc., based
on user settings.
Changing speech characteristics based on class, id, style attribute, etc.
seems more a task for the author: writing specific speech behaviours for
these attributes. The problem is one of scale: there is a finite number of
elements with which to trigger different speech behaviours, but the possible
values of @id and @class are unlimited.
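To illustrate that it is the screen reader (or self-voicing browser), not the synthesizer, that decides what unit of text gets spoken, here is a simplified sketch. `text_to_speak` is a hypothetical helper; real screen readers obtain text through accessibility APIs and use much more elaborate segmentation rules than splitting on whitespace and newlines.

```python
import re


def text_to_speak(text: str, caret: int, unit: str) -> str:
    """Return the text sent to the synthesizer after the caret lands at
    index `caret`, for the navigation unit the user moved by."""
    if unit == "character":
        return text[caret] if caret < len(text) else ""
    if unit == "word":
        # A "word" here is simply a run of non-whitespace characters.
        for match in re.finditer(r"\S+", text):
            if match.start() <= caret < match.end():
                return match.group()
        return ""  # caret is on whitespace
    if unit == "line":
        start = text.rfind("\n", 0, caret) + 1
        end = text.find("\n", caret)
        return text[start:] if end == -1 else text[start:end]
    raise ValueError(f"unknown unit: {unit}")
```

The synthesizer only ever receives the returned string; the choice of unit (and any spelling or reading mode) stays with the layer above it.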

Additional comments: 
4. It seems that the current UAAG guidelines are not simple to implement in
User Agents with existing speech synthesizer APIs. It might help to have a
technical review (state of the art) of existing speech synthesizer APIs and
use these as a practical basis for implementations in User Agents (what
fields and controls should be exposed).

Response: The guidelines in the current document were developed with input
from developers familiar with speech synthesis APIs. The working group has
solicited review of the document by screen reader developers and
manufacturers.
However, we have discussed problems with exception (or acronym and
abbreviation expansion) dictionaries. Synthesizers have these dictionaries,
and each is unique to its synthesizer. The documentation of these
dictionaries is not always readily available. Layered on top of the
synthesizer's dictionaries are the screen reader's exception dictionaries
(also unique). If the user agent also imposes control (on/off or other
exceptions), the number of permutations the average user would have to
configure is a bit daunting.
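A toy model of the layering problem, assuming each dictionary is a plain whole-word substitution table and that text flows from the user agent to the screen reader to the synthesizer. Actual synthesizer and screen reader dictionaries have their own formats and matching rules; the function names and the sample entries are hypothetical.

```python
from typing import Dict, List, Sequence


def apply_exception_dictionaries(words: List[str],
                                 stages: Sequence[Dict[str, str]]) -> List[str]:
    """Pass words through each stage's exception dictionary in order.

    Stages are ordered as the text is assumed to flow: user agent, then
    screen reader, then synthesizer. Each stage only sees the output of the
    previous one, so enabling an earlier dictionary can hide a word from a
    later one.
    """
    for dictionary in stages:
        words = [dictionary.get(word, word) for word in words]
    return words


def configurations(num_stages: int) -> int:
    """Number of on/off combinations when each stage's dictionary can be toggled."""
    return 2 ** num_stages
```

For example, with a user agent entry `{"UA": "user agent"}` and a synthesizer entry `{"UA": "you ay"}`, "UA" is spoken as "user agent" when the user agent dictionary is on and as "you ay" when it is off. Even with only on/off switches, three layers give 8 combinations, before counting individual entries.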

5. Often, one of the missing features in speech synthesizer implementations
is the ability to query the state and progress of the speech being
synthesized. The User Agent could fulfill this role.
Querying what the speech synthesizer is processing with regard to the
UA yields important feedback for the user, such as:
"the speech synthesizer is currently reading div id="sasl" paragraph 5 word
number 10, there are still 500 words to read, it has been reading for 15
seconds and ETA is 2 min, etc ..."
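The progress report in the comment is simple arithmetic once a user agent tracks word counts: remaining time = words remaining × 60 / speaking rate (words per minute). A minimal sketch, assuming a constant, user-configured rate; `SpeechProgress` and its fields are hypothetical. Under this formula, the example's 500 remaining words and 2-minute ETA correspond to a rate of 250 words per minute.

```python
from dataclasses import dataclass


@dataclass
class SpeechProgress:
    words_spoken: int       # words already spoken by the synthesizer
    words_total: int        # words in the content being read
    elapsed_seconds: float  # time spent reading so far

    def words_remaining(self) -> int:
        return max(self.words_total - self.words_spoken, 0)

    def eta_seconds(self, words_per_minute: float) -> float:
        """Estimated time to finish, assuming a constant speaking rate."""
        return self.words_remaining() * 60.0 / words_per_minute

    def report(self, words_per_minute: float) -> str:
        return (f"{self.words_remaining()} words left, reading for "
                f"{self.elapsed_seconds:.0f} s, "
                f"ETA {self.eta_seconds(words_per_minute):.0f} s")
```

Real speech rarely runs at a constant rate (pauses, punctuation, spelled-out words), so any such ETA would be an approximation.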

Response: Again, this seems more of a screen reader behavior. Commercially
available screen readers have a where-am-I function that gives the current
speech caret position on the page. AFAIK they do not present more elaborate
information such as the @id or the time remaining based on reading rate.
This seems beyond AAA. I recommend not including it in UAAG 2.0.

References:
1. http://webanywhere.cs.washington.edu/wa.php 

Jim Allan, Accessibility Coordinator & Webmaster
Texas School for the Blind and Visually Impaired
1100 W. 45th St., Austin, Texas 78756
voice 512.206.9315    fax: 512.206.9264  http://www.tsbvi.edu/
"We shape our tools and thereafter our tools shape us." McLuhan, 1964
Received on Wednesday, 3 February 2010 18:35:54 UTC