Re: ACTION-268 - Craft request for input on synthesized speech inclusion in the document

Some thoughts on David's and Jim's suggestions re ACTION-268:

0. I agree with David that if we're addressing speech output we should also address refreshable braille output.

1.1. A problem with the proposed wording of 3.8.b is that it fails to exempt the UA when the synthesizer does not support the specified language. This is especially problematic if it's Level A. It needs to be modified to address that.

1.2 A second problem with the proposed 3.8.b is that, since it gives author-specified language ultimate authority, any content which incorrectly identifies its language would be inaccessible to the speech-output user. As with the other SC in 3.8, the user should have the ability to override any author-specified settings, including language. 
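
To make 1.1 and 1.2 concrete, here's a rough sketch (mine, not from the draft) of how a self-voicing UA could combine the author-specified language with a user override and a fallback when the synthesizer lacks the language. It assumes a Web Speech API-style speechSynthesis interface purely for illustration:

    // Sketch only: resolve the language to speak an element in, giving the
    // user's override priority (1.2) and falling back to the UA default when
    // no installed voice supports the author-specified language (1.1).
    interface SpeechLanguagePrefs {
      userOverrideLang?: string; // user override, e.g. "de"
      defaultLang: string;       // UA/synthesizer default (proposed 3.8.a)
    }

    function resolveSpeechLang(el: Element, prefs: SpeechLanguagePrefs): string {
      if (prefs.userOverrideLang) return prefs.userOverrideLang;
      // Author indication: nearest ancestor carrying a lang attribute.
      const authorLang = el.closest("[lang]")?.getAttribute("lang");
      const wanted = authorLang || prefs.defaultLang;
      const supported = speechSynthesis.getVoices()
        .some(v => v.lang.toLowerCase().startsWith(wanted.toLowerCase()));
      return supported ? wanted : prefs.defaultLang;
    }

    function speakElement(el: HTMLElement, prefs: SpeechLanguagePrefs): void {
      const utterance = new SpeechSynthesisUtterance(el.textContent ?? "");
      utterance.lang = resolveSpeechLang(el, prefs);
      speechSynthesis.speak(utterance);
    }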

1.3 Jim mentions that 5.3.x might be misplaced. To me the meaning of that SC is entirely unclear from its current wording. Does anyone else know what it's trying to say?

2. Re the suggestion that the user can change speech synthesizer voices, I disagree with the proposed response saying this is the responsibility of the screen reader, as all of Guideline 3.8 is geared towards user agents that self-voice. While self-voicing browsers may be out of style on the PC, they are certainly still the norm for telephone access to Web and mail. If we think it important that the user of a self-voicing UA can change speed and pitch (3.8.1), it makes sense that they can change other speech synthesizer attributes, such as which voice profile is being used (e.g. Whispering Wendy vs. Doctor Dennis). 
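
For what it's worth, exposing a voice-profile choice in a self-voicing UA is a small amount of code when the synthesizer enumerates its voices. A sketch, again assuming a Web Speech API-style interface (the voice name is invented):

    // Sketch only: speak text with the user's preferred voice profile,
    // falling back to the synthesizer's default voice if it isn't installed.
    function speakWithVoice(text: string, preferredVoiceName: string): void {
      const voices = speechSynthesis.getVoices();
      const chosen =
        voices.find(v => v.name === preferredVoiceName) ??
        voices.find(v => v.default) ??
        voices[0];
      const utterance = new SpeechSynthesisUtterance(text);
      if (chosen) utterance.voice = chosen;
      speechSynthesis.speak(utterance);
    }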

3.1 As Jim points out, it's completely reasonable for the self-voicing user agent to read a word at a time when the user navigates by words, a letter at a time when they navigate by characters, etc. However, when the user asks the browser to read a passage separately from navigation, it makes sense for them to be able to specify whether they want it read as letters, words, sentences, etc. 

3.2 The suggestion that the user can program the UA with specific reading behaviors for arbitrary attributes is asking a lot, and probably too much. (I have to assume, too, that the original commenter meant that to apply to more than just character vs. word reading mode.) Setting that up would take a lot of work for each site. One could do this using the Greasemonkey extension to Firefox, but I don't know that it would be implemented widely enough to warrant making it a low-priority SC, and it's certainly too hard to make high priority.
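
To illustrate why I see this as a per-site scripting job rather than an SC: the commenter seems to want a user-maintained table mapping classes and ids to speech settings. A purely hypothetical sketch (selectors and settings invented):

    // Sketch only: per-site rules that change how matched elements are spoken.
    interface SpeechRule {
      selector: string;   // e.g. ".stock-ticker", "#nav"
      rate?: number;      // speaking rate multiplier
      spellOut?: boolean; // read letter by letter
    }

    const siteRules: SpeechRule[] = [
      { selector: ".stock-ticker", spellOut: true },
      { selector: "#nav", rate: 1.5 },
    ];

    function speakNode(el: HTMLElement): void {
      const rule = siteRules.find(r => el.matches(r.selector));
      let text = el.textContent ?? "";
      if (rule?.spellOut) text = text.split("").join(" ");
      const utterance = new SpeechSynthesisUtterance(text);
      if (rule?.rate) utterance.rate = rule.rate;
      speechSynthesis.speak(utterance);
    }

Every site would need its own siteRules table, which is exactly the scale problem Jim describes below.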

5. I don't understand how the UA could fulfill the user's suggestion without changes to the underlying speech synthesizers. However, I disagree with the response that says it's only an issue for assistive technology, since any such issues would equally apply to self-voicing user agents.
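
Where the synthesizer does already surface boundary events, a rough sketch of what the commenter asks for might look like the following (assuming a Web Speech API-style interface; the richer div-id/ETA report in the comment would need more than synthesizers typically expose):

    // Sketch only: report rough word-level progress while a passage is spoken.
    function speakWithProgress(text: string, report: (msg: string) => void): void {
      const totalWords = text.split(/\s+/).filter(Boolean).length;
      const utterance = new SpeechSynthesisUtterance(text);
      const start = Date.now();
      utterance.onboundary = (e: SpeechSynthesisEvent) => {
        if (e.name !== "word") return;
        const spoken = text.slice(0, e.charIndex).split(/\s+/).filter(Boolean).length;
        const elapsed = Math.round((Date.now() - start) / 1000);
        report(`word ${spoken} of ${totalWords}, ${elapsed}s elapsed`);
      };
      utterance.onend = () => report("finished reading");
      speechSynthesis.speak(utterance);
    }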

	Thanks,
	Greg

-------- Original Message  --------
Subject: Re: ACTION-268 - Craft request for input on synthesized speech inclusion in the document
From: David Poehlman <poehlman1@comcast.net>
To: jimallan@tsbvi.edu
Cc: "'UAWG list'" <w3c-wai-ua@w3.org>
Date: 2/3/2010 10:53 AM

I think if we are going to cover speech, we also need to cover refreshable braille.  For instance, it is possible that a language may not be expressed in speech but may be expressed in braille.  Often, even if the screen reader changes the speech language, the braille stays the same, as far as I know.

Lastly, it is definitely the AT that decides how to handle language changes.

In the case of VoiceOver, the AT is built into the UI and is not part of the UA.

On Feb 3, 2010, at 1:35 PM, Jim Allan wrote:

My thoughts and musings...

Reviewed the comment at
http://lists.w3.org/Archives/Public/public-uaag2-comments/2009Sep/0000.html.
Text is restructured below to present meaningful topics for discussion.

The original comment was a response to a request for comments on our working
draft.

"Are the synthesized speech configuration success criteria in Guideline 3.8
clear and provide adequate instruction to user agent developers?"

Respondents proposed the following additions to UAAG: 
1. the ability for the User Agent to switch the speech synthesizer language:
   - automatically, based on the lang attribute in the content being read, or
   - manually, by providing controls in the User Agent (as opposed to
     externally, to the OS speech synthesizer)

Response: Switching the speech synthesizer language based on @lang is current
behavior for commercially available screen readers. Switching occurs in the
assistive technology. Where the screen reader is part of the OS (e.g. VoiceOver
on the Macintosh), the language switch occurs in VoiceOver. In testing with a
transcoding site [1] (a site that transforms the current webpage or browsing
session to meet the needs of the user without the use of assistive technology),
language switching did not occur. Transcoding sites are still relatively new
and not fully formed. 

Switching language is not currently in UAAG2. It should be added to GL 3.8.

<proposed>
3.8.a The user can set the default language of the speech synthesizer. (A)
3.8.b The speech synthesizer must switch languages as appropriate when
encountering an author indication that the content being read is in a
different language. (A)
</proposed>

The above is a bit wordy. I was trying to stay away from the HTML-specific
@lang. There is also the problem of the speech synthesizer having only one
language and not being able to switch, as is the case with VoiceOver, which
only comes with English built in. Also, the Mac OS does not auto-switch to
third-party voices (other languages installed by the user).

Also found:
5.3.x Appropriate Language. If characteristics of your user agent involve
producing an end user experience such as speech, you need to react
appropriately to language changes.
This seems to be misplaced. It would be a good substitute for 3.8.a.

2. the ability to change the synthesizer voice (when a choice of voices is
available)

Response: Currently, voice switching happens in the screen reader. The user
can select the default voice for all text spoken. Additionally, voice
changes can occur dynamically depending on the type of element or attribute
on the webpage (e.g. a different voice or pitch for heading levels, bold,
links, etc.). In obsolete self-voicing browsers (e.g. pwWebSpeak, IBM Home
Page Reader) this feature was also available. 

This is covered by 3.8.2 and 3.8.3.

3. the reading mode (words, spelling) that is used by the User Agent in
different places. The User Agent can set up specific triggers (class, id) to
switch between various voices and modes of speech. 

Response: This is partially covered by 3.8.3 and 3.8.5. 
On consideration, many of these seem to be screen reader behaviours, not
synthesizer behaviours. The screen reader (or self-voicing browser) tells
the synthesizer what to say. When the user moves the caret by character, a
character is spoken; when the user moves the caret by word, the word is
spoken. The same is true for line, paragraph, etc. The screen reader sends
the appropriate information to the synthesizer for sound production. It is
also the screen reader/browser that determines the voicing of <abbr>,
<acronym>, etc. based on user settings. Changing speech characteristics
based on class, id, style attribute, etc. seems more of a task for an
author, who would write specific speech behaviours for these attributes. The
problem is one of scale: there are a finite number of elements with which to
trigger different speech behaviours, but the number of possible @id and
@class values is infinite. 

Additional comments: 
4. It seems that the current UAAG guidelines are not simple to implement in
User Agents with existing speech synthesizer APIs. It might help to have a
technical review (state of the art) of existing speech synthesizer APIs and
use these as a practical basis for implementations in User Agents (what
fields and controls should be exposed). 

Response: The guidelines in the current document were developed with input
from developers familiar with speech synthesis APIs. The working group has
solicited review of the document by screen reader developers and
manufacturers. 
However, we have discussed problems with exception (acronym and abbreviation
expansion) dictionaries. Synthesizers have these dictionaries, but they are
unique to each synthesizer, and their documentation is not always readily
available. Superimposed on the synthesizer's dictionary are the screen
reader's exception dictionaries (also unique). If the user agent imposes
additional control (on/off or other exceptions), the permutations that the
average user would have to configure are a bit daunting. 
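
For illustration only (the entries and format are invented, not any particular
synthesizer's), a user-level exception dictionary is essentially a
token-to-expansion map applied before text reaches the synthesizer:

    // Sketch only: expand known abbreviations/acronyms before speaking.
    const exceptionDictionary: Record<string, string> = {
      "W3C": "World Wide Web Consortium",
      "UA": "user agent",
    };

    function applyExceptions(text: string): string {
      return text
        .split(/\b/)  // split on word boundaries, keeping delimiters
        .map(token => exceptionDictionary[token] ?? token)
        .join("");
    }

The daunting part is not the mechanism but the layering: the synthesizer, the
screen reader, and potentially the user agent each maintain their own table.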

5. Often, one of the missing features in speech synthesizer implementations
is the ability to query the state and progress of the speech being
synthesized. The User Agent could fulfill this role.
Querying what the speech synthesizer is processing, with regard to the UA,
yields important feedback for the user, such as:
"the speech synthesizer is currently reading div id="sasl" paragraph 5 word
number 10, there are still 500 words to read, it has been reading for 15
seconds and ETA is 2 min, etc ..."

Response: Again, this seems more of a screen reader behavior. Commercially
available screen readers have a where-am-I function to give the current
speech caret position on the page. They do not, AFAIK, present the more
elaborate information (the @id, time to end of reading based on reading
rate, etc.). This seems beyond AAA. Recommend non-inclusion in UAAG20.

References:
1. http://webanywhere.cs.washington.edu/wa.php 

Jim Allan, Accessibility Coordinator & Webmaster
Texas School for the Blind and Visually Impaired
1100 W. 45th St., Austin, Texas 78756
voice 512.206.9315    fax: 512.206.9264  http://www.tsbvi.edu/
"We shape our tools and thereafter our tools shape us." McLuhan, 1964
