Re: maximum & minimum speech rates for software synthesizers from Gregory J. Rosmaita on 2000-06-16 (w3c-wai-ua@w3.org from April to June 2000)

From: Gregory J. Rosmaita <unagi69@concentric.net>
Date: Fri, 16 Jun 2000 15:58:23 -0400
To: Ian Jacobs <ij@w3.org>
Cc: User Agent Guidelines Emailing List <w3c-wai-ua@w3.org>, Janina Sajka <janina@afb.net>, Peter Verhoeven <pav@oce.nl>, kerscher@mail.montana.com
Message-Id: <4.3.1.2.20000616151753.00d3a610@127.0.0.1>
at 05:56 PM June 15, 2000, ian wrote:

quote
I'm not comfortable with a range that high, since for 50% of the
tools sampled, that's the maximum rate. I think the minimum req
should be "somewhat less" than the max rate of some tools. But
I'm not a pro, so I don't feel too strongly about it.
unquote

aloha, ian!

while i respect your caution, i am aurally dependent, so i do feel 
extremely strongly about this issue, especially when i am attempting to use 
a workstation in a library or archive that has installed a self-voicing 
browser as a means of accommodating a class of its patrons...

what is it, after all, that we are doing?  we are promulgating User Agent 
ACCESSIBILITY Guidelines, and if we are to set a maximum and minimum range 
for the rate of synthesized speech, it is incumbent upon us to:

a) ascertain the average maximum speech rate offered by software 
synthesizers in context, that is, as they are actually deployed in the real 
world, and adopt that average as our minimum requirement (almost all 
software speech synthesizers are licensable)

b) ascertain the average minimum speech rate offered by software 
synthesizers that are available to classes of users who need supplemental 
speech in order to process information (e.g., low vision users who use 
speech in conjunction with screen magnification, and persons with certain 
types of dyslexia)...

if we, as you suggest, simply throw out the numbers i collated, we will be 
doing a major disservice to those who interact with the web in an 
exclusively aural environment; failing to take them into account would be 
not only a grave error, but also the basis for a minority opinion and a lot 
of criticism when we cycle through Last Call again...

i am not trying to cow you into adopting my method of determining a minimum 
value for the maximum rate, only pointing out that, for anyone who interacts 
with the web in an exclusively aural environment, the ability to control 
speech rate is the equivalent of being able to change the font size, font 
family, and window size...

it is a serious issue, and it deserves the serious attention of the working 
group...  note, as well, that i CCed this post to 3 people who have an 
enormous amount of experience with users who rely on synthesized speech, 
either as their primary (or even, as in my case, in the case of anyone with 
neuropathy, and in the case of the more than 30 percent of blind people who 
do not or cannot read braille, their only) means of obtaining information, 
or as an indispensable supplement to what they are able to perceive 
visually...

i am holding firm and fast to 700 wpm as the minimal requirement for this 
checkpoint, as i believe you would, ian, were the tables turned...

as for the samples provided, those are the only 4 software speech engines 
to which i have recourse; they just happen to be the 4 most widely 
deployed (although there is a sizable minority of DECTalk Access32 users, 
and i do not own a copy of it).  my proposal called for other WG members 
who have access to other software synthesizers to provide information about 
their speech rates to the list as well, but a speech-enabled browser needs 
to have as robust a speech engine as possible; otherwise, it is just a toy 
for the sighted and for those whose eyeballs are otherwise preoccupied...

as for the lowest maximum rate among the samples, that of the MS Speech 
Engine, two factors should be kept in mind:

1. single language support (the version of the MS Speech Engine i have is 
US English only): i'd be interested in hearing whether dick or tim or 
someone else at MS could tell us if the rates available to the user change 
when the language being synthesized is changed; they don't in Eloquence, 
ViaVoice, or Orpheus

2. intention: speech engines developed outside the assistive technology 
(AT) world tend to be used as "powertoys"; for example, using the MS Speech 
Engine without a screen reader, one can get Eudora 4.3x to speak a number 
of the header fields if one so chooses

what we are discussing here isn't simply "wow! wouldn't it be cool if i 
could have this page read to me while i make a sandwich?" but providing 
access to a range of users who need to be able to adjust the speech rate 
above 500 wpm in order to be productive...

gregory.


---------------------------------------------------------------
BIGOT, n.  One who is obstinately and zealously attached to an
opinion that you do not entertain.           -- Ambrose Bierce
---------------------------------------------------------------
Gregory J. Rosmaita  <unagi69@concentric.net>
Camera Obscura       <http://www.hicom.net/~oedipus/index.html>
VICUG NYC            <http://www.hicom.net/~oedipus/vicug/>
Read 'Em & Speak     <http://www.hicom.net/~oedipus/books/>
---------------------------------------------------------------
Received on Friday, 16 June 2000 16:14:30 UTC