
Re: voice modes used for orientation (was Re: Raw minutes of 15 June UA Guidelines)

From: Al Gilman <asgilman@iamdigex.net>
Date: Sun, 18 Jun 2000 09:09:46 -0500
Message-Id: <200006181255.IAA1544695@smtp1.mail.iamworld.net>
To: "Gregory J. Rosmaita" <unagi69@concentric.net>, Harvey Bingham <hbingham@ACM.org>
Cc: User Agent Guidelines Emailing List <w3c-wai-ua@w3.org>, Anne T Gilman <atgilman@io.com>, "Nick RAGOUZIS (Interfacility)" <nick@interfacility.com>

Thank you both, Gregory for telling and Harvey for asking.

If you will forgive my theorizing about this practice a little, I am going
to brain dump at y'all.  It probably doesn't affect the UA document of
the moment, but it belongs somewhere in the knowledgebase of the WAI as a
whole.  This is some of the background or rationale that we need to be
centralizing for use across topics.

For my purposes, I would like to say that these different-voice categories
mix two kinds of distinctions which I will group as context and mood.

The screen-reader browse process has two fingers in the book: the
application cursor and the reading cursor.  The user can read from either
one without moving the other.  So the user needs to know from which finger
the voice is reading.  That is a context indication.  Other aspects of
context have to do with where you are in the overall session stack or
desktop: In the outermost (task bar) chrome; in the chrome of which
window, identified by its application and document; in which frame inside
which browser+document window.
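
To make the "two fingers" picture concrete, here is a minimal sketch in
Python.  The names (Cursor, Reader, the voice labels) are assumptions of
mine for illustration and are not taken from any actual screen reader.
Each cursor keeps its own position and its own voice, so reading from one
leaves the other where it was, and the voice tells you which finger spoke.

```python
from dataclasses import dataclass


@dataclass
class Cursor:
    name: str          # "application" or "reading"
    voice: str         # hypothetical voice label used as the context cue
    position: int = 0  # index of the line this cursor points at


class Reader:
    """Toy model: some lines of text and two independent cursors."""

    def __init__(self, lines):
        self.lines = lines
        self.cursors = {
            "application": Cursor("application", voice="low-pitch"),
            "reading": Cursor("reading", voice="high-pitch"),
        }

    def move(self, which, position):
        # Moving one cursor never touches the other one.
        self.cursors[which].position = position

    def read(self, which):
        # Return the voice to speak in, plus the text under that cursor.
        cursor = self.cursors[which]
        return (cursor.voice, self.lines[cursor.position])


reader = Reader(["File menu", "First paragraph", "Second paragraph"])
reader.move("reading", 2)
spoken = reader.read("reading")
```

After the move, the reading cursor speaks the third line in its own voice
while the application cursor still sits on the first line.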

The classic mood indication is the distinction between messages and alerts.
Both of these are information about events that have happened.  The
messages just describe noteworthy but normal happenings.  The alerts
are messages about situations which are flat-out errors, or are considered
risky or abnormal enough that the system wants to make sure that the
user is aware of them.  Sometimes you have to acknowledge an alert
to make it go away and get the computer to get on with what you want to
do.  Most annoying to me at times.

Messages and alerts are a protocol reflecting severity-graded information
transactions relating to the ongoing process.  Confirm cycles are a
protocol reflecting severity-graded classes of control transactions.
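
As a rough illustration of the message/alert grading, here is a small
Python sketch.  The two severity levels come from the distinction above;
the names (Severity, Event, present) and the voice labels are hypothetical,
and the rule that only alerts demand acknowledgement is an assumption made
for the example, not a description of any particular system.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    MESSAGE = 1  # noteworthy but normal happening
    ALERT = 2    # error, or a situation judged risky or abnormal


@dataclass
class Event:
    text: str
    severity: Severity


# Hypothetical voice labels: the "mood" cue for each severity grade.
VOICE_FOR = {
    Severity.MESSAGE: "message-voice",
    Severity.ALERT: "alert-voice",
}


def present(event):
    """Return (voice, text, needs_ack) for one event.

    Assumption: alerts must be acknowledged before work resumes,
    while plain messages are spoken and then forgotten.
    """
    needs_ack = event.severity >= Severity.ALERT
    return (VOICE_FOR[event.severity], event.text, needs_ack)


page_turn = present(Event("Page 5 of 15", Severity.MESSAGE))
disk_full = present(Event("Disk full", Severity.ALERT))
```

Here the page-turn announcement is spoken in the message voice with no
acknowledgement required, while the disk-full event uses the alert voice
and is flagged as needing one.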

I am sure this is all spelled out as dialogistics in the HCI literature
somewhere, except that the literature does not cover the screen reader case.

Does the Microsoft Logo Program literature get this analytic about a
reference model for UI design?  Is anyone else aware of HCI literature that
covers the concepts I am groping toward?

PS:  Does Emacspeak or any screen reader do anything in the way of prosodic
inflection for quotes and parenthetical remarks buried in a text?  Or is
this just covered by your tuning of "punctuation verbosity"?


At 12:17 AM 2000-06-18 -0400, Gregory J. Rosmaita wrote:
>aloha, harvey!
>in response to my minuted comments:
>          GR: Frequently, there are about 6 different voices used
>                 for orientation.
>you asked,
>Gregory, I'd like that list of uses included, in the note, as recognizably
>useful distinctions that voice characteristics can provide.
>as a general rule, screen-readers allow users to set distinct vocal 
>characteristics -- which are roughly akin to the "Appearance" property 
>sheet of the "Display Properties" available to users via the Control Panel 
>in the Windows environment -- as an orientational mechanism, that is 
>capable of instantly communicating to the user the context in which he or 
>she is working and/or the source of the synthesized speech being spouted at 
>him or her...:
>one of the main uses of these differentiation mechanisms is to distinguish 
>whether the application cursor or the speech cursor is active...  the 
>speech cursor provides a gross navigational mechanism that not only allows 
>the user to grope about available screen space in order to reconnoiter the 
>application window, but which usually also serves to move the pointing 
>device's point-of-regard, which is often necessary to activate or 
>deactivate an object or discrete area of the screen in the absence of a 
>keyboard equivalent, or when the sub-window fails to receive focus, isn't 
>keyboard focusable, or is a custom control which neither the application 
>nor the screen reader recognize as a control, but simply as a graphic...
>each "voice mode" contains a range of vocal characteristics (including, but 
>not limited to, volume, rate, person, pitch and punctuation verbosity, 
>which can usually be further sub-divided into "All", "Most", "Some", or 
>"None"), in order to provide as broad a range of individual tailoring as 
>possible -- some users, for example, prefer to only switch genders as an 
>orientation mechanism, some switch only the "cutely named" synthesized 
>voice, some solely the pitch or rate, but most use a combination of the 
>configuration options available to them, so as to provide as instantaneous 
>an orientational mechanism as possible...
>the 6 most common voice modes are:
>         Global
>         Keyboard (i.e. keyboard input echo vocal characteristics)
>         Application Cursor
>         Speech Cursor
>         Messages
>         Prompts
>note that some screen-readers treat "Messages" (such as announcing "Page 5 
>of 15" when one moves across a page boundary in a word processor) and 
>"Prompts" (labels attached to controls) as a single entity, while others 
>offer a wider range of flexibility...
>during the teleconference, i mentioned another vocal characteristic, 
>Uppercase Indication, which, while (usually) not a discrete voice mode, is 
>a voice characteristic which is often grouped with the voice modes listed 
>above....  some synthesizers offer only incremental control over pitch, 
>others issue earcons (usually in the form of a tone for a capital letter or 
>a double tone to indicate all caps), or say "cap" or "all caps", or some 
>combination of the 3...
>note: the information contained in this emessage is generalized from my 
>personal and professional experience with screen readers, primarily in the 
>Windows and DOS environments, although i did double-check my facts using 
>the 4 Windows and 5 DOS-based screen readers which i have loaded on my 
>laptop...  i have also been fortunate enough both to use emacspeak in 
>real-life situations and to play around a bit with outSpoken on a 
>mac...  while the outSpoken approach is similar to that employed by 
>Windows-based screen readers, both emacspeak and aster employ spatial 
>effects as orientational vocal characteristics, whereas most other speech 
>synthesizers which do support spatial effects do so mostly for novelty's 
>sake (one offering a female voice "in a hall", "in space" and "in an 
>ACCOUNTABILITY, n.  The mother of caution.
>                         -- Ambrose Bierce, _The Devil's Dictionary_
>Gregory J. Rosmaita      <unagi69@concentric.net>
>Camera Obscura           <http://www.hicom.net/~oedipus/index.html>
>VICUG NYC                <http://www.hicom.net/~oedipus/vicug/>
>Read 'Em & Speak         <http://www.hicom.net/~oedipus/books/>
Received on Sunday, 18 June 2000 08:52:34 UTC
