Re: Audio Access

To follow up on what Geoff Freed said:
> 
> Al's correct.  I should have qualified the "automatically."
> However, is it necessary for there to be a spoken description of
> a sound effect caption?  I'll illustrate with yet another
> parallel to broadcast captioning and audio description: if a
> program contains both closed captions and audio descriptions (as
> some do on PBS and home video), the description track does not
> reflect the fact that captions are being displayed on the
> television screen.  In other words, the captions are not read as
> part of the descriptions.  Conversely, the audio descriptions are
> not reflected in the closed captions.
> 
> Now, apply this to the Web.  If I were a deaf Web user, sound
> effects would be described to me visually, using a caption (or
> something like it).  And if I were a blind user, I wouldn't need
> a sound effect described aurally to me because I could already
> hear it.  Thus, you'd only need a sound effect *caption*, not a
> description.  That would eliminate the problem Al describes
> below.  Yes?
> 
> [referring to...]
> > For those using synthetic speech to access text, there are
> > potential problems when the sound effect, and/or the spoken text
> > of a description of the sound effect, collides (in the audio
> > delivered to the user) with the presentation of spoken text
> > extracted from the page.
> 

I can't say that I anticipated a _description of the caption_.
What I was talking about was that when there is a description of
the sound effect, as there is sometimes a description of an image,
some users would want to use text-to-speech to read the
description of the effect.
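
For instance (just a sketch: the file name, MIME type, and
description text here are all made up), the OBJECT element in the
current HTML drafts lets an author supply text fallback content
that a text-to-speech user could have read aloud in place of the
sound:

    <OBJECT data="doorbell.wav" type="audio/wav">
      Sound effect: a doorbell rings twice.
    </OBJECT>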

Just as visually impaired (but not blind) users may wish to
access both an image and its description, I suspect that there
will be users with an auditory impairment, but who are not deaf,
in a grey zone: they would want to access sounds and/or
descriptions of sounds with a sound-by-sound navigation choice,
as opposed to a longstanding preference choice.  Those who are at
the same time blind would, I suppose, access the text description
by text-to-speech techniques.

A perhaps more important consideration is that on the Web, as
opposed to transcribing broadcast content, there are sounds
attached to text and graphics which are programmed to play
asynchronously on mouse events, such as when the mouse cursor
enters the graphic region used to present certain text.  This is
where I see the major source of destructive interference between
programmed sounds and the sounds created by the text-to-speech
transcription process.  First, the mouse point and the reading
point are only loosely connected, so things happening
automatically on mouse motion, without a button press, may be
confusing.  Second, even if the sound effect does apply to the
text currently being read, it may obscure the audibility of the
synthesized speech.  The overlaying of these two sounds is not
what the author designed in, and the user will need to be able to
fix it when it interferes.
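
To make that concrete, here is a sketch of the kind of construct
I mean.  The playSound function is a stand-in for whatever
plugin- or browser-specific mechanism actually starts the clip;
all the names are invented for illustration:

    <SCRIPT>
    // hypothetical helper: start playing the clip at this URL
    function playSound(url) { /* playback is plugin-specific */ }
    </SCRIPT>
    <A HREF="products.html"
       onMouseOver="playSound('whoosh.wav')">Products</A>

The sound fires on mere cursor motion, with no button press; that
is exactly the asynchronous trigger I am describing.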

I think that is where Geoff was agreeing with me: on-event
sounds should be convertible to on-selection sounds if the user
needs this additional control to keep the sound effects from
trampling on the reading process.
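
A user agent or a user-side script could make that conversion
mechanically.  Here is a rough sketch, assuming handlers are
exposed as properties the way current scriptable browsers expose
them:

    <SCRIPT>
    // Move any mouse-motion sound trigger over to explicit
    // activation, so nothing plays until the user selects the
    // link.  (Sketch only: this clobbers any existing onclick.)
    for (var i = 0; i < document.links.length; i++) {
      var link = document.links[i];
      if (link.onmouseover) {
        link.onclick = link.onmouseover;
        link.onmouseover = null;
      }
    }
    </SCRIPT>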

More generally, the notion of mouse events is too physical, too
tied to the unique characteristics of the GUI, to belong in a
universal HTML that ports gracefully into non-visual browse
modes.  In a
non-visual browse, text has no layout coordinates bound to it.
It is just text that falls within some part of the document
pursuant to the document structure encoded in the markup.  The
cursor location, as a point in a graphic canvas, doesn't exist.
So events detected by monitoring the cursor location don't exist,
and the control conditions for starting the sound are not
defined.
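
One way out of this bind is to trigger on logical events rather
than physical ones.  As a sketch (reusing the hypothetical
playSound helper from above), binding the sound to focus rather
than to cursor position gives a non-visual browser a defined
control condition, because "the reading point has reached this
link" exists in any modality:

    <A HREF="products.html"
       onFocus="playSound('whoosh.wav')">Products</A>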

--
Al Gilman
