Re: Audio Collisions

To follow up on what David Pawson said:
> 	[David Pawson]  Are we approaching a 'channelling' effect?
> 	The impact of personal choice would leave a user instructing the
> 	browser to selectively action visual and auditory output,

[Al Gilman]

Yes, you have got the basic idea very well.  

User control over how streams of information from the source get
directed to the user's sensory channels.

Text-to-speech gives us some crossover capability.  The web page
author thinks of text as destined for the user's eyes.  But the
eyes-free user redirects the text to his/her ears.  If there is
already an audio track aimed at that same auditory channel, there
is contention.

In the case of a movie description done at NCAM, you can mix it
with the sound track because it is synchronously designed and
edited to be overlaid in that way.  On the web, sound effects are
designed asynchronously and the collisions are less benign, so
the user will have to exercise more choice about whether to mix
the sound streams or break them apart, muting one or another of
them at times.
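
To make the mix-or-mute choice concrete, here is a rough sketch
in the style of the aural CSS work.  The property names follow
the CSS2 aural proposals; the selector and file name are only
illustrative, not anything an author has actually published.

    /* Mix: keep the page's own background audio playing
       underneath the synthesized speech for the document body. */
    BODY { play-during: url(background.au) mix }

    /* Break apart: suppress the page's background audio so that
       only the synthesized speech is heard. */
    BODY { play-during: none }

The point is that this choice belongs in the user's style sheet,
not only in the author's.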

[Snip]
	[David Pawson]
> 	Channels would need to be defined for 
> 	Primary output visual
> 	Primary output audio
> 	   [One of these may be defined as my preferred prime channel]
> 	Secondary output visual 
> 	Secondary output audio

[Al Gilman]

Because we have some ability in the user's equipment to shift
content between sensory channels, content providers don't have to
supply separate data for every profile of user capability and
preference that will be served.  There is not an end-to-end set
of parallel channels; the information flow has some redirection
and mixing capability on the user side.

A combination of some redundancy in the data bundle offered by
the source (such as transcripts and captions, which shadow sound
in text), together with user control over how the source-provided
streams or components are presented, gives us the maximum
adaptability for the minimum cost.
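
As a small illustration of that kind of redundancy (the markup is
hypothetical and the file names are invented for the example),
the sound and its text shadow can travel together, leaving the
user agent or the user to decide which to render:

    <!-- The clip and its transcript are offered side by side;
         play the sound, read the text, or use both. -->
    <P>
      <A HREF="announcement.au">Listen to the announcement</A>
      (<A HREF="announcement.txt">read the transcript</A>)
    </P>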
  
	[David Pawson]
> 	if we wanted to get exotic, the presence of a secondary channel
> 	output could lead to an event to which I may wish to respond, by
> 	halting the main channel output to listen, look at the secondary
> 	channel?
> 

[Al Gilman]
Yes, I was imagining something that exotic.

Consider a slide-show presentation with a continuous audio track
and a sequence of still images.  One can imagine the blind user
playing the audio track in near real time.  They could skim along
with just the titles of the slides automatically spliced into the
audio at the points where the slide changes.  Then, when the
voice track doesn't convey a complete story, the user could stop
the playback, reset the play mode, and have it read the text on
the slides and possibly an audio or textual description of each
slide before proceeding with that frame's worth of the sound
track.

To be realistic, I think we have to talk about the audio, visual,
and tactile channels by which the information finally gets to the
user a little separately from the media types that carry the
information from the Web server to the Web client.  HTML text
with CSS styling is a media type, living in the HTTP dialog, that
has the dual capability of being presented in sight or sound.
With an ACSS style in the library, the sound can be even better.
But other content, like GIF files, is not that flexible.  For
these we have to build in separate data [the description] to make
the message accessible in sound.
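
For instance (the markup is only a sketch, and the file names are
invented), the GIF's text substitute has to be supplied as
separate data alongside the image:

    <!-- The image itself cannot be redirected to sound, so a
         short ALT text and a fuller description travel with it. -->
    <IMG SRC="chart.gif" ALT="Sales chart comparing 1996 and 1997">
    <A HREF="chart-description.html">Description of the chart</A>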

Sometimes alternate presentations of portions of the information
will be prepared at the source, and sometimes they will be
generated at the user's end.  The author will not in general have
thought through all the combinations and conflicts that can
arise, so the system has to reserve some control for the user.

-- Al Gilman

Received on Wednesday, 27 August 1997 09:41:58 UTC