Re: Thoughts on multimedia and some definitions

At 12:40 AM 2000-06-26 -0400, Ian Jacobs wrote:
>Your comments welcome,

>Notes and questions;
>
>  - Where does animation fit?
>

AG:

Subdivide.  Recognize different classes of animations based on how
dependent they are on a) self-synchronization, i.e. presentation on a
continuous and steady timebase, and b) external synchronization, i.e. how
much the composite depends on synchronization relationships touching this
component.

Animation is to video as drawn graphics are to image.  Animations range
from very simple structures right up to fooling you into believing that
what you see is real.  The playback options for an animation are
determined not by the fact that it is animation but by the classification
of the message encoded in the animation and the interdependencies between
this content and the content of peer presentation components.

Some animations require continuous time rendering; these are like animated
feature films, they are synthetic video as far as their "perception to
cognition processing" is concerned.

Others are mere stylistic flourishes, for example transition effects in
PowerPoint.  If the bullet twirls as a new list item emerges onto the
screen, this is expendable and need not be sped up in proportion as the
text reading is sped up.

This last can be handled by subcases of synchronization constraints.
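
A minimal sketch of this subdivision, assuming a hypothetical three-way
grading of time dependence and made-up control names (none of these
identifiers come from the Guidelines or from any user agent):

```python
from dataclasses import dataclass
from enum import Enum
from typing import Set


class TimeDependence(Enum):
    """Hypothetical grading of how much an animation needs a steady timebase."""
    CONTINUOUS = "continuous"   # synthetic video, e.g. an animated feature film
    STEPPABLE = "steppable"     # still understandable one frame at a time
    DECORATIVE = "decorative"   # stylistic flourish, e.g. a slide transition


@dataclass
class Animation:
    name: str
    time_dependence: TimeDependence
    synced_to_peers: bool  # do peer components depend on this one's timing?


def playback_options(anim: Animation) -> Set[str]:
    """Return illustrative user controls that plausibly suit this animation."""
    if anim.time_dependence is TimeDependence.DECORATIVE and not anim.synced_to_peers:
        # Expendable flourish: it may be skipped and need not follow rate changes.
        return {"skip", "play_at_native_rate"}
    if anim.time_dependence is TimeDependence.STEPPABLE:
        return {"pause", "single_step", "scale_rate"}
    # Continuous-time content (or anything peers are synchronized to):
    # treat it like video and keep it on the shared timebase.
    return {"pause", "scale_rate"}
```

Under these assumptions, a twirling bullet graded DECORATIVE and not
synchronized to anything is offered only "skip" and "play_at_native_rate",
so it is never sped up along with the text.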

>  - The term "dynamic content" needs to be clarified.
>

AG:

Once again, it needs to be subdivided.

Many people in the authoring community say "dynamic content" when they are
referring to Active Server Pages or Cold Fusion generated pages.

One needs to distinguish time-dependent display from interactivity, where
the display changes in response to what the user does, or what the system
thinks the user did, as when the mouse moves.  Then one needs to look at
subdividing or grading how tightly the time-evolution of the display has to
be controlled for the message to be comprehensible.  Audio is not
comprehensible in freeze-frame; video has some comprehensibility when
perturbed that way.  Animations are split as to whether you can single-step
them and understand the result or not.

>  - Part of the discussion involved trying to fit static
>    content plus background audio into a larger definition.
>    Trying to do so may be a mistake. At the 22 June 
>    teleconference [5], Gregory took an action item to 
>    investigate requirements for configuring the user
>    agent to not render audio on load, so I anticipate
>    the background audio question to be resolved in
>    light of Gregory's proposals.
>

AG:

Yes, you want to relate this category to some inclusive super-category.
But that super-category may not be ready for prime time, i.e. it may be
something that we as technologists understand while the person in the
street sees the subcases only as wholly different things.

Even here I would distinguish a range of cases.  The appropriate user
control options are different if the sound is truly wallpaper, as compared
with a film-strip-equivalent presentation where the audio is the primary
content and the visual display is secondary.


This is a content judgement that you cannot make from the type of data in
which the content has been encoded, by the way.  [compare with animation;
do you see a trend?]

>  - When should we use "audio" and when should we use  
>    "auditory"? Same for "video" and "visual". Also, we
>    have consciously used the term "graphical" instead
>    of "visual" for a long time.
>

AG:

If you want to align your usage with common parlance, use all these terms
as follows.  Use 'audio' when your center of focus is on the computer side
of the physical human/computer interface.  This focuses on the data
resources and how they get rendered as audible signals.  Correspondingly,
use 'auditory' when your center of focus is on the message, the perceptual
and cognitive effect of the happenings at the physical human/computer
interface.  Audio files let you present auditory content.  The auditory
content is what you hear.  The audio is what the sound card plays into the
speakers.  I think that is more or less how the public perception splits.
[How to do a web search to check this???]  If you say 'audio' you will have
lots of readers who read "file format."  If you say 'auditory' most people
will read "perception."

Summary:

The user needs to be able to control things about how multimedia resources
are delivered.  What kinds of user control are appropriate depends on the
content structure of the composite resource.  In particular, it is
sensitive to a) how the messages delivered by the several chunks targeted
to different channels and media relate to one another and b) how much you
can change the presentation of a given data resource before it breaks up,
i.e. loses comprehensibility.


This is all expressible as a web of assertions about "the content/message
of A, the content of B, the content of C, etc."  At times you will have to
specialize this to "the conceptual content of A, the perceptual content of
B, the 'screen cross time' content of C," etc.

Throughout all this the following image is running through my mind:

Compare the respective content of media objects A and B.

Is there any overlap?  Is there anything that one can learn from A that one
could also learn from B?  Repeat the question in reverse.

Does either of them dominate the other?  Is most of what you can learn from
A also learnable from B?  Or vice versa?

If there is overlap, but neither dominates the other, how about the
respective amounts of information in the content?  Is there just more that
you can learn from A or from B?

[Note that in parallel with these 'benefit' based comparisons, one is
making comparisons on 'cost', that is, how long it takes to get to the
point where media object A or B is understood.  And one is also looking at
system dependencies.  A given resource may only be presentable in visible
form.  Or it may only be comprehensible if synchronized with a given other
resource.  And so forth.]

In many cases there is the simple relationship that A is a short version of
B.  A and B are about the same general topic, but B tells you in more
detail.  This applies to header and section, to legend and figure, to audio
description and video.  Sometimes this relationship is between objects in
different media, sometimes between objects in the same medium.
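
The comparisons above can be sketched with plain sets, assuming an editor
has hand-listed what can be learned from each object.  The function name,
the 0.5 cutoff for "most," and the sample items are all assumptions made
for illustration; nothing here is computed from the media data itself.

```python
from typing import Dict, FrozenSet, Union


def compare_content(
    a: FrozenSet[str], b: FrozenSet[str], most: float = 0.5
) -> Dict[str, Union[bool, str]]:
    """Compare editor-assigned 'things you can learn' from objects A and B."""
    shared = a & b
    # Fraction of A's content that is also learnable from B, and vice versa.
    a_in_b = len(shared) / len(a) if a else 0.0
    b_in_a = len(shared) / len(b) if b else 0.0
    if len(a) > len(b):
        larger = "A"
    elif len(b) > len(a):
        larger = "B"
    else:
        larger = "equal"
    return {
        "overlap": bool(shared),
        "b_dominates_a": a_in_b > most,   # most of A is learnable from B
        "a_dominates_b": b_in_a > most,   # most of B is learnable from A
        "more_to_learn_from": larger,
    }


# Hypothetical example: a header (A) that is a short version of its section (B).
header = frozenset({"topic"})
section = frozenset({"topic", "method", "result"})
result = compare_content(header, section)
```

In this example B dominates A and there is more to learn from B, which is
the "A is a short version of B" relationship.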

Although the 'content' of the message cannot be computed from the data by
an algorithm, these relationships and characteristics of the content of a
specific media object can be correctly assessed by a journeyman editor.  It
doesn't take a whiz.

Al

-- all quoted text below

>Hello,
>
>This message is an attempt to capture some of the
>discussion between Charles McCathieNevile, Eric Hansen,
>and myself about some of the concepts related to
>multimedia that are part of the UA Guidelines. The
>purpose of this message is primarily to document some
>issues raised during that discussion. I don't speak
>here on behalf of Eric or Charles.  The message is not
>entirely coherent, but I wanted to get some notes out
>to the Working Group. There is a semblance of a
>proposal (of definitions) at the end of the email.
>
>The goal of this email is to contribute to the effort
>to answer some questions about definitions of terms
>related to multimedia. Eric has already sent a number
>of emails on this topic ([2], [3], [4]).  Three
>definitions in the 10 June Guidelines [1] relate to
>multimedia: auditory presentation, multimedia
>presentation, and synchronize. We also use the
>following terms but do not define them: auditory
>track, visual track.
>
>[1] http://www.w3.org/WAI/UA/WD-UAAG10-20000610
>
>[2] History and Meaning of the term "Multimedia"
>   http://lists.w3.org/Archives/Public/w3c-wai-ua/2000AprJun/0503.html
>
>[3] Definitions of Visual Track and Auditory Track, Etc.
>   http://lists.w3.org/Archives/Public/w3c-wai-ua/2000AprJun/0374.html
>
>[4] Comments on multimedia and audio
>   http://lists.w3.org/Archives/Public/w3c-wai-gl/1999OctDec/0290.html
>
>
>In our telephone discussion, we considered how a number
>of "axes" might impact definitions of terms related to
>multimedia.  Here are the axes:
>
> - Content type v. rendering modality (audio, video, tactile)
> - Stand-alone v. complementary
> - Primary v. alternative content
> - Static v. dynamic
> - Synchronized v. unsynchronized
> - Distinguishable tracks 
>
>Below is a little bit of exposition on the axes.
>
>1) Source or Rendering?
>   When we say something like "allow the user to freeze
>   animations", we are probably referring to content that
>   is rendered as an animation, whatever the format of the
>   content. So, an animation may be the result of an animated
>   gif or SVG animation, the effect of a script, or the 
>   application of a style sheet to text. If we consider 
>   rendering rather than source format, the key terms we 
>   should be using relate to the senses: auditory, visual, 
>   and tactile. Our definitions should be oriented towards 
>   how the content is received.
>
>2) Dynamic content.
>   Content may evolve in different ways over time:
>   a) A static HTML page does not evolve.
>   b) A dynamic HTML page may change or evolve under
>      the effect of scripts.
>   c) Audio and video have natural time components.
>
>   Questions:
> 
>     - To what extent is a multimedia presentation required
>       to change over time? For instance, is a static HTML
>       page with background audio playing a "degenerate"
>       multimedia presentation?
>
>     - Does a multimedia presentation necessarily require
>       the synchronization of components? What if I have
>       a page of images, I select a link to play an
>       audio clip, and I select another link to view a
>       video clip. Is this a multimedia presentation?
>
>2) Stand-alone versus complementary. When an author produces
>   content, some components may serve complementary purposes
>   while others may serve equivalent purposes. For instance,
>   in a television program, while the visual information and
>   auditory information are certainly related, they are not
>   equivalents for one another. Recall that an auditory
>   equivalent for the visual track of a presentation is an 
>   audio track plus a synchronized auditory description of the
>   visual information. 
>
>   Other components of content may be (functional) equivalents
>   of one another (e.g., text captions are the text equivalent
>   of the audio track). 
>
>   It might be possible to define a multimedia presentation as:
>       a) A presentation that includes both visual tracks and
>          audio tracks.
>       b) These tracks complement each other.
>
>   A stand-alone presentation is one that does not require 
>   a complement to convey its message. For instance, a radio
>   program is a stand-alone auditory presentation. 
>
>   Based on these definitions, a radio program would not be
>   considered a multimedia presentation, even if the radio
>   program were accompanied by equivalents. 
>
>   Similarly, a radio program with an accompanying video
>   track of signing hands would not be a multimedia presentation
>   since the visual track is a functional equivalent of the
>   audio. Alternatives form a unit in a different way than
>   multimedia components form a unit. I think it's possible
>   to talk about "primary content" and its alternatives as
>   a unit. "Primary" probably means what the author intends
>   to be rendered most of the time.
>
>3) Presentation versus Track
>
>   a) Based on the previous discussion of "complementary"
>      components, the term presentation would refer to
>      a "complete" presentation (all necessary components
>      included, be they stand-alone or multimedia, with
>      alternative equivalents considered separately).
>
>   b) The term "track" would refer to either a video or
>      an audio track of a multimedia presentation. However,
>      if a static HTML page plus background audio is considered
>      a multimedia presentation, then calling the static page
>      a "track" seems odd. Calling the background audio a 
>      track seems less odd to me.
>
>   c) With some formats, user agents can distinguish tracks,
>      with others, they may not be able to (e.g., a SMIL
>      presentation with discernible tracks versus a single
>      mixed audio source).
>
>
>Proposal:
>
>1) Start with basic components in terms of rendering, not
>   source format:
>
>  <DEF>
>   Visually rendered content: any content rendered for the 
>     visual sense. This would have to include images, text, 
>     video, scripts that produce visual effects, style sheets
>     that produce visual effects, etc.
>  </DEF>
>
>  <DEF>
>   auditorily rendered content: any content rendered for the 
>     auditory sense. This includes text rendered as
>     speech, pre-recorded audio, etc.
>  </DEF>
>
>2) Introduce stand-alone v. track:
>
>  <DEF>
>   Stand-alone audio presentation: Auditorily rendered 
>   dynamic content that conveys a message without
>   requiring additional content. Note that stand-alone
>   audio presentations require alternatives
>   so that they will be accessible to some users.
>  </DEF>
>
>  <DEF>
>   Stand-alone video presentation: Visually rendered
>   dynamic content that conveys a message without
>   requiring additional content. Note that stand-alone
>   video presentations require alternatives
>   so that they will be accessible to some users.
>  </DEF>
>
>  <DEF>
>   Auditory track: Auditorily rendered dynamic content
>   that is functionally part of a larger presentation.
>   Note that audio tracks require alternatives
>   so that they will be accessible to some users.
>  </DEF>
>
>  <DEF>
>   Visual track: visually rendered dynamic content
>   that is functionally part of a larger presentation.
>   Note that visual tracks require alternatives
>   so that they will be accessible to some users.
>  </DEF>
>
>  <DEF>
>   Synchronized multimedia presentation: A presentation
>   consisting of at least one auditory track that is 
>   synchronized with a visual track. Note that tracks
>   of a multimedia presentation require alternatives so 
>   that they will be accessible to some users.
>  </DEF>
>
>
>Notes and questions;
>
>  - Where does animation fit?
>
>  - The term "dynamic content" needs to be clarified.
>
>  - Part of the discussion involved trying to fit static
>    content plus background audio into a larger definition.
>    Trying to do so may be a mistake. At the 22 June 
>    teleconference [5], Gregory took an action item to 
>    investigate requirements for configuring the user
>    agent to not render audio on load, so I anticipate
>    the background audio question to be resolved in
>    light of Gregory's proposals.
>
>  - When should we use "audio" and when should we use  
>    "auditory"? Same for "video" and "visual". Also, we
>    have consciously used the term "graphical" instead
>    of "visual" for a long time.
>
>[5] http://lists.w3.org/Archives/Public/w3c-wai-ua/2000AprJun/0505.html
>
>
>Your comments welcome,
>
> - Ian
>
>-- 
>Ian Jacobs (jacobs@w3.org)   http://www.w3.org/People/Jacobs
>Tel:                         +1 831 457-2842
>Cell:                        +1 917 450-8783
> 

Received on Monday, 26 June 2000 21:46:35 UTC