Thoughts on multimedia and some definitions

From: Ian Jacobs <ij@w3.org>
Date: Mon, 26 Jun 2000 00:40:01 -0400
Message-ID: <3956DEA1.52F78D6D@w3.org>
To: w3c-wai-ua@w3.org

Hello,

This message is an attempt to capture some of the
discussion between Charles McCathieNevile, Eric Hansen,
and myself about some of the concepts related to
multimedia that are part of the UA Guidelines. The
purpose of this message is primarily to document some
issues raised during that discussion. I don't speak
here on behalf of Eric or Charles.  The message is not
entirely coherent, but I wanted to get some notes out
to the Working Group. There is a semblance of a
proposal (of definitions) at the end of the email.

The goal of this email is to contribute to the effort
to answer some questions about definitions of terms
related to multimedia. Eric has already sent a number
of emails on this topic ([2], [3], [4]).  Three
definitions in the 10 June Guidelines [1] relate to
multimedia: auditory presentation, multimedia
presentation, and synchronize. We also use the
following terms but do not define them: auditory
track, visual track.

[1] http://www.w3.org/WAI/UA/WD-UAAG10-20000610

[2] History and Meaning of the term "Multimedia"
   http://lists.w3.org/Archives/Public/w3c-wai-ua/2000AprJun/0503.html

[3] Definitions of Visual Track and Auditory Track, Etc.
   http://lists.w3.org/Archives/Public/w3c-wai-ua/2000AprJun/0374.html

[4] Comments on multimedia and audio
   http://lists.w3.org/Archives/Public/w3c-wai-gl/1999OctDec/0290.html


In our telephone discussion, we considered how a number
of "axes" might impact definitions of terms related to
multimedia.  Here are the axes:

 - Content type v. rendering modality (audio, video, tactile)
 - Stand-alone v. complementary
 - Primary v. alternative content
 - Static v. dynamic
 - Synchronized v. unsynchronized
 - Distinguishable tracks 

Below is a little bit of exposition on the axes.

1) Source or Rendering?
   When we say something like "allow the user to freeze
   animations", we are probably referring to content that
   is rendered as an animation, whatever the format of the
   content. So, an animation may be the result of an animated
   gif or SVG animation, the effect of a script, or the 
   application of a style sheet to text. If we consider 
   rendering rather than source format, the key terms we 
   should be using relate to the senses: auditory, visual, 
   and tactile. Our definitions should be oriented towards 
   how the content is received.
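
   A minimal sketch of the rendering-oriented view, in Python.
   The RenderedItem type, its fields, and the rule "visual and
   changing over time" are assumptions made for illustration
   only; they come from neither the Guidelines nor any user
   agent. The point is that "freeze animations" selects by what
   is rendered and never looks at the source format.

```python
from dataclasses import dataclass


@dataclass
class RenderedItem:
    source: str              # hypothetical label for the source format
    senses: frozenset        # senses the rendering addresses
    changes_over_time: bool  # True if the rendering moves or updates
    frozen: bool = False


def is_animation(item):
    # Assumed rule: an animation is visual rendering that changes over time.
    return "visual" in item.senses and item.changes_over_time


def freeze_animations(items):
    """Freeze every rendered animation; return the sources that were frozen."""
    frozen_sources = []
    for item in items:
        if is_animation(item) and not item.frozen:
            item.frozen = True
            frozen_sources.append(item.source)
    return frozen_sources


page = [
    RenderedItem("animated GIF", frozenset({"visual"}), True),
    RenderedItem("SVG animation", frozenset({"visual"}), True),
    RenderedItem("script effect", frozenset({"visual"}), True),
    RenderedItem("style sheet applied to text", frozenset({"visual"}), True),
    RenderedItem("static image", frozenset({"visual"}), False),
    RenderedItem("background audio", frozenset({"auditory"}), True),
]
```

   Calling freeze_animations(page) freezes the first four items
   and leaves the static image and the background audio alone.
   Under this assumed rule a video clip would also count as an
   animation, which may or may not be what we want.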

2) Dynamic content.
   Content may evolve in different ways over time:
   a) A static HTML page does not evolve.
   b) A dynamic HTML page may change or evolve under
      the effect of scripts.
   c) Audio and video have natural time components.

   Questions:
 
     - To what extent is a multimedia presentation required
       to change over time? For instance, is a static HTML
       page with background audio playing a "degenerate"
       multimedia presentation?

     - Does a multimedia presentation necessarily require
       the synchronization of components? What if I have
       a page of images, I select a link to play an
       audio clip, and I select another link to view a
       video clip. Is this a multimedia presentation?

3) Stand-alone versus complementary. When an author produces
   content, some components may serve complementary purposes
   while others may serve equivalent purposes. For instance,
   in a television program, while the visual information and
   auditory information are certainly related, they are not
   equivalents for one another. Recall that an auditory
   equivalent for the visual track of a presentation is an
   audio track plus a synchronized auditory description of the
   visual information.

   Other components of content may be (functional) equivalents
   of one another (e.g., text captions are the text equivalent
   of the audio track).

   It might be possible to define a multimedia presentation as:
       a) A presentation that includes both visual tracks and
          audio tracks.
       b) These tracks complement each other.

   A stand-alone presentation is one that does not require
   a complement to convey its message. For instance, a radio
   program is a stand-alone auditory presentation.

   Based on these definitions, a radio program would not be
   considered a multimedia presentation, even if the radio
   program were accompanied by equivalents.

   Similarly, a radio program with an accompanying video
   track of signing hands would not be a multimedia presentation
   since the visual track is a functional equivalent of the
   audio. Alternatives form a unit in a different way than
   multimedia components form a unit. I think it's possible
   to talk about "primary content" and its alternatives as
   a unit. "Primary" probably means what the author intends
   to be rendered most of the time.

4) Presentation versus Track

   a) Based on the previous discussion of "complementary"
      components, the term presentation would refer to
      a "complete" presentation (all necessary components
      included, be they stand-alone or multimedia, with
      alternative equivalents considered separately).

   b) The term "track" would refer to either a video or
      an audio track of a multimedia presentation. However,
      if a static HTML page plus background audio is considered
      a multimedia presentation, then calling the static page
      a "track" seems odd. Calling the background audio a 
      track seems less odd to me.

   c) With some formats, user agents can distinguish tracks,
      with others, they may not be able to (e.g., a SMIL
      presentation with discernible tracks versus a single
      mixed audio source).


Proposal:

1) Start with basic components in terms of rendering, not
   source format:

  <DEF>
   Visually rendered content: any content rendered for the 
     visual sense. This would have to include images, text, 
     video, scripts that produce visual effects, style sheets
     that produce visual effects, etc.
  </DEF>

  <DEF>
   Auditorily rendered content: any content rendered for the
     auditory sense. This includes text rendered as
     speech, pre-recorded audio, etc.
  </DEF>

2) Introduce stand-alone v. track:

  <DEF>
   Stand-alone audio presentation: Auditorily rendered 
   dynamic content that conveys a message without
   requiring additional content. Note that stand-alone
   audio presentations require alternatives
   so that they will be accessible to some users.
  </DEF>

  <DEF>
   Stand-alone video presentation: Visually rendered
   dynamic content that conveys a message without
   requiring additional content. Note that stand-alone
   video presentations require alternatives
   so that they will be accessible to some users.
  </DEF>

  <DEF>
   Auditory track: Auditorily rendered dynamic content
   that is functionally part of a larger presentation.
   Note that auditory tracks require alternatives
   so that they will be accessible to some users.
  </DEF>

  <DEF>
   Visual track: Visually rendered dynamic content
   that is functionally part of a larger presentation.
   Note that visual tracks require alternatives
   so that they will be accessible to some users.
  </DEF>

  <DEF>
   Synchronized multimedia presentation: A presentation
   consisting of at least one auditory track that is 
   synchronized with a visual track. Note that tracks
   of a multimedia presentation require alternatives so 
   that they will be accessible to some users.
  </DEF>
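
   To check that the proposed terms fit together, here is a
   minimal sketch in Python. All names in it (Sense, Content,
   classify, is_synchronized_multimedia) are hypothetical.
   It treats "dynamic" and "stand-alone" as plain flags, since
   neither is defined precisely yet, and it does not model
   alternatives or equivalents at all.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class Sense(Enum):
    VISUAL = "visual"
    AUDITORY = "auditory"


@dataclass(frozen=True)
class Content:
    sense: Sense       # sense the content is rendered for
    dynamic: bool      # "dynamic content" is undefined so far; a flag here
    stand_alone: bool  # conveys its message without additional content


def classify(c: Content) -> str:
    """Return the proposed term for one piece of rendered content."""
    visual = c.sense is Sense.VISUAL
    if not c.dynamic:
        return "visually rendered content" if visual else "auditorily rendered content"
    if c.stand_alone:
        return "stand-alone video presentation" if visual else "stand-alone audio presentation"
    return "visual track" if visual else "auditory track"


def is_synchronized_multimedia(parts: Iterable[Content], synchronized: bool) -> bool:
    """True if at least one auditory track is synchronized with a visual track."""
    terms = {classify(p) for p in parts}
    return synchronized and {"auditory track", "visual track"} <= terms


radio_program = Content(Sense.AUDITORY, dynamic=True, stand_alone=True)
film_sound = Content(Sense.AUDITORY, dynamic=True, stand_alone=False)
film_picture = Content(Sense.VISUAL, dynamic=True, stand_alone=False)
```

   With these values, the radio program classifies as a
   stand-alone audio presentation, and the film sound plus film
   picture form a synchronized multimedia presentation only when
   the synchronized flag is set. A static HTML page with
   background audio does not qualify in this sketch, because the
   page is not dynamic and so is not a track.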


Notes and questions:

  - Where does animation fit?

  - The term "dynamic content" needs to be clarified.

  - Part of the discussion involved trying to fit static
    content plus background audio into a larger definition.
    Trying to do so may be a mistake. At the 22 June 
    teleconference [5], Gregory took an action item to 
    investigate requirements for configuring the user
    agent to not render audio on load, so I expect the
    background audio question to be resolved in light of
    Gregory's proposals.

  - When should we use "audio" and when should we use  
    "auditory"? Same for "video" and "visual". Also, we
    have consciously used the term "graphical" instead
    of "visual" for a long time.

[5] http://lists.w3.org/Archives/Public/w3c-wai-ua/2000AprJun/0505.html


Your comments welcome,

 - Ian

-- 
Ian Jacobs (jacobs@w3.org)   http://www.w3.org/People/Jacobs
Tel:                         +1 831 457-2842
Cell:                        +1 917 450-8783
Received on Monday, 26 June 2000 00:40:09 GMT
