Re: Synthesized-speech auditory descriptions from geoff freed on 2000-10-26 (www-smil@w3.org from October to December 2000)

From: geoff freed <geoff_freed@wgbh.org>
Date: 26 Oct 2000 17:40:55 -0400
To: "Hansen, Eric" <ehansen@ets.org>, <www-smil@w3.org>, thierry michel <tmichel@w3.org>, <www-smil-request@w3.org>
Message-ID: <-1239540443geoff_freed@wgbh.org>

Hi, Eric:

SMIL 2.0 provides support for audio descriptions via a test attribute, systemAudioDesc. The author can record audio
descriptions digitally and synchronize them into a SMIL presentation using this attribute, similar to how captions are
synchronized into SMIL presentations using systemCaptions (or system-captions, as it is called in SMIL 1.0).
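
As a rough illustration of the above, here is a minimal sketch of how the two test attributes might be used together. The file names are placeholders, and it assumes a player that implements the SMIL 2.0 test attributes and exposes the corresponding user preferences:

```xml
<smil xmlns="http://www.w3.org/2001/SMIL20/Language">
  <body>
    <par>
      <!-- Main program. All src values here are hypothetical. -->
      <video src="program.mpg"/>
      <!-- Played only when the user preference for audio descriptions is on. -->
      <audio src="descriptions.wav" systemAudioDesc="on"/>
      <!-- Captions are switched in the same way. -->
      <textstream src="captions.rt" systemCaptions="on"/>
    </par>
  </body>
</smil>
```

When the preference is off, the element carrying the test attribute is simply skipped and the rest of the <par> plays as usual.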

Additionally, using SMIL 2.0's <excl> and <priorityClass> elements, the author may pause a video track
automatically, play an extended audio description and, when the description is finished, resume playing the video
track. This will be a boon for situations where the natural pauses in the program audio aren't sufficient for audio
descriptions.
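
A sketch of the extended-description case might look like the following. The ids, file names and the 20-second offset are made up for illustration, and the exact behavior depends on how the player implements <excl>:

```xml
<excl>
  <!-- peers="pause": when one child begins, the child that is playing
       is paused and resumes after the new one ends. -->
  <priorityClass peers="pause">
    <!-- Children of <excl> do not start on their own, so the video
         is given an explicit begin time. -->
    <video id="program" src="program.mpg" begin="0s"/>
    <!-- Extended description: starts 20s in, pausing the video.
         Skipped entirely when audio descriptions are turned off. -->
    <audio src="description1.wav" begin="20s" systemAudioDesc="on"/>
  </priorityClass>
</excl>
```

Because the description carries systemAudioDesc="on", users who have not asked for descriptions should see the video play straight through without the pause.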

Geoff Freed
CPB/WGBH National Center for Accessible Media (NCAM)
WGBH Educational Foundation
geoff_freed@wgbh.org


On Wednesday, October 25, 2000, thierry michel <tmichel@w3.org> wrote:
>
>> My questions concern the use of SMIL for developing auditory descriptions
>> for multimedia presentations.
>>
>> The Web Content Accessibility Guidelines (WCAG) version 1.0 of W3C/WAI
>> indicates the possibility of using speech synthesis for providing auditory
>> descriptions for multimedia presentations. Specifically, checkpoint 1.3 of
>> WCAG 1.0 reads:
>>
>> "1.3 Until user agents can automatically read aloud the text equivalent of
>> a visual track, provide an auditory description of the important information
>> of the visual track of a multimedia presentation. [Priority 1]
>> Synchronize the auditory description with the audio track as per checkpoint
>> 1.4. Refer to checkpoint 1.1 for information about textual equivalents for
>> visual information." (WCAG 1.0, checkpoint 1.3).
>>
>> In the same document in the definition of "Equivalent", we read:
>>
>> "One example of a non-text equivalent is an auditory description of the key
>> visual elements of a presentation. The description is either a prerecorded
>> human voice or a synthesized voice (recorded or generated on the fly). The
>> auditory description is synchronized with the audio track of the
>> presentation, usually during natural pauses in the audio track. Auditory
>> descriptions include information about actions, body language, graphics, and
>> scene changes."
>>
>> My questions are as follows:
>>
>> 1. Does SMIL 2.0 support the development of synthesized speech auditory
>> descriptions?
>>
>> 2. If the answer to question #1 is "Yes", then briefly describe the support
>> that is provided.
>>
>> 3. If the answer to question #1 is "No", then please describe any plans for
>> providing such support in the future.
>>
>> Thanks very much for your consideration.
>>
>> - Eric G. Hansen
>> Development Scientist
>> Educational Testing Service (ETS)
>> Princeton, NJ 08541
>> ehansen@ets.org
>> Co-Editor, W3C/WAI User Agent Accessibility Guidelines
>>
>

Received on Thursday, 26 October 2000 17:40:46 UTC