RE: Synthesized-speech auditory descriptions from thierry michel on 2000-10-30 (www-smil@w3.org from October to December 2000)

From: thierry michel <tmichel@w3.org>
Date: Mon, 30 Oct 2000 14:32:32 +0100
To: "Hansen, Eric" <ehansen@ets.org>, "'geoff freed'" <geoff_freed@wgbh.org>, <www-smil@w3.org>, <www-smil-request@w3.org>
Message-ID: <00f501c04275$ded2d5e0$228a608a@inria.fr>
----- Original Message -----
From: "Hansen, Eric" <ehansen@ets.org>
To: "'geoff freed'" <geoff_freed@wgbh.org>; "Hansen, Eric"
<ehansen@ets.org>; <www-smil@w3.org>; "thierry michel" <tmichel@w3.org>;
<www-smil-request@w3.org>
Sent: Friday, October 27, 2000 2:26 PM
Subject: [Moderator Action] RE: Synthesized-speech auditory descriptions


> Geoff,
>
> Thanks very much for your response. The information about prerecorded
> auditory descriptions is helpful but does not specifically address the
> questions that I have posed. My questions pertained to synthesized speech
> auditory descriptions.
>
> The same capabilities to which you referred regarding pausing and resuming
> would be important. But SMIL would need to expose the auditory description
> text (i.e., a text equivalent of the visual track) plus information that
> would allow synchronizing that text with the regular audio and visual
> tracks. A system could then insert the synthesized speech into the natural
> pauses. When the speech duration exceeds the duration of the natural
> pause, a system might perform some combination of video pause and speech
> speed-up.
>
> I think that the basic question is this:
>
> Is SMIL capable of synchronizing the auditory description text with the
> regular auditory and visual tracks and then exposing that information in a
> way that could be recognized by a speech synthesizer?
>
> Thanks very much!
>
> - Eric Hansen
>
>
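A minimal Python sketch of the scheduling Eric describes: fit each synthesized description into its natural pause, speed the speech up to a cap, and pause the video for whatever time is still missing. This is not SMIL functionality. The speaking rate (2.5 words per second), the 1.5x speed-up cap and the names Description, Plan and plan_description are hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Assumed values for illustration only; real synthesizers and
# accessibility practice may call for different numbers.
WORDS_PER_SECOND = 2.5  # hypothetical normal speaking rate
MAX_SPEEDUP = 1.5       # hypothetical cap on speech speed-up


@dataclass
class Description:
    """Hypothetical model of one synthesized auditory description."""
    text: str
    pause_length: float  # length of the natural pause in the program audio (s)


@dataclass
class Plan:
    rate: float         # speech-rate multiplier (1.0 = normal speed)
    video_pause: float  # seconds the video must be held


def plan_description(d: Description) -> Plan:
    """Fit speech into a natural pause: speed up first, then pause the video."""
    normal_duration = len(d.text.split()) / WORDS_PER_SECOND
    if normal_duration <= d.pause_length:
        # The description fits in the pause at normal speed.
        return Plan(rate=1.0, video_pause=0.0)
    if d.pause_length > 0:
        # Speed up just enough to fit, but never beyond the cap.
        rate = min(normal_duration / d.pause_length, MAX_SPEEDUP)
    else:
        rate = MAX_SPEEDUP
    spoken = normal_duration / rate
    # Whatever still does not fit is covered by holding the video.
    return Plan(rate=rate, video_pause=max(0.0, spoken - d.pause_length))


print(plan_description(Description(" ".join(["word"] * 15), 2.0)))
# Plan(rate=1.5, video_pause=2.0)
```

Under these assumptions a 15-word description needs 6 s of speech; squeezed into a 2 s pause it is capped at 1.5x (4 s of speech), so the video is held for the remaining 2 s.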
> -----Original Message-----
> From: geoff freed [mailto:geoff_freed@wgbh.org]
> Sent: Thursday, October 26, 2000 5:41 PM
> To: Hansen, Eric; www-smil@w3.org; thierry michel;
> www-smil-request@w3.org
> Subject: Re: Synthesized-speech auditory descriptions
>
>
> Hi, Eric:
>
> SMIL 2.0 provides support for audio descriptions via a test attribute,
> systemAudioDesc. The author can record audio descriptions digitally and
> synchronize them into a SMIL presentation using this attribute, similar to
> how captions are synchronized into SMIL presentations using systemCaptions
> (or system-captions, as it is called in SMIL 1.0).
>
> Additionally, using SMIL 2.0's <excl> and <priorityClass> elements, the
> author may pause a video track automatically, play an extended audio
> description and, when the description is finished, resume playing the
> video track. This will be a boon for situations where the natural pauses
> in the program audio aren't sufficient for audio descriptions.
>
> Geoff Freed
> CPB/WGBH National Center for Accessible Media (NCAM)
> WGBH Educational Foundation
> geoff_freed@wgbh.org
>
>
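Geoff's two mechanisms can be combined in one time container. The sketch below uses Python's standard xml.etree.ElementTree to emit a SMIL 2.0 fragment: the program video and prerecorded descriptions are peers in a <priorityClass peers="pause"> inside <excl>, and each description is gated by systemAudioDesc="on". It is a sketch under assumptions, not a validated SMIL document: the helper build_described_program, the file names and the begin times are hypothetical, and the <smil>/<body> wrapper and the SMIL 2.0 namespace are omitted.

```python
import xml.etree.ElementTree as ET


def build_described_program(video_src: str,
                            descriptions: list[tuple[str, str]]) -> str:
    """Return a SMIL 2.0 <excl> fragment (hypothetical helper).

    descriptions: (begin, src) pairs such as ("12s", "desc1.wav"). Begin
    times are offsets on the <excl> timeline, which keeps running while
    the video is paused.
    """
    excl = ET.Element("excl")
    # peers="pause": when a peer begins, the active peer is paused and
    # resumes once the interrupting element ends.
    group = ET.SubElement(excl, "priorityClass", peers="pause")
    # Children of <excl> need an explicit begin to start on their own.
    ET.SubElement(group, "video", id="program", src=video_src, begin="0s")
    for i, (begin, src) in enumerate(descriptions, start=1):
        # systemAudioDesc="on": the description is used only when the user
        # has asked for audio descriptions; otherwise it is ignored.
        ET.SubElement(group, "audio", id=f"desc{i}", src=src,
                      begin=begin, systemAudioDesc="on")
    return ET.tostring(excl, encoding="unicode")


print(build_described_program("program.mpg", [("12s", "desc1.wav")]))
```

If the user has not requested audio descriptions, the audio elements fail their test and the video plays without interruption. Because the begin offsets are measured on the <excl> timeline rather than the video's, later descriptions must account for any time the video has already spent paused.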
> On Wednesday, October 25, 2000, thierry michel <tmichel@w3.org> wrote:
> >
> >> My questions concern the use of SMIL for developing auditory descriptions
> >> for multimedia presentations.
> >>
> >> The Web Content Accessibility Guidelines (WCAG) version 1.0 of W3C/WAI
> >> indicates the possibility of using speech synthesis for providing auditory
> >> descriptions for multimedia presentations. Specifically, checkpoint 1.3 of
> >> WCAG 1.0 reads:
> >>
> >> "1.3 Until user agents can automatically read aloud the text equivalent of
> >> a visual track, provide an auditory description of the important
> >> information of the visual track of a multimedia presentation. [Priority 1]
> >> Synchronize the auditory description with the audio track as per
> >> checkpoint 1.4. Refer to checkpoint 1.1 for information about textual
> >> equivalents for visual information." (WCAG 1.0, checkpoint 1.3).
> >>
> >> In the same document in the definition of "Equivalent", we read:
> >>
> >> "One example of a non-text equivalent is an auditory description of the
> >> key visual elements of a presentation. The description is either a
> >> prerecorded human voice or a synthesized voice (recorded or generated on
> >> the fly). The auditory description is synchronized with the audio track of
> >> the presentation, usually during natural pauses in the audio track.
> >> Auditory descriptions include information about actions, body language,
> >> graphics, and scene changes."
> >>
> >> My questions are as follows:
> >>
> >> 1. Does SMIL 2.0 support the development of synthesized speech auditory
> >> descriptions?
> >>
> >> 2. If the answer to question #1 is "Yes", then briefly describe the
> >> support that is provided.
> >>
> >> 3. If the answer to question #1 is "No", then please describe any plans
> >> for providing such support in the future.
> >>
> >> Thanks very much for your consideration.
> >>
> >> - Eric G. Hansen
> >> Development Scientist
> >> Educational Testing Service (ETS)
> >> Princeton, NJ 08541
> >> ehansen@ets.org
> >> Co-Editor, W3C/WAI User Agent Accessibility Guidelines
> >>
> >
>
Received on Monday, 30 October 2000 08:33:03 UTC