RE: Synthesized-speech auditory descriptions

Brad:

We also have alt and longdesc, either of which could be used by a player to
provide auxiliary or alternative text content. These can be combined with
the systemLanguage and other test attributes to provide many combinations of
accessibility and internationalization.
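
For instance (a rough sketch only; the file names are invented), a switch
could pick a description by language, with alt carrying a short text
equivalent and longdesc pointing to a full transcript:

    <switch>
        <audio src="desc-fr.wav" systemLanguage="fr"
               alt="Description de la scène"/>
        <audio src="desc-en.wav" systemLanguage="en"
               alt="Description of the scene"
               longdesc="transcripts/desc-en.html"/>
    </switch>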
-Aaron

> -----Original Message-----
> From: Brad Botkin [mailto:brad_botkin@wgbh.org]
> Sent: Friday, October 27, 2000 5:41 AM
> To: geoff freed
> Cc: Hansen, Eric; www-smil@w3.org; thierry michel;
> www-smil-request@w3.org
> Subject: Re: Synthesized-speech auditory descriptions
> 
> 
> Geoff,
> True but incomplete.  It sounds like Eric is asking for an attribute
> which identifies text as a transcription of the underlying audio.
> Something like:
> 
> <par>
>     .....
>     <audio systemAudioDesc="on"
>            AudioDescText="The lady in the pink sweater picks up the
>                           pearl necklace from the table and walks
>                           to the door."
>            src="snippet8043.wav"/>
>     .....
> </par>
> 
> It's a great idea, since the text is super-thin, making it 
> appropriate for transmission in narrow pipes with local
> text-to-speech synthesis for playback.  Note that the volume 
> of snippets in a longer piece, like a movie, is huge, just
> like closed captions.  Inclusion of 1000 audio description 
> snippets and 2000 closed captions, each in 3 languages, each
> with its own timecode, all in the same SMIL file will make 
> for some *very* unfriendly files.  Better would be to provide a
> mechanism which allows the SMIL file to gracefully point to 
> separate files each containing the timecoded AD snippets (with
> transcriptions per the above) and timecoded captions.  It 
> requires the SMIL player to gracefully overlay the external
> timeline onto the intrinsic timeline of the SMIL file.  
> Without this, SMIL won't be used for interchange of caption and
> description data for anything longer than a minute or two.  A 
> translation house shouldn't have to unwind a bazillion audio
> descriptions and captions in umpteen other languages to 
> insert its French translation.
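> 
> A sketch of the idea (purely illustrative; the file names, the .rt
> and .xml formats, and the assumption that the player maps the
> external files' timecodes onto the presentation timeline are all
> mine):
> 
>     <par>
>         <video src="movie.mpg"/>
>         <textstream src="captions-fr.rt" systemCaptions="on"
>                     systemLanguage="fr"/>
>         <ref src="descriptions-fr.xml" systemAudioDesc="on"
>              systemLanguage="fr"/>
>     </par>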
> 
> Regards,
> --Brad
> ___________
> Brad_Botkin@wgbh.org
> Director, Technology & Systems Development
> NCAM/WGBH - National Center for Accessible Media
> 125 Western Ave, Boston MA 02134
> (v/f) 617.300.3902
> ___________
> 
> 
> geoff freed wrote:
> 
> > Hi, Eric:
> >
> > SMIL 2.0 provides support for audio descriptions via a test
> > attribute, systemAudioDesc.  The author can record audio
> > descriptions digitally and synchronize them into a SMIL
> > presentation using this attribute, similar to how captions are
> > synchronized into SMIL presentations using systemCaptions
> > (or system-captions, as it is called in SMIL 1.0).
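> >
> > For instance (a sketch only; the file names and timing value are
> > invented), a description clip keyed to the test attribute might
> > be synchronized like this:
> >
> >     <par>
> >         <video src="movie.mpg"/>
> >         <audio src="desc1.wav" begin="12s" systemAudioDesc="on"/>
> >     </par>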
> >
> > Additionally, using SMIL 2.0's <excl> and <priorityClass>
> > elements, the author may pause a video track automatically,
> > play an extended audio description and, when the description
> > is finished, resume playing the video track.  This will be a
> > boon for situations where the natural pauses in the program
> > audio aren't sufficient for audio descriptions.
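> >
> > A rough sketch of that pattern (untested, names and timing
> > invented; the peers="pause" setting is what pauses the video
> > while the description plays and resumes it afterward):
> >
> >     <excl>
> >         <priorityClass peers="pause">
> >             <video src="movie.mpg"/>
> >             <audio src="extended-desc.wav" begin="45s"
> >                    systemAudioDesc="on"/>
> >         </priorityClass>
> >     </excl>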
> >
> > Geoff Freed
> > CPB/WGBH National Center for Accessible Media (NCAM)
> > WGBH Educational Foundation
> > geoff_freed@wgbh.org
> >
> > On Wednesday, October 25, 2000, thierry michel <tmichel@w3.org> wrote:
> > >
> > >> My questions concern the use of SMIL for developing auditory
> > >> descriptions for multimedia presentations.
> > >>
> > >> The Web Content Accessibility Guidelines (WCAG) version 1.0 of
> > >> W3C/WAI indicates the possibility of using speech synthesis for
> > >> providing auditory descriptions for multimedia presentations.
> > >> Specifically, checkpoint 1.3 of WCAG 1.0 reads:
> > >>
> > >> "1.3 Until user agents can automatically read aloud the text
> > >> equivalent of a visual track, provide an auditory description
> > >> of the important information of the visual track of a
> > >> multimedia presentation. [Priority 1] Synchronize the auditory
> > >> description with the audio track as per checkpoint 1.4. Refer
> > >> to checkpoint 1.1 for information about textual equivalents
> > >> for visual information." (WCAG 1.0, checkpoint 1.3).
> > >>
> > >> In the same document, in the definition of "Equivalent", we read:
> > >>
> > >> "One example of a non-text equivalent is an auditory
> > >> description of the key visual elements of a presentation. The
> > >> description is either a prerecorded human voice or a
> > >> synthesized voice (recorded or generated on the fly). The
> > >> auditory description is synchronized with the audio track of
> > >> the presentation, usually during natural pauses in the audio
> > >> track. Auditory descriptions include information about actions,
> > >> body language, graphics, and scene changes."
> > >>
> > >> My questions are as follows:
> > >>
> > >> 1. Does SMIL 2.0 support the development of synthesized-speech
> > >> auditory descriptions?
> > >>
> > >> 2. If the answer to question #1 is "Yes", then briefly describe
> > >> the support that is provided.
> > >>
> > >> 3. If the answer to question #1 is "No", then please describe
> > >> any plans for providing such support in the future.
> > >>
> > >> Thanks very much for your consideration.
> > >>
> > >> - Eric G. Hansen
> > >> Development Scientist
> > >> Educational Testing Service (ETS)
> > >> Princeton, NJ 08541
> > >> ehansen@ets.org
> > >> Co-Editor, W3C/WAI User Agent Accessibility Guidelines
> > >
> 
> 

Received on Friday, 27 October 2000 15:13:05 UTC