W3C home > Mailing lists > Public > www-smil@w3.org > October to December 2000

Re: Synthesized-speech auditory descriptions

From: Brad Botkin <brad_botkin@wgbh.org>
Date: Fri, 27 Oct 2000 15:30:22 -0400
Message-ID: <39F9D7CE.3E7488CC@wgbh.org>
To: "Cohen, Aaron M" <aaron.m.cohen@intel.com>
CC: geoff freed <geoff_freed@wgbh.org>, "Hansen, Eric" <ehansen@ets.org>, www-smil@w3.org, thierry michel <tmichel@w3.org>, www-smil-request@w3.org
Aaron,
I think the actual transcription of the audio deserves its own tag,
since it's so specific. For the same reason that you created a
systemAudioDesc tag and didn't just use the alt tag.  You need a place
to look that's consistent.  I believe the longdesc is intented to be
used as simply a longer text description of the unnderlying graphic or
media file. And in the case of audio description snippets, the longdesc
could be used to hold timing or other metadata related to the snippet
but not specifically voiced. I think that verbatim text will prove
invaluable in the future, for searching, etc., and you should consider
creating a specific tag for this.
--Brad
\_\_\_\_\_\_\_\_\_\_
Brad_Botkin@wgbh.org   Director, Technology & Systems Development
617.300.3902 (v/f)               NCAM/WGBH - National Center for 
125 Western Ave Boston MA 02134              Accessible Media
\_\_\_\_\_\_\_\_\_\_


"Cohen, Aaron M" wrote:
> 
> Brad:
> 
> We also have alt and longdesc, either of which could be used by a player to
> provide accessory or alternative text content. These can be combined with
> the systemLanguage and other test attributes to provide many combinations of
> accessiblity and internationalization.
> -Aaron
> 
> > -----Original Message-----
> > From: Brad Botkin [mailto:brad_botkin@wgbh.org]
> > Sent: Friday, October 27, 2000 5:41 AM
> > To: geoff freed
> > Cc: Hansen, Eric; www-smil@w3.org; thierry michel;
> > www-smil-request@w3.org
> > Subject: Re: Synthesized-speech auditory descriptions
> >
> >
> > Geoff,
> > True but incomplete.  It sounds like Eric is asking for a tag
> > which identifies text as a transcription of the underlying
> > audio.   Something like:
> >
> > <par>
> > .....
> >     <audio    systemAudioDesc="on"
> >                     AudioDescText="The lady in the pink
> > sweater picks up the pearl necklace from the table and walks to the
> > door."
> >                     src="snippet8043.wav"/>
> > .....
> > </par>
> >
> > It's a great idea, since the text is super-thin, making it
> > appropriate for transmission in narrow pipes with local
> > text-to-speech synthesis for playback.  Note that the volume
> > of snippets in a longer piece, like a movie, is huge, just
> > like closed captions.  Inclusion of 1000 audio description
> > snippets and 2000 closed captions, each in 3 languages, each
> > with its own timecode, all in the same SMIL file will make
> > for some *very* unfriendly  files.  Better would be to provide a
> > mechanism which allows the SMIL file to gracefully point to
> > separate files each containing the timecoded AD snippets (with
> > transcriptions per the above) and timecoded captions.  It
> > requires the SMIL player to gracefully overlay the external
> > timeline onto the intrinsic timeline of the SMIL file.
> > Without this, SMIL won't be used for interchange of caption and
> > description data for anything longer than a minute or two.  A
> > translation house shouldn't have to unwind a bazillion audio
> > descriptions and captions in umpteen other languages to
> > insert its French translation.
> >
> > Regards,
> > --Brad
> > \_\_\_\_\_\_\_\_\_\_\_
> > Brad_Botkin@wgbh.org   Director, Technology & Systems Development
> > (v/f) 617.300.3902               NCAM/WGBH - National Center for
> > 125 Western Ave Boston MA 02134              Accessible Media
> > \_\_\_\_\_\_\_\_\_\_\_
> >
> >
> > geoff freed wrote:
> >
> > > Hi, Eric:
> > >
> > > SMIL 2.0 provides support for audio descriptions via a test
> > attribute, systemAudioDesc.  The author can record audio
> > >  descriptions digitally and synchronize them into a SMIL
> > presentation using this attribute, similar to how captions are
> > >  synchronized into SMIl presentations using systemCaptions
> > (or system-captions, as it is called in SMIL 1.0).
> > >
> > > Additionally, using SMIL2.0's <excl> and <priorityClass>
> > elements, the the author may pause a video track
> > >  automatically, play an extended audio description and,
> > when the description is finished, resume playing the video
> > >  track.  This will be a boon for situations  where the
> > natural pauses in the program audio aren't sufficient for audio
> > >  descriptions.
> > >
> > > Geoff Freed
> > > CPB/WGBH National Center for Accessible Media (NCAM)
> > > WGBH Educational Foundation
> > > geoff_freed@wgbh.org
> > >
> > > On Wednesday, October 25, 2000, thierry michel
> > <tmichel@w3.org> wrote:
> > > >
> > > >> My questions concern the use of SMIL for developing
> > auditory descriptions
> > > >> for multimedia presentations.
> > > >>
> > > >> The Web Content Accessibility Guidelines (WCAG) version
> > 1.0 of W3C/WAI
> > > >> indicates the possibility of using speech synthesis for
> > providing auditory
> > > >> descriptions for multimedia presentations. Specifically,
> > checkpoint 1.3 of
> > > >> WCAG 1.0 reads:
> > > >>
> > > >> "1.3 Until user agents can automatically read aloud the
> > text equivalent of
> > > >a
> > > >> visual track, provide an auditory description of the
> > important information
> > > >> of the visual track of a multimedia presentation. [Priority 1]
> > > >> Synchronize the auditory description with the audio track as per
> > > >checkpoint
> > > >> 1.4. Refer to checkpoint 1.1 for information about
> > textual equivalents for
> > > >> visual information." (WCAG 1.0, checkpoint 1.3).
> > > >>
> > > >> In the same document in the definition of "Equivalent", we read:
> > > >>
> > > >> "One example of a non-text equivalent is an auditory
> > description of the
> > > >key
> > > >> visual elements of a presentation. The description is
> > either a prerecorded
> > > >> human voice or a synthesized voice (recorded or
> > generated on the fly). The
> > > >> auditory description is synchronized with the audio track of the
> > > >> presentation, usually during natural pauses in the audio
> > track. Auditory
> > > >> descriptions include information about actions, body
> > language, graphics,
> > > >and
> > > >> scene changes."
> > > >>
> > > >> My questions are as follows:
> > > >>
> > > >> 1. Does SMIL 2.0 support the development of synthesized
> > speech auditory
> > > >> descriptions?
> > > >>
> > > >> 2. If the answer to question #1 is "Yes", then briefly
> > describe the
> > > >support
> > > >> that is provided.
> > > >>
> > > >> 3. If the answer to question #1 is "No", then please
> > describe any plans
> > > >for
> > > >> providing such support in the future.
> > > >>
> > > >> Thanks very much for your consideration.
> > > >>
> > > >> - Eric G. Hansen
> > > >> Development Scientist
> > > >> Educational Testing Service (ETS)
> > > >> Princeton, NJ 08541
> > > >> ehansen@ets.org
> > > >> Co-Editor, W3C/WAI User Agent Accessibility Guidelines
> > > >>
> > > >
> >
> >
Received on Friday, 27 October 2000 15:31:52 UTC

This archive was generated by hypermail 2.3.1 : Tuesday, 6 January 2015 20:34:23 UTC