- From: Cohen, Aaron M <aaron.m.cohen@intel.com>
- Date: Fri, 27 Oct 2000 12:10:20 -0700
- To: "'Brad Botkin'" <brad_botkin@wgbh.org>, geoff freed <geoff_freed@wgbh.org>
- Cc: "Hansen, Eric" <ehansen@ets.org>, www-smil@w3.org, thierry michel <tmichel@w3.org>, www-smil-request@w3.org
Brad:

We also have alt and longdesc, either of which could be used by a player to provide supplementary or alternative text content. These can be combined with systemLanguage and the other test attributes to provide many combinations of accessibility and internationalization.
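For example (a rough sketch only; the file names and description text here are invented), a <switch> over systemLanguage could offer the same description in several languages, each carrying a transcription in its alt attribute:

  <switch>
    <audio src="desc-en.wav" systemLanguage="en"
           alt="She picks up the pearl necklace and walks to the door."/>
    <audio src="desc-fr.wav" systemLanguage="fr"
           alt="Elle prend le collier de perles et va vers la porte."/>
  </switch>

A player evaluates the children in order and plays the first one whose test attributes are true, so a user with French language preferences gets the French description.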
-Aaron

> -----Original Message-----
> From: Brad Botkin [mailto:brad_botkin@wgbh.org]
> Sent: Friday, October 27, 2000 5:41 AM
> To: geoff freed
> Cc: Hansen, Eric; www-smil@w3.org; thierry michel; www-smil-request@w3.org
> Subject: Re: Synthesized-speech auditory descriptions
>
> Geoff,
>
> True but incomplete. It sounds like Eric is asking for a tag that
> identifies text as a transcription of the underlying audio. Something
> like:
>
> <par>
>   .....
>   <audio systemAudioDesc="on"
>          AudioDescText="The lady in the pink sweater picks up the pearl
>                         necklace from the table and walks to the door."
>          src="snippet8043.wav"/>
>   .....
> </par>
>
> It's a great idea, since the text is super-thin, making it appropriate
> for transmission over narrow pipes with local text-to-speech synthesis
> for playback. Note that the volume of snippets in a longer piece, like
> a movie, is huge, just like closed captions. Including 1000 audio
> description snippets and 2000 closed captions, each in 3 languages,
> each with its own timecode, all in the same SMIL file will make for
> some *very* unfriendly files. Better would be a mechanism that allows
> the SMIL file to point gracefully to separate files, each containing
> the timecoded audio description snippets (with transcriptions per the
> above) or the timecoded captions. This requires the SMIL player to
> overlay the external timeline onto the intrinsic timeline of the SMIL
> file. Without it, SMIL won't be used for interchange of caption and
> description data for anything longer than a minute or two. A
> translation house shouldn't have to unwind a bazillion audio
> descriptions and captions in umpteen other languages just to insert
> its French translation.
>
> Regards,
> --Brad
>
> ___________
> Brad_Botkin@wgbh.org
> Director, Technology & Systems Development
> NCAM/WGBH - National Center for Accessible Media
> (v/f) 617.300.3902
> 125 Western Ave, Boston MA 02134
> ___________
>
> geoff freed wrote:
>
> > Hi, Eric:
> >
> > SMIL 2.0 provides support for audio descriptions via a test
> > attribute, systemAudioDesc. The author can record audio descriptions
> > digitally and synchronize them into a SMIL presentation using this
> > attribute, similar to how captions are synchronized into SMIL
> > presentations using systemCaptions (or system-captions, as it is
> > called in SMIL 1.0).
> >
> > Additionally, using SMIL 2.0's <excl> and <priorityClass> elements,
> > the author may pause a video track automatically, play an extended
> > audio description and, when the description is finished, resume
> > playing the video track. This will be a boon for situations where
> > the natural pauses in the program audio aren't sufficient for audio
> > descriptions.
> >
> > Geoff Freed
> > CPB/WGBH National Center for Accessible Media (NCAM)
> > WGBH Educational Foundation
> > geoff_freed@wgbh.org
> >
> > On Wednesday, October 25, 2000, thierry michel <tmichel@w3.org> wrote:
> >
> > > My questions concern the use of SMIL for developing auditory
> > > descriptions for multimedia presentations.
> > >
> > > The Web Content Accessibility Guidelines (WCAG) version 1.0 of
> > > W3C/WAI indicates the possibility of using speech synthesis for
> > > providing auditory descriptions for multimedia presentations.
> > > Specifically, checkpoint 1.3 of WCAG 1.0 reads:
> > >
> > > "1.3 Until user agents can automatically read aloud the text
> > > equivalent of a visual track, provide an auditory description of
> > > the important information of the visual track of a multimedia
> > > presentation. [Priority 1] Synchronize the auditory description
> > > with the audio track as per checkpoint 1.4. Refer to checkpoint 1.1
> > > for information about textual equivalents for visual information."
> > > (WCAG 1.0, checkpoint 1.3).
> > >
> > > In the same document, in the definition of "Equivalent", we read:
> > >
> > > "One example of a non-text equivalent is an auditory description
> > > of the key visual elements of a presentation. The description is
> > > either a prerecorded human voice or a synthesized voice (recorded
> > > or generated on the fly). The auditory description is synchronized
> > > with the audio track of the presentation, usually during natural
> > > pauses in the audio track. Auditory descriptions include
> > > information about actions, body language, graphics, and scene
> > > changes."
> > >
> > > My questions are as follows:
> > >
> > > 1. Does SMIL 2.0 support the development of synthesized-speech
> > > auditory descriptions?
> > >
> > > 2. If the answer to question #1 is "Yes", then briefly describe
> > > the support that is provided.
> > >
> > > 3. If the answer to question #1 is "No", then please describe any
> > > plans for providing such support in the future.
> > >
> > > Thanks very much for your consideration.
> > >
> > > - Eric G. Hansen
> > > Development Scientist
> > > Educational Testing Service (ETS)
> > > Princeton, NJ 08541
> > > ehansen@ets.org
> > > Co-Editor, W3C/WAI User Agent Accessibility Guidelines
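For reference, the <excl>/<priorityClass> pattern Geoff describes above might look roughly like this under the SMIL 2.0 drafts (a sketch only; the source names and the 15s begin time are invented):

  <excl dur="indefinite">
    <priorityClass peers="pause">
      <video src="movie.mpg"/>
      <!-- When this description begins it pauses the video, which
           resumes once the description ends. It is active only when
           the user has audio descriptions switched on. -->
      <audio src="extended-desc.wav" begin="15s" systemAudioDesc="on"/>
    </priorityClass>
  </excl>

With peers="pause", an interrupting peer pauses the active element rather than stopping it, which is what lets the video resume from where it left off.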
Received on Friday, 27 October 2000 15:13:05 UTC