- From: Wendy A Chisholm <wendy@w3.org>
- Date: Mon, 29 Nov 1999 10:03:51 -0500
- To: Marja-Riitta Koivunen <marja@w3.org>, w3c-wai-gl@w3.org
- Cc: w3c-wai-ua@w3.org
Hello Marja,

>Checkpoint:
>1.3 Until user agents can automatically read aloud the text equivalent of a
>visual track, provide an auditory description of the important information
>of the visual track of a multimedia presentation. [Priority 1] Synchronize
>the auditory description with the audio track as per checkpoint 1.4. Refer
>to checkpoint 1.1 for information about textual equivalents for visual
>information. Techniques for checkpoint 1.3
>
>Questions:
>I was trying to think how this checkpoint could be implemented
>in the user agent.
>The first question is what the author actually provides when
>he provides the text equivalent of a visual track. It seems that it is
>something that can be used to create an auditory description. So it needs to
>be a continuous text stream that is synchronized to the video, as it is
>describing the contents of the video.

The text equivalent of a visual track is what WGBH/NCAM call "Descriptive
Video." This is an auditory track that describes visual information during
breaks in dialog and other auditory events. Examples are available from the
NCAM web site [1]. If it is recorded by a human, then it can be a secondary
audio track, or it can be broken up into multiple files and played on cue
(as with SMIL; see examples from NCAM's beta MAGpie tool, and the rough
sketch further down in this message). If it is synthesized by a machine,
then the text can be synthesized to speech on cue.

>Why would someone create such a text stream? A collated text transcript
>that can be read independently from the video would make more sense to me.

I think you are assuming that the text stream appears as text and is not
spoken. If spoken, the auditory description can help a person who can't see
the video stream by giving them the visual cues they are missing.

>If a user can see text, why not look at the video rather than the description?

The text should be spoken because they can't see the video.

>Are there users that have a hard time interpreting the video and that's why?

That will be another beneficial use, but primarily it is to be used by
people who cannot see the video.

>When the device does not have a screen on which to show the video, it seems
>that collated text makes more sense.

Yes, there are cases where the collated text transcript does make sense. The
future ideal is for there to be one text document from which the appropriate
pieces are synchronized or synthesized on cue.

>A text stream needs to be synchronized so that there is enough time to read
>it. The synchronization of text that is read visually might be different
>from the synchronization of automatically created audio. So automatically
>creating an audio description from a text stream that is synchronized in
>the right way with the audio might be difficult, especially as the
>synchronization might change the timing of the original video if the
>natural pauses are not long enough to include the audio descriptions. Are
>there any ideas how the synchronization could be created automatically from
>the text stream?

I'm not sure how much work has been done on this, but I know it has been
talked about. One of the exciting things about digital presentations is that
the visual presentation could be paused during a long auditory description.
Currently, auditory descriptions are recorded by humans to fit during the
appropriate pauses in dialog (and other auditory events). This often means
that not all of the information is given, or that it is given much earlier
than the actual event.
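To make the "played on cue" idea more concrete, here is a minimal,
hand-made SMIL 1.0 sketch (not actual MAGpie output); the file names, cue
times, and region sizes are invented for illustration. The video and a
timed caption text stream play in parallel, and pre-recorded description
clips are started at fixed offsets chosen to fall in pauses in the dialog.
A future user agent could do the same with a time-coded text transcript,
sending each piece to a speech synthesizer on cue instead of playing a
recorded clip.

  <smil>
    <head>
      <layout>
        <root-layout width="320" height="260"/>
        <region id="videoregion"   top="0"   width="320" height="240"/>
        <region id="captionregion" top="240" width="320" height="20"/>
      </layout>
    </head>
    <body>
      <par>
        <!-- the main visual track (with its dialog audio) -->
        <video src="movie.rm" region="videoregion"/>
        <!-- timed caption text, as MAGpie generates -->
        <textstream src="captions.rt" region="captionregion"/>
        <!-- pre-recorded description clips cued into pauses in the dialog;
             file names and begin offsets are invented for illustration -->
        <audio src="desc-opening.wav" begin="12s"/>
        <audio src="desc-scene2.wav"  begin="47s"/>
      </par>
    </body>
  </smil>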
>In short: What does the author actually provide as the text equivalent, and
>how should the UA or the media player create the audio description from that?

The checkpoint says, "until user agents..." Therefore, TODAY the author has
to provide the prerecorded auditory description in an additional audio
track. In the future, the author should either provide a secondary audio
track or a text transcript with time codes. A user agent could take this
information and send it to a speech synthesizer on cue. This would be
similar to how SMIL presentations can currently show captions on cue.

Does this help?
--wendy

[1] http://www.wgbh.org/wgbh/pages/ncam/webaccess/captionedmovies.html

<>
wendy a chisholm (wac)
world wide web consortium (w3c)
web accessibility initiative (wai)
madison, wisconsin (madcity, wi)
united states of america (usa)
tel: +1 608 663 6346
</>
Received on Monday, 29 November 1999 09:57:17 UTC