- From: Marja-Riitta Koivunen <marja@w3.org>
- Date: Mon, 29 Nov 1999 12:15:06 -0500
- To: Wendy A Chisholm <wendy@w3.org>, w3c-wai-gl@w3.org
- Cc: w3c-wai-ua@w3.org
At 11:22 AM 11/29/99 -0500, Wendy A Chisholm wrote:
>
>>I'm a little confused. Did you really mean that the text equivalent of a
>>visual track can be an auditory description? I thought it would be text if
>>it is a text equivalent? If it is audio, why is it then important to do
>>automatic text-to-speech processing (read the text aloud)?
>
>the text equivalent of the visual track needs to be text. in future user
>agents we were hoping that this track would be synthesized to speech.
>current user agents do not do this, thus both a text equivalent of the
>visual track as well as a prerecorded video description need to be
>provided.
>
>there are a few things needed for "movies":
>1. a visual representation of auditory information (captions)
>2. an auditory representation of visual information (descriptive video)
>3. a collated text transcript of the audio and visual information (the
>text of the descriptive video and the original auditory track/captions)
>
>TODAY an author has to:
>1. provide captions (either by using a text document synchronized with the
>video via something like SMIL or SAMI, or by creating a second visual
>track with the captions).
>2. provide a video description (a secondary audio track)
>3. provide a collated text transcript.

OK. So does the text equivalent in WCAG 1.3 refer to the collated text?
Reading that aloud makes sense, because then you don't need to worry about
timing. The user can just listen to it separately from the video. And I
guess this is something that screen readers can already do automatically.
But I guess we wanted every browser to do that, per the "until user
agents" clause in 1.3? It did not occur to me earlier from WCAG that the
author always needs to provide the collated text. Should that be said more
explicitly if it is required?

>in the FUTURE we hope the author will:
>1. Provide text with timecodes that is classified as either caption or
>video description. this information can then be synchronized and
>synthesized to speech, synchronized as captions, or collated into a
>collated text transcript.

Captions do have timecodes, and so does the video (or audio) description.
But having a textual transcript of the audio description with proper
timecodes means the audio description already exists. So I don't see why
we would generate it from the text; to save space? Going from text to an
audio description means we also need to generate the timecodes, which is
not necessarily a simple task, especially if we try to do it
automatically. Pausing might work, but I'm not sure, for instance with
SMIL, whether you can pause everything else but not the automatically
generated audio of the collated text. At the very least we would need to
know which element contains the collated text, and that information is
not available in SMIL.

>Keep in mind that for people who are deaf and blind, the combination of
>both the captions and the text of the descriptive video (a "collated text
>transcript") is their only means of accessing the information.

OK, so collated transcripts are needed for video presentations as P1? In
that case there is no need for synchronization with the video and audio
(unless you want to synchronize the information for a group of people
where some can see or hear).

Marja

>--wendy
><>
>wendy a chisholm (wac)
>world wide web consortium (w3c)
>web accessibility initiative (wai)
>madison, wisconsin (madcity, wi)
>united states of america (usa)
>tel: +1 608 663 6346
></>
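
P.S. To make the "TODAY" authoring steps above a bit more concrete, here is
a rough SMIL 1.0 sketch of a presentation that plays a movie together with
a prerecorded audio description track and a synchronized caption stream.
The file names, region layout, and timing are made up just for
illustration; only the element and attribute names (par, video, audio,
textstream, region, begin, system-captions) come from SMIL 1.0, and a real
presentation would certainly look different.

  <smil>
    <head>
      <layout>
        <root-layout width="320" height="270"/>
        <region id="video-area" width="320" height="240"/>
        <region id="caption-area" top="240" width="320" height="30"/>
      </layout>
    </head>
    <body>
      <par>
        <!-- the movie with its original soundtrack -->
        <video src="movie.mpg" region="video-area"/>
        <!-- prerecorded audio description, timed to start during a pause
             in the dialogue (the begin value is invented) -->
        <audio src="description-1.wav" begin="12s"/>
        <!-- captions as a synchronized text stream; the system-captions
             test attribute lets the player show or hide them according to
             the user's preference -->
        <textstream src="captions.rt" region="caption-area"
                    system-captions="on"/>
      </par>
    </body>
  </smil>

The collated text transcript would still be a separate document that the
author provides alongside this.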
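
P.P.S. Just so I understand the "FUTURE" item: I read it as a single
timecoded text source where every entry is labeled as either a caption or
a video description, something like this made-up example (not any
existing format):

  00:00:02  [caption]      Narrator: Welcome to the laboratory.
  00:00:06  [description]  A researcher points at a whiteboard of diagrams.
  00:00:11  [caption]      Narrator: Let's start with the results.

From that one source a user agent could render the captions visually,
synthesize the description entries to speech, or merge everything in time
order into the collated text transcript.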
Received on Monday, 29 November 1999 12:16:21 UTC