- From: Marja-Riitta Koivunen <marja@w3.org>
- Date: Mon, 29 Nov 1999 12:15:06 -0500
- To: Wendy A Chisholm <wendy@w3.org>, w3c-wai-gl@w3.org
- Cc: w3c-wai-ua@w3.org
At 11:22 AM 11/29/99 -0500, Wendy A Chisholm wrote:
>
>>I'm a little confused. Did you really mean that the text equivalent of a
>>visual track can be an auditory description? I thought it would be text if
>>it is a text equivalent? If it is audio, why is it then important to do
>>automatic text-to-speech processing (read the text aloud)?
>
>the text equivalent of the visual track needs to be text. in future user
>agents we were hoping that this track would be synthesized to speech.
>current user agents do not do this, thus both a text equivalent of the
>visual track as well as a prerecorded video description need to be
>provided.
>
>there are a few things needed for "movies":
>1. a visual representation of auditory information (captions)
>2. an auditory representation of visual information (descriptive video)
>3. a collated text transcript of the audio and visual information (the
>text of the descriptive video and the original auditory track/captions)
>
>TODAY an author has to:
>1. provide captions (either by using a text document synchronized with the
>video via something like SMIL or SAMI, or by creating a second visual
>track with the captions).
>2. provide a video description (a secondary audio track)
>3. provide a collated text transcript.

OK. So does the text equivalent in WCAG 1.3 refer to the collated text?
Reading that aloud makes sense, because then you don't need to worry about
timing. The user can just listen to it separately from the video. And I
guess this is something that screen readers can already do automatically.
But I guess we wanted every browser to do that, per the "until user
agents" clause in 1.3? It did not occur to me earlier from WCAG that the
author always needs to provide the collated text. Should that be said more
explicitly if it is required?

>in the FUTURE we hope the author will:
>1. Provide text with timecodes that is classified as either caption or
>video description. this information can then be synchronized and
>synthesized to speech, synchronized as captions, or collated into a
>collated text transcript.

Captions do have timecodes, and so does the video (or audio) description.
But having a textual transcript of the audio description with proper
timecodes means the audio description already exists. So I don't see why
we would generate it from the text; to save space? Going from text to an
audio description means we also need to generate the timecodes, which is
not necessarily a simple task, especially if we try to do it
automatically. Pausing might work, but I'm not sure, for instance with
SMIL, whether you can pause everything else but not the automatically
generated audio of the collated text. At the very least we would need to
know which element contains the collated text, and that information is
not available in SMIL.

>Keep in mind that for people who are deaf and blind, the combination of
>both the captions and the text of the descriptive video (a "collated text
>transcript") is their only means of accessing the information.

OK, so collated transcripts are needed for video presentations as P1? In
that case there is no need for synchronization with the video and audio
(unless you want to synchronize the information for a group of people
where some can see or hear).

Marja

>--wendy
><>
>wendy a chisholm (wac)
>world wide web consortium (w3c)
>web accessibility initiative (wai)
>madison, wisconsin (madcity, wi)
>united states of america (usa)
>tel: +1 608 663 6346
></>
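
P.S. To make the "TODAY" authoring steps above a bit more concrete, here is
a rough SMIL 1.0 sketch of a presentation that plays a movie together with
a prerecorded audio description track and a synchronized caption stream.
The file names, region layout, and timing are made up just for
illustration; only the element and attribute names (par, video, audio,
textstream, region, begin, system-captions) come from SMIL 1.0, and a real
presentation would certainly look different.

  <smil>
    <head>
      <layout>
        <root-layout width="320" height="270"/>
        <region id="video-area" width="320" height="240"/>
        <region id="caption-area" top="240" width="320" height="30"/>
      </layout>
    </head>
    <body>
      <par>
        <!-- the movie with its original soundtrack -->
        <video src="movie.mpg" region="video-area"/>
        <!-- prerecorded audio description, timed to start during a pause
             in the dialogue (the begin value is invented) -->
        <audio src="description-1.wav" begin="12s"/>
        <!-- captions as a synchronized text stream; the system-captions
             test attribute lets the player show or hide them according to
             the user's preference -->
        <textstream src="captions.rt" region="caption-area"
                    system-captions="on"/>
      </par>
    </body>
  </smil>

The collated text transcript would still be a separate document that the
author provides alongside this.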
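
P.P.S. Just so I understand the "FUTURE" item: I read it as a single
timecoded text source where every entry is labeled as either a caption or
a video description, something like this made-up example (not any
existing format):

  00:00:02  [caption]      Narrator: Welcome to the laboratory.
  00:00:06  [description]  A researcher points at a whiteboard of diagrams.
  00:00:11  [caption]      Narrator: Let's start with the results.

From that one source a user agent could render the captions visually,
synthesize the description entries to speech, or merge everything in time
order into the collated text transcript.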
Received on Monday, 29 November 1999 12:16:21 UTC