Re: Questions about WCAG 1.3

At 08:55 AM 11/30/99 -0500, Wendy A Chisholm wrote:
>Marja,
>
>
>>I mean the text equivalent in:
>>
>>1.3 Until most user agents can automatically read aloud the text equivalent
>>of the visual track ...
>>
>>Because in UA we need to know how we could do it automatically. If the 
>>text equivalent is just unsynchronized text, it is easy to read when the 
>>UA has text-to-speech capabilities. If it needs to do the synchronization 
>>too, it becomes difficult. Therefore I wanted to know what the text 
>>equivalent that is being read aloud actually is. Does it already have 
>>timecodes? Are those timecodes meant for showing the text on the screen, 
>>or are they calculated for playing the text as audio embedded in between 
>>the other audio tracks? In that case the audio description has already 
>>been created once, so there is not much saving of the author's time. In 
>>that case, why don't we use the audio description that was needed for 
>>generating the timecodes, instead of trying to create it again 
>>automatically from text?
>
>the text equivalent that is being read aloud should have timecodes.  The 
>timecodes are calculated for playing the text as audio embedded in between 
>the other audio tracks.
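If I follow, the UA would end up doing something like the sketch below with
such a stream. This is just a rough Python sketch to check my understanding;
the sample descriptions and the speak() call are placeholders I made up, not
anything defined in WCAG:

import time

def speak(text):
    # placeholder for the UA's text-to-speech output
    print("[TTS]", text)

# hypothetical timecoded text equivalent: (start time in seconds, text)
descriptions = [
    (2.0, "A woman enters the room carrying a suitcase."),
    (15.5, "She opens the window and looks out at the street."),
]

start = time.time()
for begin, text in descriptions:
    # wait until the author's timecode, then speak the description
    delay = begin - (time.time() - start)
    if delay > 0:
        time.sleep(delay)
    speak(text)

Is that roughly the scenario, with the author supplying the timecodes so that
the spoken text lands in between the other audio?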

OK. So, now a couple more questions.

Is WCAG asking authors to create that kind of textstream file for the video
so that the UA can generate the audio description automatically from it?

How would the UA know where that textstream file is located, e.g. how to
find it among other textstream files, such as captions?
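To make the second question concrete: the UA would need to do something like
the selection below, but the "role" label in this toy Python sketch is
something I invented, and that is exactly what I am asking about - where
would such a label come from?

# toy sketch: the UA has to pick the description stream out of the
# textstreams attached to a video; the "role" field is made up
streams = [
    {"src": "captions-en.txt", "role": "captions", "lang": "en"},
    {"src": "descriptions-en.txt", "role": "description", "lang": "en"},
]

def find_description_stream(streams, lang="en"):
    for s in streams:
        if s["role"] == "description" and s["lang"] == lang:
            return s["src"]
    return None

print(find_description_stream(streams))  # -> descriptions-en.txt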

>you say it does not save the author time - is that because they have to 
>generate the time codes?  With something like MagPie, it would be easy to 
>create timecodes - even if the author just creates a timecode where he/she 
>wants it in MagPie then copies and pastes that to another file where needed.

OK. It may save some time. I was first thinking that a text file is always
attached to the video anyway, so that the only thing needed would be for the
UA to read it automatically. It seems like the textstream file you are
talking about is created just for the purpose of being read as the audio
description of the video. Or can it be used for other purposes?
In that sense it does not save time; it is created for the audio descriptions.

>If the author has to record the text themselves (in a separate audio track) 
>we at least save them that effort.  They will have to create the timecodes 
>whether they synchronize a text track for speech synthesis or synchronize a 
>prerecorded audio track.
>
>Part of the assumption (for the future scenario) is that the author can 
>pause the primary audio and video tracks while the video description 
>(whether prerecorded or synthesized speech) is read.
>
Which might be clumsy, e.g. if you need to pause in the middle of a sentence
or a scene. So the author still needs to look carefully at how to time
everything together.
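To show what I mean by clumsy, here is how I imagine the UA's choice would
look. Again a rough Python sketch; the words-per-second guess and the
pause/speak calls are all placeholders:

WORDS_PER_SECOND = 2.5  # guessed speaking rate of the synthesizer

def speak(text):
    print("[TTS]", text)  # placeholder for speech synthesis

def pause_video():
    print("[video and primary audio paused]")

def resume_video():
    print("[video and primary audio resumed]")

def estimated_duration(text):
    return len(text.split()) / WORDS_PER_SECOND

def play_description(text, gap_before_next_dialogue):
    if estimated_duration(text) <= gap_before_next_dialogue:
        speak(text)    # fits in the natural pause
    else:
        pause_video()  # this pause may fall mid-sentence or mid-scene
        speak(text)
        resume_video()

play_description("She silently hands him the letter and leaves.", 1.0)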

>Using synthesized speech is kind of like "separating content from 
>presentation."  Kind of like suggesting that authors use text and style 
>sheets instead of text in images since a user can change the speed, voice, 
>pitch, etc. of the speech to suit their needs.
>
Does this mean that we actually want the textstream for audio rather than a
recorded audio description? I did not actually understand that from 1.3.
Maybe that is my foreign language problem?

Marja

>Hopefully, this would also facilitate translation of the descriptions into 
>other languages as the automatic translators evolve (like the altavista 
>translator http://babelfish.altavista.com/).  If the text is synchronized 
>and goes through a translator before being spoken - we've now got 
>descriptions (and why not do the captions also!?) in any language that 
>babelfish-like tools are capable of processing.
>
>--wendy
><>
>wendy a chisholm (wac)
>world wide web consortium (w3c)
>web accessibility initiative (wai)
>madison, wisconsin (madcity, wi)
>united states of america (usa)
>tel: +1 608 663 6346
></>

Received on Tuesday, 30 November 1999 13:50:45 UTC