
Re: Questions about WCAG 1.3

From: Wendy A Chisholm <wendy@w3.org>
Date: Tue, 30 Nov 1999 08:55:50 -0500
Message-Id: <>
To: Marja-Riitta Koivunen <marja@w3.org>, w3c-wai-gl@w3.org
Cc: w3c-wai-ua@w3.org

>I mean the text equivalent in:
>1.3 Until most user agents can automatically read aloud the text equivalent
>of the visual track ...
>Because in UA we need to know how we could do it automatically. If the text
>equivalent is just unsynchronized text, it is easy to read when the UA has
>text-to-speech capabilities. If it needs to do the synchronization too, it
>becomes difficult. Therefore I wanted to know what text equivalent
>is being read aloud. Does it already have timecodes? Are those
>timecodes meant for showing the text on the screen, or are they calculated
>for playing the text as audio embedded in between the other audio tracks? 
>In this case the audio description is already created once so there is not 
>much saving of the author's time. In this case, why don't we use the audio 
>description needed for generating timecodes instead of trying to create it 
>again automatically from text?

The text equivalent that is being read aloud should have timecodes.  The 
timecodes are calculated for playing the text as audio embedded in between 
the other audio tracks.
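
As a rough illustration of what a timecoded text equivalent might look 
like to a user agent, here is a minimal sketch. The HH:MM:SS.s timecode 
format, the DescriptionCue name, and the sample description text are all 
assumptions made for this example; they do not come from WCAG or from any 
particular captioning tool.

```python
from dataclasses import dataclass


@dataclass
class DescriptionCue:
    """One piece of description text tied to a point in the primary media."""
    start: float  # seconds from the start of the primary audio/video
    text: str     # text to be handed to a speech synthesizer


def parse_timecode(timecode: str) -> float:
    """Convert an assumed 'HH:MM:SS.s' timecode string into seconds."""
    hours, minutes, seconds = timecode.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


# Hypothetical description track: each entry carries its own timecode.
cues = [
    DescriptionCue(parse_timecode("00:00:05.0"), "A woman enters the room."),
    DescriptionCue(parse_timecode("00:01:30.5"), "She opens the window."),
]
```

With the timecodes stored next to the text, the user agent only has to 
speak each cue at its start time; it does not have to work out the 
synchronization itself.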

You say it does not save the author time.  Is that because they have to 
generate the timecodes?  With something like MagPie, it would be easy to 
create timecodes, even if the author just creates a timecode where he/she 
wants it in MagPie and then copies and pastes it into another file where 
needed.

If the author would otherwise have to record the text themselves (in a 
separate audio track), synthesized speech at least saves them that effort.  
They will have to create the timecodes either way, whether they synchronize 
a text track for speech synthesis or a prerecorded audio track.

Part of the assumption (for the future scenario) is that the author can 
pause the primary audio and video tracks while the video description 
(whether prerecorded or synthesized speech) is read.
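
The pause-while-describing scenario above can be sketched as a simple 
playback timeline. This is only an illustration under stated assumptions: 
the Cue and build_timeline names are hypothetical, and the speech duration 
is estimated from a words-per-minute rate rather than measured from a real 
speech synthesizer.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Cue:
    start: float  # seconds into the primary media where the pause happens
    text: str     # description to be spoken during the pause


def speech_duration(text: str, words_per_minute: float = 160.0) -> float:
    """Estimate how long the description takes to speak (an assumption)."""
    return len(text.split()) * 60.0 / words_per_minute


def build_timeline(cues: List[Cue], media_length: float,
                   words_per_minute: float = 160.0) -> List[Tuple]:
    """Interleave 'play' segments of the primary tracks with 'speak' pauses.

    ('play', a, b) plays the primary media from a to b seconds.
    ('speak', t, d) pauses the primary media at t and speaks for d seconds.
    """
    timeline: List[Tuple] = []
    position = 0.0
    for cue in sorted(cues, key=lambda c: c.start):
        if cue.start > position:
            timeline.append(("play", position, cue.start))
            position = cue.start
        duration = speech_duration(cue.text, words_per_minute)
        timeline.append(("speak", cue.start, duration))
    if position < media_length:
        timeline.append(("play", position, media_length))
    return timeline
```

Because the primary tracks are paused, the description can be as long as 
it needs to be; changing the speech rate changes only the length of the 
pause, not the timecodes the author created.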

Using synthesized speech is kind of like "separating content from 
presentation."  It is similar to suggesting that authors use text and style 
sheets instead of text in images, since a user can change the speed, voice, 
pitch, etc. of the speech to suit their needs.

Hopefully, this would also facilitate translation of the descriptions into 
other languages as the automatic translators evolve (like the AltaVista 
translator http://babelfish.altavista.com/).  If the text is synchronized 
and goes through a translator before being spoken, we then have 
descriptions (and why not do the captions also!?) in any language that 
babelfish-like tools are capable of processing.
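
A minimal sketch of that idea: the timecodes stay fixed and only the text 
of each cue passes through a translator before it is spoken. The 
translate_cues name is hypothetical, and toy_translate is a lookup-table 
stand-in for a real translation service, not an actual Babel Fish 
interface.

```python
from typing import Callable, List, Tuple

# A cue is (start time in seconds, description text).
Cue = Tuple[float, str]


def translate_cues(cues: List[Cue],
                   translate: Callable[[str], str]) -> List[Cue]:
    """Translate the text of each cue while leaving its timecode untouched."""
    return [(start, translate(text)) for start, text in cues]


def toy_translate(text: str) -> str:
    """Stand-in translator for illustration only: a fixed lookup table."""
    lookup = {"A door opens.": "Une porte s'ouvre."}
    return lookup.get(text, text)  # unknown text passes through unchanged


french_cues = translate_cues([(5.0, "A door opens.")], toy_translate)
```

The same function would work for captions, since they are also text 
paired with timecodes.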

wendy a chisholm (wac)
world wide web consortium (w3c)
web accessibility initiative (wai)
madison, wisconsin (madcity, wi)
united states of america (usa)
tel: +1 608 663 6346
Received on Tuesday, 30 November 1999 08:49:12 UTC
