[whatwg] Video, Closed Captions, and Audio Description Tracks

Dave Singer wrote:

>>> an alternate audio track (e.g. speex as suggested by you for 
>>> accessibility to blind people),
>> My understanding is that at least conceptually an audio description 
>> track is *supplementary* to the normal sound track. Could someone who 
>> knows more about the production of audio descriptions, please, comment 
>> if audio description can in practice be implemented as a supplementary 
>> sound track that plays concurrently with the main sound track (in that 
>> case Speex would be appropriate) or whether the main sound must be 
>> manually mixed differently when description is present?
> Sometimes;  but sometimes, for example:
> * background music needs to be reduced
> * other audio material needs to be 'moved' to make room for audio 
> description

The relationship between audio description and the main sound appears to 
be far from simple. See:


>> I think we should first focus on two kinds of qualitatively different 
>> timed text (differing in metadata and playback defaults):
>>  1) Captions for the deaf:
>>   * Written in the same language as the speech content of the video is 
>> spoken.
>>   * May have speaker identification text.
>>   * May indicate other relevant sounds textually.
>>   * Don't indicate text that can be seen in the video frame.
>>   * Not rendered by default.
>>   * Enabled by a browser-wide "I am deaf or my device doesn't do sound 
>> out" pref.

It should also, I think, be available on a case-by-case basis. The 
information is potentially useful for everyone, e.g. if a background 
sound or a particular speaker is indistinct to your ears. I don't think 
closed captioning functionality is best buried in an obscure browser 
configuration setting.

>>  2) Subtitles for the people who can't follow foreign-language speech:
>>   * Written in the language of the site that embeds video when there's 
>> speech in another language.
>>   * Don't identify the speaker.
>>   * Don't identify sounds.
>>   * Translate relevant text visible in the video frame.
>>   * Rendered by default.
>>   * As a bonus suppressible via the context menu or something on a 
>> case-by-case basis.
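
The playback defaults in the two lists above, together with a per-video 
override for captions as well as subtitles, can be sketched roughly as 
follows. This is only an illustration, not a proposed API: the names 
TrackKind, TimedTextTrack, should_render, wants_captions_pref and 
per_video_choice are all hypothetical and come from no specification.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TrackKind(Enum):
    """The two kinds of timed text discussed in this thread (hypothetical names)."""
    CAPTIONS = "captions"    # same language as the speech; may identify speakers and sounds
    SUBTITLES = "subtitles"  # translation of foreign-language speech and visible text


@dataclass(frozen=True)
class TimedTextTrack:
    kind: TrackKind
    language: str  # language the text is written in, e.g. "en"


def should_render(
    track: TimedTextTrack,
    wants_captions_pref: bool,
    per_video_choice: Optional[bool] = None,
) -> bool:
    """Decide whether a timed text track is shown.

    wants_captions_pref stands for the browser-wide "I am deaf or my device
    doesn't do sound out" preference. per_video_choice stands for an explicit
    choice made for this one video (for example via a context menu); None
    means the user has not chosen anything for this video.
    """
    # An explicit per-video choice always wins, for either kind of track.
    if per_video_choice is not None:
        return per_video_choice
    # Captions are off by default and follow the browser-wide preference.
    if track.kind is TrackKind.CAPTIONS:
        return wants_captions_pref
    # Subtitles are rendered by default.
    return True
```

For example, should_render(TimedTextTrack(TrackKind.CAPTIONS, "en"), 
wants_captions_pref=False, per_video_choice=True) turns captions on for 
a single video without touching the browser-wide preference.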

Just to add another complication to the mix, we shouldn't forget the 
need to provide for sign language interpretation. The BBC's iPlayer 
features sign interpretation, FWIW:


> This should all be out of scope, IMHO;  this is about the design of a 
> captioning system, which I don't think we should try to do.

I'm a bit confused about why W3C's Timed Text Candidate Recommendation 
hasn't been mentioned in this thread, especially given that Flash 
objects are the VIDEO element's biggest "competitor" and Flash CS3's 
closed captioning component supports Timed Text. I haven't used it 
myself: is there some hideous disadvantage of Timed Text that makes it 
fundamentally flawed? It appears to be designed for use with both 
subtitles and captions.

Here's the link for the CR:


Benjamin Hawkes-Lewis

Received on Monday, 8 October 2007 12:52:46 UTC