From WHATWG Wiki
There appears to be consensus among the WCAG Samurai and the current WCAG 2.0 draft that the primary ways of making video with a soundtrack accessible are to provide captioning for the deaf and audio description for the blind. (The WCAG 2.0 draft also mentions a full-text alternative as an alternative to audio description.)
Presumably, the captioning and audio description need to be “closed” (off by default, available on request), as content providers might hesitate to present captions to those who can hear the soundtrack or audio description to those who can already see the video track.
Technically, closed captioning is timed text presented in sync with the video track.
It is assumed to be in the same language as the main soundtrack. Content-wise, it is expected to mention semantically important non-verbal sounds and identify speakers when the video doesn't make it clear who is talking.
In terms of player app decisions, this track shouldn't be presented by default but it should play if the user has opted in (perhaps via a permanent setting) to showing captioning. Also, if the player app knows that audio output has been turned off either in the app or in the OS, it might make sense to turn on captioning in that case as well.
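The player behavior described above can be sketched as a small decision function. This is only a minimal sketch in Python, not a proposed API: the names `PlayerState`, `captions_opt_in`, `app_muted` and `os_muted` are hypothetical, and it is an assumption that the player can detect an OS-level mute at all.

```python
from dataclasses import dataclass


@dataclass
class PlayerState:
    """Hypothetical snapshot of what the player app knows."""
    captions_opt_in: bool = False  # user's (possibly permanent) captioning setting
    app_muted: bool = False        # audio output turned off inside the player app
    os_muted: bool = False         # audio output turned off in the OS, if detectable


def should_show_captions(state: PlayerState) -> bool:
    """Closed captions stay off unless the user opted in or audio output is off."""
    if state.captions_opt_in:
        return True
    # Optional heuristic from the text: muted output suggests captions would help.
    return state.app_muted or state.os_muted
```

With a default `PlayerState()` the function returns `False`, which matches the "closed" behavior: nothing is shown unless the user or the environment asks for it.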
Video Format Support
CMML has been put forward as the timed text format for Ogg. (How to mark as closed captions?)
3GPP Timed Text (also known as MPEG-4 Part 17) is the timed text format for MP4. (How to mark as closed captions?)
Closed Audio Description
Technically this is a second sound track presented in sync with the main sound track.
It is assumed to be in the same language as the main soundtrack.
In terms of player app decisions, this track shouldn't be presented by default but it should play if the user has opted in (perhaps via a permanent setting) to playing audio descriptions. Also, if the player app knows that a screen reader is in use, it might make sense to use that as a cue to turn on audio descriptions.
Video Format Support
For Ogg: how to flag a second sound track (Speex?) as closed audio description?
For MP4: how to flag a second sound track as closed audio description?
Data Placement in the Web Context
Should the above-mentioned tracks be muxed into the main video file (Pro: all tracks travel together; Con: off-by-default tracks take bandwidth)? Or should they be separate HTTP resources (Pro: bandwidth optimization; Con: Web-specific content assembly from many files may not survive downloading to disk, etc.)? Also note that separate files make it easier for a third party to add tracks. These tracks may be commentary rather than captions, so this is both a pro and a con.
Related Non-Accessibility Features
There are technically similar non-accessibility (i.e. not related to addressing needs arising from a disability) features related to translation.
A site in language A might want to embed a video with the soundtrack in language B but subtitles in language A. For example, a Finnish-language site embedding an English-language video would want to have Finnish subtitles. Unlike captions, these subtitles should be on by default and being able to suppress the subtitles is considered an additional nice-to-have feature.
There are also same-language subtitles (e.g. French subtitles with French-language soundtrack) for language learners. Unlike captions, same-language subtitles don't inform the reader about non-verbal sounds or identify speakers.
Subtitles need different track metadata so that they can be displayed by default. (Due to concerns about the reliability of subtitling technology, many content providers probably opt to burn the subtitles into the video track as part of the image data, even though this disturbs video compression.)
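One way to picture the track metadata distinction is a per-track "kind" that the player consults when deciding what to show initially. The sketch below is an assumption-laden illustration in Python: `TrackKind`, `Track` and `on_by_default` are hypothetical names, not part of any container format, and the rule that same-language subtitles stay off until requested is a guess, since the text above only says translation subtitles should be on by default.

```python
from dataclasses import dataclass
from enum import Enum


class TrackKind(Enum):
    CAPTIONS = "captions"                    # closed: off unless requested
    AUDIO_DESCRIPTION = "audio-description"  # closed: off unless requested
    SUBTITLES = "subtitles"                  # may be on by default


@dataclass
class Track:
    kind: TrackKind
    language: str  # language code of the track, e.g. "fi" or "en"


def on_by_default(track: Track, page_language: str, soundtrack_language: str) -> bool:
    """Only translation subtitles in the page's language start out enabled."""
    if track.kind is not TrackKind.SUBTITLES:
        return False  # captions and audio descriptions are closed
    # Assumption: same-language subtitles (for language learners) are opt-in.
    return track.language == page_language and track.language != soundtrack_language
```

For the example above, a Finnish subtitle track on a Finnish-language page embedding an English-language video would start enabled, while an English caption track on the same video would not.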
Alternative Dubbed Sound Tracks
Due to bandwidth concerns, Web content providers will probably opt to provide separate video files for dubbed languages.
These use cases are broken into two categories: Accessibility, and the broader Universality. Because universality use cases are similar in nature to real accessibility use cases, it is sometimes useful to consider possible solutions to accessibility in the context of the more general universality issues as well.
Accessibility Use Cases
Deaf or Hearing Impaired User Viewing a Video
A user who is unable to hear due to physical disability chooses to watch a video. The video is an interview between two people, discussing a topic that the user is interested in. The video has been provided with associated closed captions and the user would like to have those turned on so that he may understand speech and other significant sounds within the video.
A Blind User Listening to a Video's Soundtrack
A blind user, who cannot see the video but is still able to hear the audio, chooses to listen to the sound track of a video anyway. The video is a "webisode" (a web episode: like a TV episode, but on the web) of a drama series the user enjoys. The video conveys some important information visually that is not made apparent in the main sound track, such as what the characters are doing and where they are. But the video also contains an alternative or complementary track for audio descriptions, which describes significant parts of the video. The user wishes to enable the audio descriptions to more easily comprehend the content of the video.
There are two variations of this use case:
- The computer that the user is using is primarily meant for his own use, and generally not shared with other non-disabled users. It would be convenient if enabling audio descriptions did not require a manual selection each time.
- The computer is shared between the disabled user and other non-disabled people. The accessibility features used by the disabled user are turned off when the computer is used by others, and it would be inconvenient if the audio descriptions were automatically selected for them too.
A Deaf and Blind User is Unable to See or Hear the Video
A deaf and blind user cannot see or hear the video, and primarily accesses content using a braille reader. The video is a tutorial and demonstration about how to perform a particular science experiment at home. The user wishes to understand the content of the video so that he can teach the experiment to his nephew. The user needs to read the full text description, which describes and transcribes significant aural and visual content in the video. He does not need the video to be loaded, but instead needs to be provided with easy access to the full text description.
- Variation: If a full-text description or transcript isn't provided but captioning is provided, it should be possible to extract the captions and render them as braille.
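The caption-extraction variation can be sketched as flattening timed cues into plain text that a braille display or other text-based device could render. This is a minimal sketch in Python under stated assumptions: the `Cue` type and `captions_to_text` function are hypothetical, and the cues are assumed to have already been parsed out of whatever timed text format the video uses (parsing CMML or 3GPP Timed Text is not shown).

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Cue:
    """Hypothetical parsed caption cue."""
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str     # caption text, including speaker labels and sound cues


def captions_to_text(cues: Iterable[Cue]) -> str:
    """Drop the timing and return the caption text in playback order."""
    ordered = sorted(cues, key=lambda cue: cue.start)
    lines = [cue.text.strip() for cue in ordered]
    # Skip cues that contain only whitespace.
    return "\n".join(line for line in lines if line)
```

Because captions are expected to identify speakers and mention significant non-verbal sounds, the flattened text keeps that information, although it still lacks the visual description a full-text alternative would provide.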
Author Embedding Third-Party Media, but Attempting to Keep the Web Page Accessible
From the perspective of the end user, this could be any of the other use cases. The reason to call this case out separately is that there needs to be a way for a web page author to provide accessibility and/or fallback information even when the referenced media itself cannot be modified.
Related Universality Use Cases
Sound Equipment is Unavailable, Muted, or has Low Volume
A user is unable to hear the audio in the video well, either because his computer lacks audio equipment, such as a sound card, headphones or speakers, or because the volume needs to be kept low in the user's environment. The video has been provided with either closed captions or same-language subtitles. Like a hearing impaired user, the user would like to be able to turn on captions or subtitles so that he may more easily understand what is being said.
A User Wants to Access a Transcript
A user chooses not to watch a video, regardless of whether or not they are physically able. The video is of an election campaign speech and has been published with a transcript of the speech alongside. The user wishes to thoroughly review the speech and compare it with those from the other candidates. He accesses the transcript and prints it out, to enable him to read it and make notes or highlight important sections.
A Non-Disabled User Wants to Read a Full Text Description
A video of a slide presentation has been published, but the user wishes to read the full text description instead. The description includes both a transcript of what the presenter said and images of the slides. The user chooses to read all or part of the full text description, instead of watching the whole video.
- A user who is deaf or hearing impaired needs a way to express his preference for captioning so that he is not required to manually enable them each time.
- A user without sound equipment on a particular device may also wish to express the preference for captioning to avoid manual selection each time.
- A user who is only temporarily unable to hear the audio due to low or muted volume needs a way to manually enable closed captions or same-language subtitle tracks on a per-video basis, if raising the volume is not practical.
- A blind user needs a way to express his preference for audio descriptions so that he is not required to manually select the descriptions each time.
- A user needs to have a way to manually enable or disable an alternative or complementary audio track containing audio descriptions.
- A user, regardless of disability, needs a way to access a video transcript or full text description.
- A user needs a way to prevent their browser from automatically downloading video that they don't want or need.
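Several of the requirements above combine a stored preference with a manual, per-video choice. A minimal sketch of how the two might interact is shown below in Python; the function name `track_enabled` and its parameters are hypothetical, and letting the manual choice take precedence is an assumption rather than something the requirements state.

```python
from typing import Optional


def track_enabled(stored_preference: bool, manual_choice: Optional[bool] = None) -> bool:
    """Decide whether a closed track (captions or audio descriptions) plays.

    stored_preference: the user's persistent setting for this kind of track.
    manual_choice: the per-video selection, or None if the user made none.
    """
    if manual_choice is not None:
        # Assumption: an explicit per-video choice overrides the stored setting.
        return manual_choice
    return stored_preference
```

Under this sketch, a deaf user with a stored captioning preference gets captions without any per-video action, while a user in a quiet environment can enable them for a single video without changing the stored setting.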
(to be completed)