Revised Checkpoints: WCAG(1.4/1.3) and UAAG(2.5) from ehansen@ets.org on 1999-12-05 (w3c-wai-ua@w3.org from October to December 1999)

From: <ehansen@ets.org>
Date: Sun, 05 Dec 1999 01:30:02 -0500 (EST)
To: w3c-wai-gl@w3.org, w3c-wai-ua@w3.org
Message-id: <vines.Bh0E+INUGsA@cips06.ets.org>

Version: 5 December 1999, 01:16 hrs

Revised Checkpoints: WCAG(1.4/1.3) and UAAG(2.5)

PART 1 - INTRODUCTION

This memo follows up on an action item from the User Agent Accessibility Guidelines (UAAG) phone conference of 1 December 1999. I had indicated during that meeting that there were ambiguities on how user agents should make movies, animations, and audio clips accessible to people with disabilities. I expressed the idea that some of the challenges in UAAG could be cleared up if the Web Content Accessibility Guidelines (WCAG) working group could clarify what was intended by WCAG 1.0 checkpoint 1.4. I indicated that I would write up what I thought was intended or should have been intended by WCAG checkpoint 1.4 and how that clarification would impact checkpoints in UAAG, particularly, UAAG checkpoint 2.5. (See below.)

WCAG 1.0 (5 May 1999) checkpoint 1.4:
"1.4 For any time-based multimedia presentation (e.g., a movie or
animation), synchronize equivalent alternatives (e.g., captions or auditory
descriptions of the visual track) with the presentation. [Priority 1]"

WCAG 1.0 (5 May 1999) checkpoint 1.3:
"1.3 Until user agents can automatically read aloud the text equivalent of
a visual track, provide an auditory description of the important
information of the visual track of a multimedia presentation. [Priority 1]
Synchronize the auditory description with the audio track as per checkpoint
1.4. Refer to checkpoint 1.1 for information about textual equivalents for
visual information."

UAAG 1.0 (5 November 1999) checkpoint 2.5:
2.5 Allow the user to specify that continuous equivalent tracks (e.g., closed captions, auditory descriptions, video of sign language, etc.) be rendered at the same time as audio and video tracks. [Priority 1]
Techniques for checkpoint 2.5

Some of the changes that are suggested in this memo were already covered in my memo "Questions about WCAG 1.3 - A possible solution" (sent 29 November but received by most if not all UAAG and WCAG list members on 30 November) [http://lists.w3.org/Archives/Public/w3c-wai-ua/1999OctDec/0483.html] (See also Wendy Chisholm's response [http://lists.w3.org/Archives/Public/w3c-wai-ua/1999OctDec/0484.html]). I will not repeat the rationales for all the suggestions. However, where the two documents disagree, this one supersedes that one.
====

Identifying Critical Combinations of Media

I want to underline my concern that there are things that WAI can and should do to reduce uncertainty about what Web content developers, developers of user agents, and developers of authoring tools need to do to make multimedia presentations and audio clips accessible to people with disabilities. I think that there are a huge number of ways in which text, video, audio and their equivalents _could be combined_ to make multimedia presentations and audio clips accessible to people with disabilities, but only a much smaller number of ways are really essential or really valuable, and it is up to WAI to identify and describe that smaller number of combinations more specifically.

Without more specificity than is in the UAAG document, I think that it would be hard for developers of user agents to proceed with confidence towards conformance.

====
Implementation Plan

Following is the basic plan underlying my proposed changes.

1. The WCAG working group confirms that the proposed changes do indeed clarify the intent of the WCAG 1.0 checkpoints 1.4 and 1.3.

2. The UAAG working group accepts (as is or with revisions) my revisions (which are generally a clarification of UAAG checkpoint 2.5).

3. The WCAG working group makes the appropriate changes in Errata sheets, techniques, documents, and, as appropriate, a later version of the WCAG guidelines.

4. The ATAG working group makes any necessary changes.

5. The UAAG becomes a W3C recommendation.

6. Developers of user agents are required immediately (upon issue of UAAG as a W3C recommendation) to begin supporting (1) captions and (2) what I call a "prerecorded auditory description track" that combines the regular auditory track and the prerecorded auditory descriptions into a single separate auditory track.

7. WAI should develop one or more specification documents (W3C Notes or Recommendations) for:

a. auditory descriptions, including (1) synthesized-speech auditory descriptions and (2) prerecorded auditory descriptions (including "prerecorded auditory description tracks" and "prerecorded auditory description supplement tracks", the latter being explained later in this document)
b. captions
c. synchronization of collated text transcripts
d. synchronization of audio clips with their text transcripts

I see document "c" as possibly encompassing "a" and "b". Even better, perhaps, all four items could be addressed together. (I am not sure whether all these are within the charter of SMIL.)

I would not expect the task of retrofitting existing "prerecorded auditory description tracks" to the specifications to be difficult. Content and data in existing captions could, I expect, be almost entirely reused in new captions conforming to the captions specification.

8. Developers of user agents would become responsible for conformance to any given specifications one year after their release.

====
PART 2 - DEFINITIONS AND USAGE ISSUES

The following refer to the UAAG but may be relevant to others as well.

1. Add a definition of "audio clip".

This definition is needed to support the definitions in this memo.

Old: None

New:
"Audio clip"
"As used in this document, the term "audio clip" refers to any auditory presentation for which the Web Content Accessibility Guidelines requires a text equivalent, except for auditory tracks of movies or animations. For example, the term covers standalone audio files and sounds (played with or without user interaction), and streaming audio {???} (see WCAG checkpoint 1.1)."

Is this accurate? Too narrow or broad? That would include streaming audio (songs, etc.)

====
2. Avoid the use of "synchronized alternative equivalents" in WCAG.

The term seems redundant.

3. Avoid the use of "synchronized equivalents" in both WCAG and UAAG.

This is important because often the components that are presented together are not equivalent to each other. The term seems misleading.
====

4. Use the term "synchronized alternatives".

Implies the idea that it is alternative content, which is essentially true. This is my preferred term, I think.
====
5. Use "visual track" and "auditory track"

Use "visual track" and "auditory track" rather than video track and audio track when referring to multimedia presentations.

====

6. Avoid the term "continuous alternatives".

Not sure that this is a great term. It is probably best just to name the specific things.

====
7. Add synchronization to the glossary.

[This definition addresses what I meant by synchronization in the one{?} checkpoint in which I refer to it. (I have actually reduced reliance on the word.) My major suggestions (revisions of checkpoints) do not depend heavily on acceptance of this definition.]

"Synchronization, Synchronize, Synchronization Data, Synchronized Alternatives"

"Synchronization refers to sensible time-coordination of two or more presentation components, particularly where at least one of the components is a multimedia presentation (e.g., movie or animation) or _audio clip_ or a portion of the presentation or audio clip."

"For Web content developers, the requirement to synchronize means to provide the data that will permit sensible time-coordinated presentation by a user agent. For example, a Web content developer can ensure that the segments of caption text are neither too long nor too short and that they are mapped to segments of the visual track that are appropriate in length.

For developers of user agents, the requirement to synchronize means to present the content in a sensible time-coordinated fashion under a wide range of circumstances including technology constraints (e.g., small text-only displays), user limitations (slow reading speeds, large font sizes, high need for review or repeat functions), and content that is sub-optimal in terms of accessibility."

"The idea of "sensible time-coordination" of components centers on the idea of simultaneity of presentation, but also encompasses strategies for handling deviations from simultaneity resulting from a variety of causes.

Consider how certain deviations in simultaneity might be handled in auditory descriptions. Auditory descriptions are considered synchronized, since each segment of description audio is presented at the same time as a segment of the auditory track, e.g., a natural pause in the spoken dialogue. Yet a deviation can arise when a segment of the auditory description is lengthy enough that it cannot be entirely spoken within the natural pause. In this case there must be a strategy for dealing with the mismatch between the description and the pause in the auditory track. The two major types of auditory descriptions lend themselves to different strategies. Prerecorded auditory descriptions usually deal with such mismatches by spreading the lengthy auditory description over more than one natural pause. When expertly done, this strategy does not ordinarily weaken the effectiveness of the overall presentation. On the other hand, a synthesized-speech auditory description lends itself to other strategies. Since synthesize

Let us briefly consider how deviations might be handled for captions.

Captions consist of a text equivalent of the auditory track that is synchronized with the visual track. Captions are essential for individuals who require an alternative way of accessing the meaning of audio, such as individuals who are deaf. Typically, a segment of the caption text appears visually near the video for several seconds while the person reads the text. As the visual track continues, a new segment of the caption text is presented.
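
The earlier point that caption segments should be neither too long nor too short for the stretch of visual track they accompany can be checked mechanically before publishing. The following Python is only a minimal sketch of such a check. The `CaptionSegment` type, the `check_segments` function, and the reading-rate limits (characters per second) are illustrative assumptions and are not taken from WCAG or any W3C specification.

```python
from dataclasses import dataclass


@dataclass
class CaptionSegment:
    """Hypothetical caption segment: text shown from start to end (seconds)."""
    start: float
    end: float
    text: str


def check_segments(segments, min_cps=2.0, max_cps=20.0):
    """Return (index, problem) pairs for segments that look badly timed.

    min_cps and max_cps are assumed reading-rate limits in characters per
    second; they are placeholders, not values from any specification.
    """
    problems = []
    for i, seg in enumerate(segments):
        duration = seg.end - seg.start
        if duration <= 0:
            problems.append((i, "non-positive duration"))
            continue
        rate = len(seg.text) / duration
        if rate > max_cps:
            problems.append((i, "too much text for its time span"))
        elif rate < min_cps:
            problems.append((i, "too little text for its time span"))
    return problems


segments = [
    CaptionSegment(0.0, 3.0, "Welcome to the program."),
    CaptionSegment(3.0, 4.0, "This caption is far too long to read in one second."),
    CaptionSegment(4.0, 30.0, "Hi."),
]
problems = check_segments(segments)
```

Here the second segment is flagged as carrying too much text and the third as carrying too little. Suitable limits would depend on the audience and would have to be chosen by the content developer.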

One problem arises if the caption text is longer than can fit in the display space. This can be particularly difficult if, due to a visual disability, the font size has been enlarged, thus reducing the amount of caption text that can be presented. The user agent must respond sensibly to such problems, such as by ensuring that the user has the opportunity to navigate (e.g., scroll down or page down) through the caption segment before proceeding with the visual presentation and presenting the next segment. Some means must be provided to allow the user to signal that the presentation may resume.
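
One possible shape for this behavior on the user agent side is sketched below in Python. It splits an oversized caption segment into pages that fit the display and reports that the presentation may resume only once the user has paged through to the end. The `paginate` function, the `CaptionPager` class, and the display size expressed in characters and lines are all hypothetical. A real user agent would measure rendered text rather than count characters, and would still wait for the user's explicit signal before resuming.

```python
import textwrap


def paginate(text, chars_per_line, lines_per_page):
    """Split caption text into pages that fit an assumed display size."""
    lines = textwrap.wrap(text, width=chars_per_line)
    return [
        lines[i:i + lines_per_page]
        for i in range(0, len(lines), lines_per_page)
    ]


class CaptionPager:
    """Hypothetical holder for one caption segment on a small display."""

    def __init__(self, text, chars_per_line, lines_per_page):
        self.pages = paginate(text, chars_per_line, lines_per_page)
        self.index = 0

    def current_page(self):
        """Lines currently visible in the caption area."""
        return self.pages[self.index] if self.pages else []

    def page_down(self):
        """User action: advance to the next page if there is one."""
        if self.index < len(self.pages) - 1:
            self.index += 1

    def may_resume(self):
        """True once the last page of the segment has been reached."""
        return self.index >= len(self.pages) - 1


pager = CaptionPager(
    "The quick brown fox jumps over the lazy dog near the river bank.",
    chars_per_line=16,
    lines_per_page=2,
)
```

With these assumed dimensions the segment needs three pages, so `pager.may_resume()` stays false until the user has paged down twice.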

=====
PART 3 -- CHANGES TO WCAG DOCUMENT

1. Merge checkpoint 1.3 into checkpoint 1.4 and then break 1.4 into several checkpoints.

Old:

Please note that checkpoint 1.3 calls for an "auditory description of the IMPORTANT information". This leaves several points slightly or very ambiguous. For example, once the "Until user agents.." clause is satisfied:

1. Will there be NO REQUIREMENT for auditory descriptions? I think that it is clear that the working group intended to retain a requirement for auditory descriptions.
2. Will the requirement for auditory descriptions apply only to visual tracks that contain _important information_ or to _all_ visual tracks of multimedia presentations? I don't think that the answer is clear to this one. Presumably, once auditory descriptions can be generated from the text equivalent, it would be very easy to do and therefore, one could DROP the reference to "IMPORTANT" information. But that is easy for me to say and I am not sure of all the long-term consequences of formalizing that interpretation. In the list of "New" checkpoints, I refer again to consequences of dropping or retaining reference to "important information."

Old WCAG checkpoint 1.4 (5 May 1999):
"1.4 For any time-based multimedia presentation (e.g., a movie or
animation), synchronize equivalent alternatives (e.g., captions or auditory
descriptions of the visual track) with the presentation. [Priority 1]"

Recent Suggestion, WCAG checkpoint 1.4 (30 November 1999 Suggestion):
"1.4. For any time-based multimedia presentation (e.g., a movie or
animation), synchronize equivalent alternatives with the presentation.
[Priority 1] "
"Note. For multimedia presentations (e.g., movies and animations), special
attention should be given to synchronizing the collated text transcript
with its corresponding auditory and visual tracks. By doing so, one may
facilitate or even automate provision of other components such as captions,
synthesized-speech auditory descriptions, and text transcripts."

This suggestion is superseded by the material below.
==
Note. Checkpoints 1.4 and 1.3 are replaced by checkpoints 1.4.A through 1.4.F.

New WCAG checkpoint 1.4.A (4 December 1999):
"1.4.A Until user agents can produce synthesized-speech auditory descriptions, provide an auditory description of _important information_ for each multimedia presentation (e.g., movie or animation). [Priority 1]"

Note that this is essentially a refined version of checkpoint 1.3. It doesn't specify what kind of auditory description (synthesized-speech or prerecorded), but while this checkpoint is in force, it will generally be the latter.

New WCAG checkpoint 1.4.B (4 December 1999) (id: WC-SSAD):
"1.4.B For each multimedia presentation, provide data that will produce a synthesized-speech auditory description. [Priority 1]"
"Note: This checkpoint becomes effective one year after the release of a W3C specification for synthesized-speech auditory descriptions."

Rationale: I think that it doesn't make sense to require synchronization data before a W3C standard is established. Notice that the checkpoint refers to "data" rather than just "synchronization" data. Both the text equivalent "data" and the synchronization data are needed.

This checkpoint is very important not only because it affirms the necessity of synthesized-speech auditory descriptions but also because it affirms that auditory descriptions will be required AT ALL after the "Until user agents" condition is satisfied.

This checkpoint is extremely important to solidify soon. It is also critical to develop the specification for synthesized-speech auditory descriptions, since developers of user agents might not implement the feature unless it is required by W3C.
==

An important alternative:

It is important to determine whether auditory descriptions of the future (i.e., those after the "Until user agents" clause is satisfied) will be required only for _important information_ of a visual track or simply for the whole track.

New WCAG checkpoint 1.4.B-RETAIN (4 December 1999) (id: WC-SSAD-IMPINF):
"1.4.B-RETAIN For each multimedia presentation, provide data that will produce a synthesized-speech auditory description of _important_ information for the visual track. [Priority 1]"
"Note: This checkpoint becomes effective one year after the release of a W3C specification for synthesized-speech auditory descriptions."

Rationale: Same rationale as for 1.4.B, except that this indicates that the auditory description is necessary only for "important" information.

====
New WCAG checkpoint 1.4.C (4 December 1999):
"1.4.C For each multimedia presentation (e.g., movie or animation), provide captions and a collated text transcript. [Priority 1]"

Rationale: These two pieces are essential (captions for individuals who are deaf; collated text transcript for individuals who are deaf-blind). We know that captions are needed and we have technologies that can handle them. A collated text transcript is relatively straightforward to supply.

Note that this version (1.4.C) REMOVES reference to auditory descriptions.

The assumption underlying this alternative is that the qualification regarding "important" information will be DROPPED once user agents are able to create synthesized-speech auditory descriptions. (An opposite assumption is addressed later in this document.)

====
New WCAG checkpoint 1.4.D (4 December 1999) (id: WC-ACLIP-TT):
"1.4.D For each audio clip, provide a text transcript. [Priority 1]"

Rationale: A text transcript is _essential_ for disability access to audio clips, whereas a text transcript is not essential for access to auditory tracks of multimedia presentations (for example, the collated text transcript and caption text include the information found in the text transcript of the auditory track).
====

New WCAG checkpoint 1.4.E (4 December 1999) (id: WC-ACLIP-SYNC-TT):
"1.4.E Synchronize each audio clip with its text transcript. [Priority 1]" {I prefer the brevity of this version.}
{or}
"1.4.E For each audio clip, provide data that will allow user agents to synchronize the audio clip with the text transcript. [Priority 1]"
"Note: This checkpoint becomes effective one year after the release of a W3C recommendation addressing the synchronization of audio clips with their text transcripts."

Rationale: Synchronization between the audio clip and the text transcript is essential or near-essential for many individuals who are deaf or hard of hearing but who have some residual hearing because it will allow them to match what they _are_ able to hear with the text transcript. I could also argue for this being Priority 2 rather than Priority 1, though Priority 1 is probably fine and in keeping with the existing (5 May) checkpoint. By the way, I use the term "audio clip" rather than "auditory track" because I reserve the latter for multimedia presentations.
==

New WCAG checkpoint 1.4.F
"For each multimedia presentation for which a synthesized-speech auditory description of _important_ information is likely to be inaccessible, provide a prerecorded auditory description of _important_ information."
"[Priority 3]"
{or}
"For each multimedia presentation, provide a prerecorded auditory description."
"[Priority 3]"
{or}
"For each multimedia presentation, provide a prerecorded auditory description for _important_ information."
"[Priority 3]"

This revision indicates that even after user agents are able to produce synthesized-speech auditory descriptions from text, a prerecorded auditory description may still improve access. By having a distinct checkpoint for prerecorded auditory descriptions that does _not_ have an "until user agents" clause, one can affirm the value of prerecorded auditory descriptions even _after_ they have been rendered partially obsolete by advances in user agent technology. Please note that once user agents are able to generate synthesized-speech auditory descriptions, failure to provide a prerecorded auditory description does not render access "impossible" (i.e., it need not be Priority 1) nor does it cause a "significant barrier" (Priority 2); it should therefore be rated a Priority 3. I suppose that one could drop the checkpoint altogether, since that seems allowable by one interpretation of checkpoint 1.3.

====
PART 4 - CHANGES FOR UAAG

1. Fix UAAG checkpoint 2.5.

The proposed fix addresses a variety of issues, some of which were discussed in additional detail in earlier memos. For example, I gave a rationale against the use of the term "continuous equivalent track".

Even though these changes increase the size of the guidelines, I think that they will clarify things considerably for developers of user agents.

Old UAAG 1.0 checkpoint 2.5 (5 November 1999)

"2.5 Allow the user to specify that continuous equivalent tracks (e.g., closed captions, auditory descriptions, video of sign language, etc.) be rendered at the same time as audio and video tracks. [Priority 1]"
"Techniques for checkpoint 2.5"

New UAAG checkpoint 2.5.A (id: CCT)
"For any multimedia presentation, allow the user to display {or view} the collated text transcript [Priority 1]."

Rationale: A collated text transcript is essential for individuals who are deaf-blind and has many other uses. It is intended that this checkpoint can be met even by user agents that are not multimedia capable (braille output devices, mobile devices, etc.). One reason I have given the collated text transcript its own checkpoint is that we want to avoid excessive numbers of checkpoints not being applicable simply because the user agent is incapable of meeting some small part of a checkpoint (mobile devices, etc.). Also, user agent developers get no credit for partial adherence to a checkpoint. They only have all-or-nothing adherence to individual applicable checkpoints. Here is one that should be applicable and do-able by "every" user agent.
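
To illustrate why a text-only device could satisfy this checkpoint, here is a minimal Python sketch that builds a collated text transcript as plain text. It assumes that the transcript interleaves the caption text (from the auditory track) with the text of the auditory descriptions (of the visual track) in time order. The `collate` function, the `(seconds, text)` input shape, and the `CAPTION`/`DESCRIPTION` labels are illustrative assumptions, not part of any W3C format.

```python
import heapq


def collate(captions, descriptions):
    """Merge two time-sorted lists of (seconds, text) into one transcript.

    Each output line is tagged with its source so that a text-only device
    can tell dialogue from description. The tags are illustrative.
    """
    tagged_captions = ((t, "CAPTION", text) for t, text in captions)
    tagged_descriptions = ((t, "DESCRIPTION", text) for t, text in descriptions)
    merged = heapq.merge(tagged_captions, tagged_descriptions)
    return [f"[{t:06.1f}] {kind}: {text}" for t, kind, text in merged]


captions = [(2.0, "Hello, Anna."), (9.5, "Is anyone home?")]
descriptions = [
    (0.0, "A man walks up to a front door."),
    (6.0, "He knocks twice."),
]
transcript = collate(captions, descriptions)
```

The result is an ordinary list of text lines, which a braille display or a small text-only screen could present without any multimedia capability.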

A side note: It seemed to make sense to begin the revised WCAG checkpoints with "For each..." and the revised UAAG checkpoints with "For any...".

====

New UAAG checkpoint 2.5.B (id: ACLIP-TT)
"For any audio clip, allow the user to display {or view} the text transcript [Priority 1]."

Rationale: A text transcript is _essential_ for disability access to audio clips, whereas it is _not_ essential for access to auditory tracks of multimedia presentations (for example, the collated text transcript includes the information found in the text transcript of the auditory track). (See definition of "audio clip" earlier in this document.)

====
New UAAG checkpoint 2.5.C (id: MMCOMBIN-AV-CAP-PADT)
"For any multimedia presentation, allow the user to display {or view} any of the following combinations:"
"(1) Visual track + auditory track"
Groups: People with disabilities that do not prevent access to the visual track and auditory track.
"(2) Visual track + auditory track + captions"
Groups: Deaf or hard of hearing
"(3) Visual track + prerecorded auditory description track"
Groups: Blind or low vision, some with learning disabilities or cognitive disabilities.
"[Priority 1]"
"Note 1: These combinations must be presentable with or without the collated text transcript already required in checkpoint 2.5.A."
"Note 2: Per checkpoints [LIST THEM], the user agent must be able to control (e.g., turn off and on) any of the components within a numbered combination."

Note that this checkpoint makes the _assumption_ that the accepted and standard way of providing prerecorded auditory description should be as a separate and complete auditory track (as a "prerecorded auditory description track" as opposed to a "prerecorded auditory description supplement track" [see below]). This point deserves some discussion.
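
As a rough sketch of how a user agent developer might work with the numbered combinations in proposed checkpoint 2.5.C, the Python below records them as sets of components and checks which ones a given user agent could present. The `Component` enum, the `select` and `unsupported` functions, and the abbreviations `PADT` and `CTT` are hypothetical names chosen for this illustration.

```python
from enum import Enum


class Component(Enum):
    VISUAL = "visual track"
    AUDITORY = "auditory track"
    CAPTIONS = "captions"
    PADT = "prerecorded auditory description track"
    CTT = "collated text transcript"


# Combinations (1)-(3) of proposed checkpoint 2.5.C.
COMBINATIONS_2_5_C = {
    1: frozenset({Component.VISUAL, Component.AUDITORY}),
    2: frozenset({Component.VISUAL, Component.AUDITORY, Component.CAPTIONS}),
    3: frozenset({Component.VISUAL, Component.PADT}),
}


def select(combination, with_transcript=False):
    """Return the set of components to render for a numbered combination.

    Note 1 of the checkpoint says each combination must be presentable with
    or without the collated text transcript, hence the flag.
    """
    chosen = set(COMBINATIONS_2_5_C[combination])
    if with_transcript:
        chosen.add(Component.CTT)
    return chosen


def unsupported(available):
    """List the numbered combinations a user agent cannot present."""
    return [
        number
        for number, combo in sorted(COMBINATIONS_2_5_C.items())
        if not combo <= set(available)
    ]


missing = unsupported({Component.VISUAL, Component.AUDITORY, Component.CAPTIONS})
```

In this example a user agent with no support for the prerecorded auditory description track would find combination (3) listed in `missing`.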

====
New UAAG checkpoint 2.5.D (id: MM-COMBIN-SSAD)
"For any multimedia presentation, allow the user to display {or view} the following additional combination:"
"(4) Visual track + auditory track + synthesized-speech auditory description"
Groups: Blind or low vision, some individuals with learning disabilities or cognitive disabilities. (Same list as for combination 3).
"[Priority 1]"
"Note 1: This combination must be presentable with or without the collated text transcript already required in checkpoint 2.5.A."
"Note 2: Per checkpoints [LIST THEM], the user agent must be able to control (e.g., turn off and on) any of the components within a numbered combination."
"Note 3: This checkpoint becomes effective one year after the release of a W3C recommendation addressing synthesized-speech auditory descriptions."

Rationale: Synthesized-speech auditory descriptions are a very high priority, but we cannot require them until a spec for them is released.
====
New UAAG checkpoint 2.5.E (id: MM-COMBIN-PADST)

"For any multimedia presentation, allow the user to display {or view} the following additional combination:"

"(5) Visual track + auditory track + prerecorded auditory description supplement track"
Groups: Blind or low vision, some with learning disabilities or cognitive disabilities. (Same list as for combination 3).
"[Priority 2]"
"Note 1: This combination must be presentable with or without the collated text transcript already required in checkpoint 2.5.A."
"Note 2: Per checkpoints [LIST THEM], the user agent must be able to control (e.g., turn off and on) any of the components within a numbered combination."
"Note 3: This checkpoint becomes effective one year after the release of a W3C recommendation addressing the creation of prerecorded auditory description supplement tracks."

Rationale: A prerecorded auditory description supplement track could be important because it could save download time and allow greater flexibility in presentation than the (ordinary) prerecorded auditory description track that combines the regular auditory material and the auditory descriptions to create an integrated audio track.

In essence, the use of a prerecorded auditory description supplement track allows the user to obtain high quality prerecorded audio while gaining some of the flexibility of "on the fly" (synthesized-speech) auditory descriptions.

Since this is the first specific mention of "prerecorded auditory description supplement track", this obviously deserves some discussion, though I think that its existence is implied in WCAG 1.0.
====
2. Other checkpoints

New 1. Provide additional combinations.

"For any multimedia presentation, allow the user to present additional combinations of the visual track, regular auditory track, captions, prerecorded auditory description track, prerecorded auditory description supplement track, collated text transcript, etc. [Priority 3]"

Rationale: I think that the really important ones have already been covered. If not, I invite others to say so. It also seems to me that if someone is providing ASL translation on a video, other existing checkpoints will cover it.
====
New 2. Allow use of the text transcript to navigate an audio clip.

"For any audio clip, allow the user to navigate the presentation using the text transcript. [Priority 2]"

"For example, allow a user to jump to a point within the auditory presentation by selecting a specific segment of the text transcript."
====
New 3. Allow navigation via collated text transcript

"For any multimedia presentation, allow the user to navigate the presentation and its related components {or "synchronized alternatives" or "alternatives"} using the collated text transcript. [Priority 2]"

"For example, allow users to employ the collated text transcript (or subsets of it) to control navigation to and within other components (visual track, auditory track, captions, prerecorded auditory description track, prerecorded auditory description supplement track, etc.)"

Rationale: I thought to make this a Priority 3 but now I think that it is more important. I think of this feature as very valuable but not necessarily essential. Is there any chance that this checkpoint is already implied in other checkpoints and that it may already be a Priority 1?

The term "text transcript" for multimedia presentations is not included because it is somewhat encompassed by captions.
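
The transcript-based navigation proposed in the two checkpoints above could be sketched as follows. This Python example assumes that each transcript segment carries a start time in seconds, which is exactly the kind of synchronization data this memo asks content developers to supply. The `TranscriptIndex` class and its method names are hypothetical.

```python
from bisect import bisect_right


class TranscriptIndex:
    """Hypothetical index from transcript segments to playback offsets."""

    def __init__(self, segments):
        # segments: list of (start_seconds, text), sorted by start time.
        self.starts = [start for start, _ in segments]
        self.texts = [text for _, text in segments]

    def seek_time(self, segment_number):
        """Playback offset to jump to when the user selects a segment."""
        return self.starts[segment_number]

    def segment_at(self, seconds):
        """Segment to highlight while playback is at the given offset."""
        return max(bisect_right(self.starts, seconds) - 1, 0)


index = TranscriptIndex([
    (0.0, "Good evening."),
    (4.2, "Tonight's top story."),
    (11.0, "And now the weather."),
])
```

Selecting the third segment would seek playback to `index.seek_time(2)`, and `index.segment_at(5.0)` gives the segment to highlight five seconds in. The same lookup could drive the other components of a multimedia presentation if they share a timeline.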
====
New 4. Provide user control over the handling of deviations and errors

"Provide user control over the handling of deviations and errors [Priority 2]"

The user may want to have the visual track proceed even if part of the caption text is out of view.

Possible problems:
a. The visual track or the auditory track was slowed, sped up, or halted due to network problems.
b. Captions were too long to fit in the caption window or exceeded a preset presentation rate.
c. Synchronization data was missing, incomplete, or invalid.
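
One way a user agent might apply a user-selected policy to deviations like those listed above, while keeping a record of what happened, is sketched here in Python. The `DeviationLog` class, its fields, and the two outcomes ("held" and "skipped") are assumptions made for this illustration. The memo does not prescribe any particular handling strategy.

```python
from dataclasses import dataclass, field


@dataclass
class DeviationLog:
    """Hypothetical record of synchronization deviations during playback."""
    regular_seconds: float
    events: list = field(default_factory=list)
    added_seconds: float = 0.0

    def handle_overrun(self, kind, overrun_seconds, hold_presentation):
        """Apply the user's choice to one overrun and record the outcome.

        hold_presentation=True pauses the visual track until the user has
        caught up (adding delay); False lets it proceed (losing content).
        """
        if hold_presentation:
            self.added_seconds += overrun_seconds
            outcome = "held"
        else:
            outcome = "skipped"
        self.events.append((kind, overrun_seconds, outcome))

    def summary(self):
        """Totals that a report to the user could be built from."""
        return {
            "regular_seconds": self.regular_seconds,
            "actual_seconds": self.regular_seconds + self.added_seconds,
            "events": len(self.events),
        }


log = DeviationLog(regular_seconds=60.0)
log.handle_overrun("caption too long", 8.0, hold_presentation=True)
log.handle_overrun("caption too long", 12.0, hold_presentation=True)
log.handle_overrun("missing synchronization data", 0.0, hold_presentation=False)
report = log.summary()
```

With these illustrative inputs, a presentation with a regular play time of 60 seconds takes 80 seconds, and the log retains each event so that the consequences can be summarized for the user afterwards.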
====
New 5. Provide a report to users regarding deviations and errors

"Provide a report to users describing what deviations and errors of synchronization occurred, how they were handled, and what the consequences were.
[Priority 2 {or 3}]"

"For example: The visual track had a regular play time of 60 seconds but took 80 seconds to present because the auditory descriptions were 20 seconds longer than the natural pauses."

Received on Sunday, 5 December 1999 01:29:57 UTC