- From: <ehansen@ets.org>
- Date: Mon, 29 Nov 1999 18:08:16 -0500 (EST)
- To: w3c-wai-gl@w3.org
- Cc: w3c-wai-ua@w3.org
I came in today resolved to try to tackle some apparent ambiguities surrounding auditory descriptions. I was glad to see the discussion already underway (mostly Marja and Wendy until this point). For now, I am posting this on WCAG with a cross-post to UAAG.

THE PROBLEM

I think that it is really important to find a way to resolve the ambiguities surrounding auditory descriptions and then see that the resolutions are carried through into the two other sets of guidelines -- Authoring Tool (ATAG) and User Agent (UAAG).

Some Definitions

First a few definitions and some thoughts on utility:

a. Collated text transcript = A collation of a text equivalent of the auditory track and a text equivalent of the visual track, generally in reading order.
   Utility: Essential for people who are deaf-blind (accessed via braille); helpful for many others.

b. Text transcript = Text equivalent of the auditory track.
   Utility: Can be used to produce captions.

c. Captions = Text equivalent of the auditory track, synchronized with the video (and auditory) tracks.
   Utility: Essential for individuals who are deaf or hard of hearing.

d. Auditory description = Auditory equivalent of the visual track that is synchronized with the regular auditory (and visual) tracks, usually inserted in the natural pauses of the spoken dialogue.
   Utility: Essential (or near-essential) for people who are blind.

d.1. Synthesized-speech auditory description = an auditory description produced via synthesized speech, generally generated "on the fly" from a text equivalent of the visual track.

d.2. Prerecorded auditory description = an auditory description in prerecorded speech, usually in natural human speech.

The Need for Clarification

I think that there is a need for WCAG to clarify the concept of auditory description and especially how it is related to the concept of a collated text transcript.
It is important to affirm the _Priority 1_ requirement for _synchronization information for the text equivalent of the visual track_ because both the text equivalent and the synchronization information are essential for producing synthesized-speech auditory descriptions.

Does WCAG 1.0 _already_ require synchronization of the text equivalent of the visual track of multimedia presentations? Yes. The requirement for such synchronization is found in 1.4:

"1.4 For any time-based multimedia presentation (e.g., a movie or animation), <EMPHASIS> SYNCHRONIZE EQUIVALENT ALTERNATIVES </EMPHASIS> (e.g., captions or auditory descriptions of the visual track) with the presentation. [Priority 1]"

Yet one of the reasons for the current ambiguity may be the examples used -- "(e.g., captions or auditory descriptions of the visual track)". Nowhere does the checkpoint mention the _text equivalent of the visual track_. Nor does it mention the _collated text transcript_ from which that text equivalent could be derived.

PROPOSED SOLUTION

I think that in terms of changes to WCAG checkpoints, the following changes may address the ambiguities.

===

1. Fix WCAG checkpoint 1.4.

The following change removes the "e.g.," parenthetical phrase and then provides a note.

Old WCAG checkpoint 1.4:

"1.4 For any time-based multimedia presentation (e.g., a movie or animation), synchronize equivalent alternatives (e.g., captions or auditory descriptions of the visual track) with the presentation. [Priority 1]"

New WCAG checkpoint 1.4:

"1.4. For any time-based multimedia presentation (e.g., a movie or animation), synchronize equivalent alternatives with the presentation. [Priority 1]"

"Note. For multimedia presentations (e.g., movies and animations), special attention should be given to synchronizing the collated text transcript with its corresponding auditory and visual tracks.
By doing so, one may facilitate or even automate provision of other components such as captions, synthesized-speech auditory descriptions, and text transcripts."

===

2. Fix WCAG checkpoint 1.3.

The following revision to checkpoint 1.3 has several important features, including the following.

(1) Provides a better distinction between synthesized-speech auditory description and the prerecorded auditory description.

(2) Establishes a lower priority for prerecorded auditory description (from Priority 1 to Priority 2). If a synchronized text equivalent of the visual track is already being provided, then failure to provide a prerecorded auditory description does not render access "impossible" and therefore it need not be Priority 1.

(3) Retains the idea that prerecorded auditory descriptions are required only for "important" information of the visual track.

Yet notwithstanding that _prerecorded auditory descriptions_ are only required for _important_ visual content in multimedia presentations, it is important to remember that _synchronized text equivalents of the visual track_ that are used to create synthesized-speech auditory equivalents are _always_ required.

Old:

"1.3 Until user agents can automatically read aloud the text equivalent of a visual track, provide an auditory description of the important information of the visual track of a multimedia presentation. [Priority 1] Synchronize the auditory description with the audio track as per checkpoint 1.4. Refer to checkpoint 1.1 for information about textual equivalents for visual information."

New WCAG checkpoint 1.3, showing changes:

"1.3 Until user agents <CHANGE> can produce a synthesized-speech auditory description </CHANGE> from a text equivalent of a visual track, provide <CHANGE> a prerecorded auditory description </CHANGE> of the important information of the visual track of a multimedia presentation.
[Priority 2] Synchronize the <CHANGE> prerecorded </CHANGE> auditory description with the audio track <CHANGE> [deleted the word "as" (grammatical error)] </CHANGE> per checkpoint 1.4. Refer to checkpoint 1.1 for information about <CHANGE> text [instead of "textual"] </CHANGE> equivalents for <CHANGE> non-text elements [instead of "visual information"] </CHANGE>."

New WCAG checkpoint 1.3, cleaned up:

"1.3 Until user agents can produce a synthesized-speech auditory description from a text equivalent of a visual track, provide a prerecorded auditory description of the important information of the visual track of a multimedia presentation. [Priority 2] Synchronize the prerecorded auditory description with the audio track per checkpoint 1.4. Refer to checkpoint 1.1 for information about text equivalents for non-text elements."

====

3. Add a checkpoint (WCAG checkpoint 1.3A).

The following revision makes clear that even after user agents are able to produce synthesized-speech auditory descriptions from text, a prerecorded auditory description may still improve access. By having a distinct checkpoint for prerecorded auditory descriptions that does _not_ have an "until user agents" clause, one can affirm the value of prerecorded auditory descriptions even _after_ they have been rendered partially obsolete by advances in user agent technology.

Please note that once user agents are able to generate synthesized-speech auditory descriptions, failure to provide a prerecorded auditory description does not render access "impossible" (i.e., it need not be Priority 1) nor does it cause a "significant barrier" (Priority 2); it should therefore be rated a Priority 3. Nevertheless, at Priority 3, I have removed the reference to "important information", i.e., the checkpoint may as well apply more generally to both "important" and "unimportant" information.

"1.3A Provide a prerecorded auditory description of the visual track of a multimedia presentation. [Priority 3]"

====

4. Put checkpoint 1.4 before 1.3 and 1.3A.

Checkpoint 1.4 (synchronized alternatives) should come before 1.3 and 1.3A, since 1.4 addresses the general issue of synchronized alternatives and 1.3 and 1.3A address special cases.

===

5. Fix the Techniques document.

The Techniques should be modified to reflect the new emphasis. For example, ideally, the techniques could point to some approach or technology (e.g., SMIL [?]) that could allow the captions, text transcript, and prerecorded auditory description to be generated automatically from the collated text transcript and its synchronization information. I think that this is a very important change, even though it need not affect the WCAG document itself. Requirements for synchronization standards are a topic with which I am not well-acquainted.

===

6. Make other minor adjustments in WCAG.

A few other minor adjustments in WCAG might be necessary. For example, one might wish to mention collated text transcripts in a note in checkpoint 1.1.

===

7. Make adjustments in the UAAG and ATAG documents.

The UAAG and ATAG documents should be carefully examined to ensure that they properly reflect these changes. For example, perhaps user agents and authoring tools should have a Priority 1 requirement for handling both kinds of auditory descriptions -- prerecorded and synthesized speech -- even though prerecorded auditory descriptions would not be a Priority 1 WCAG requirement. Another possible refinement would be to specify either "prerecorded auditory description" or "synthesized-speech auditory description" when it is clear one intends only one of them.

====

ANOTHER ISSUE

A note regarding "captions". The UAAG working group is considering referring to "closed captions" where WCAG refers simply to "captions". I think that there ought to be consistency between the documents. I have mixed feelings about that possible change. At this moment, I lean in favor of keeping the word "captions" as it is throughout the three documents.
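
ILLUSTRATION

To make the idea in the Techniques item concrete, here is a minimal sketch, in Python, of how a collated text transcript that carries synchronization information might yield the other text-based components. This is only an illustration under my own assumptions: the names (Entry, COLLATED, and the helper functions) are hypothetical, the sample dialogue is made up, and nothing here is drawn from SMIL or from any existing tool. It does not perform speech synthesis; it only produces the timed text that a user agent could hand to a synthesizer.

```python
from dataclasses import dataclass
from typing import List, Tuple

# All names here are hypothetical and chosen only for illustration.

@dataclass(frozen=True)
class Entry:
    start: float  # seconds from the start of the presentation
    end: float
    track: str    # "auditory" or "visual": the track this text is an equivalent of
    text: str

Cue = Tuple[float, float, str]  # (start, end, text)

def _in_order(collated: List[Entry]) -> List[Entry]:
    # Reading order is taken here to be start-time order (an assumption).
    return sorted(collated, key=lambda e: e.start)

def collated_text(collated: List[Entry]) -> str:
    """Both text equivalents interleaved, e.g. for braille output."""
    return "\n".join(f"[{e.track}] {e.text}" for e in _in_order(collated))

def text_transcript(collated: List[Entry]) -> str:
    """Text equivalent of the auditory track only, without timing."""
    return "\n".join(e.text for e in _in_order(collated) if e.track == "auditory")

def captions(collated: List[Entry]) -> List[Cue]:
    """Auditory-track text with its timing, ready to be shown in sync."""
    return [(e.start, e.end, e.text) for e in _in_order(collated) if e.track == "auditory"]

def description_cues(collated: List[Entry]) -> List[Cue]:
    """Visual-track text with its timing, for a speech synthesizer to read aloud."""
    return [(e.start, e.end, e.text) for e in _in_order(collated) if e.track == "visual"]

# Made-up sample data.
COLLATED = [
    Entry(0.0, 2.0, "visual", "A door opens onto a dark hallway."),
    Entry(2.0, 4.5, "auditory", "Hello? Is anyone there?"),
    Entry(4.5, 6.0, "visual", "A cat runs past."),
    Entry(6.0, 8.0, "auditory", "Oh, it's only you."),
]
```

The point of the sketch is that each derived component is a simple filter over the same synchronized source, which is why the synchronization information for the text equivalent of the visual track matters so much.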
====

=============================
Eric G. Hansen, Ph.D.
Development Scientist
Educational Testing Service
ETS 12-R
Rosedale Road
Princeton, NJ 08541
(W) 609-734-5615
(Fax) 609-734-1090
E-mail: ehansen@ets.org
Received on Tuesday, 30 November 1999 16:14:11 UTC