Re: Questions about WCAG 1.3 - A possible solution from ehansen@ets.org on 1999-11-29 (w3c-wai-gl@w3.org from October to December 1999)

From: <ehansen@ets.org>
Date: Mon, 29 Nov 1999 18:08:16 -0500 (EST)
To: w3c-wai-gl@w3.org
Cc: w3c-wai-ua@w3.org
Message-id: <vines.Bh0E+URkEsA@cips06.ets.org>
I came in today resolved to try to tackle some apparent ambiguities 
surrounding auditory descriptions. I was glad to see the discussion already 
underway (mostly between Marja and Wendy until this point). For now, I am 
posting this to the WCAG list with a cross-post to the UAAG list.

THE PROBLEM

I think that it is really important to find a way to resolve the 
ambiguities surrounding auditory descriptions and then see that the 
resolutions are carried through into the two other sets of guidelines -- 
Authoring Tool (ATAG) and User Agent (UAAG). 

Some Definitions

First a few definitions and some thoughts on utility:

a. Collated text transcript = A collation of a text equivalent of the 
auditory track and a text equivalent of the visual track, generally in 
reading order. Utility: Essential for people who are deaf-blind (accessed 
via braille); helpful for many others.
b. Text transcript = Text equivalent of the auditory track. Utility: Can be 
used to produce captions.
c. Captions = Text equivalent of the auditory track, synchronized with the 
video (and auditory) tracks. Utility: Essential for individuals who are 
deaf or hard of hearing.
d. Auditory description = Auditory equivalent of the visual track that is 
synchronized with the regular auditory (and visual) tracks, usually 
inserted in the natural pauses of the spoken dialogue. Utility: Essential 
(or near-essential) for people who are blind.
d.1. Synthesized-speech auditory description = an auditory description 
produced via synthesized speech, generally generated "on the fly" from a 
text equivalent of the visual track.
d.2. Prerecorded auditory description = an auditory description in 
prerecorded speech, usually in natural human speech.

The Need for Clarification

I think that there is a need for WCAG to clarify the concept of auditory 
description and especially how it is related to the concept of a collated 
text transcript.

It is important to affirm the _Priority 1_ requirement for _synchronization 
information for the text equivalent of the visual track_ because both the 
text equivalent and the synchronization information are essential for 
producing synthesized-speech auditory descriptions. 

Does WCAG 1.0 _already_ require synchronization of the text equivalent of 
the visual track of multimedia presentations? 

Yes. The requirement for such synchronization is found in 1.4:
"1.4 For any time-based multimedia presentation (e.g., a movie or 
animation), _synchronize equivalent alternatives_ 
(e.g., captions or auditory descriptions of the visual track) with the 
presentation. [Priority 1]"

Yet one of the reasons for the current ambiguity may be the examples used 
-- "(e.g., captions or auditory descriptions of the visual track)". The 
checkpoint nowhere mentions the _text equivalent of the visual track_, nor 
the _collated text transcript_ from which that text equivalent could be 
derived.

PROPOSED SOLUTION

I think that the following changes to WCAG checkpoints may address these 
ambiguities.
===
1. Fix WCAG checkpoint 1.4.

The following change removes the second "e.g.," parenthetical phrase (the 
one naming captions and auditory descriptions) and then provides a note. 

Old WCAG checkpoint 1.4:

"1.4 For any time-based multimedia presentation (e.g., a movie or 
animation), synchronize equivalent alternatives (e.g., captions or auditory 
descriptions of the visual track) with the presentation. [Priority 1]"
 
New WCAG checkpoint 1.4:
"1.4 For any time-based multimedia presentation (e.g., a movie or 
animation), synchronize equivalent alternatives with the presentation. 
[Priority 1]"
"Note. For multimedia presentations (e.g., movies and animations), special 
attention should be given to synchronizing the collated text transcript 
with its corresponding auditory and visual tracks. By doing so, one may 
facilitate or even automate provision of other components such as captions, 
synthesized-speech auditory descriptions, and text transcripts."
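To illustrate why a synchronized collated text transcript could "facilitate 
or even automate" the other components, here is a minimal Python sketch. It 
assumes the collated text transcript is stored as a list of timed segments, 
each tagged with the track it describes; the Segment type and all function 
names are hypothetical and are not taken from SMIL or any other format. 
Once every segment carries its timing, the text transcript, the captions, 
and the timed text from which a user agent could speak a synthesized-speech 
auditory description are all simple filters over the same data.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical cue type: (start seconds, end seconds, text).
Cue = Tuple[float, float, str]


@dataclass(frozen=True)
class Segment:
    """One entry of a hypothetical synchronized collated text transcript."""
    start: float  # seconds from the beginning of the presentation
    end: float    # seconds from the beginning of the presentation
    track: str    # "auditory" or "visual"
    text: str     # text equivalent of that part of the track


def reading_order(transcript: List[Segment]) -> List[Segment]:
    """Interleave both tracks by start time (ties: auditory before visual)."""
    return sorted(transcript, key=lambda s: (s.start, s.track))


def collated_text(transcript: List[Segment]) -> str:
    """Untimed collated text transcript, e.g. for output to braille."""
    return "\n".join(
        f"{s.track.upper()}: {s.text}" for s in reading_order(transcript)
    )


def text_transcript(transcript: List[Segment]) -> str:
    """Text equivalent of the auditory track only, with timing dropped."""
    return " ".join(
        s.text for s in reading_order(transcript) if s.track == "auditory"
    )


def captions(transcript: List[Segment]) -> List[Cue]:
    """Text equivalent of the auditory track, keeping its synchronization."""
    return [(s.start, s.end, s.text)
            for s in reading_order(transcript) if s.track == "auditory"]


def description_cues(transcript: List[Segment]) -> List[Cue]:
    """Timed text equivalent of the visual track: the input a user agent
    would need to speak a synthesized-speech auditory description."""
    return [(s.start, s.end, s.text)
            for s in reading_order(transcript) if s.track == "visual"]


# Example: the visual descriptions fall in the pauses between dialogue.
movie = [
    Segment(5.0, 6.0, "visual", "A light switches on."),
    Segment(0.0, 2.5, "visual", "A door opens onto a dark room."),
    Segment(6.0, 8.0, "auditory", "There you are."),
    Segment(2.5, 5.0, "auditory", "Hello? Is anyone here?"),
]

print(text_transcript(movie))  # Hello? Is anyone here? There you are.
```

Note that a prerecorded auditory description cannot be derived this way; 
the sketch only covers the components the note lists as candidates for 
automation.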
===
2. Fix WCAG checkpoint 1.3.

The following revision to checkpoint 1.3 has several important features:

(1) Provides a clearer distinction between synthesized-speech auditory 
description and prerecorded auditory description.
(2) Establishes a lower priority for prerecorded auditory description (from 
Priority 1 to Priority 2). If a synchronized text equivalent of the visual 
track is already being provided, then failure to provide a prerecorded 
auditory description does not render access "impossible" and therefore it 
need not be Priority 1.
(3) Retains the idea that prerecorded auditory descriptions are required 
only for "important" information of the visual track. Yet although 
_prerecorded auditory descriptions_ are required only for _important_ 
visual content in multimedia presentations, it is important to remember 
that the _synchronized text equivalents of the visual track_ that are used 
to create synthesized-speech auditory descriptions are _always_ required.

Old:

"1.3 Until user agents can automatically read aloud the text equivalent of 
a visual track, provide an auditory description of the important 
information of the visual track of a multimedia presentation. [Priority 1] 
Synchronize the auditory description with the audio track as per checkpoint 
1.4. Refer to checkpoint 1.1 for information about textual equivalents for 
visual information."


New WCAG checkpoint 1.3, showing changes:

"1.3 Until user agents <CHANGE> can produce a synthesized-speech auditory 
description </CHANGE> from a text equivalent of a visual track, provide 
<CHANGE> a prerecorded auditory description </CHANGE> of the important 
information of the visual track of a multimedia presentation. [Priority 2] 
Synchronize the <CHANGE> prerecorded </CHANGE> auditory description with 
the audio track <CHANGE> [deleted the word "as" (grammatical error)] 
</CHANGE> per checkpoint 1.4. Refer to checkpoint 1.1 for information about 
<CHANGE> text [instead of "textual"] </CHANGE> equivalents for <CHANGE> 
non-text elements [instead of "visual information"] </CHANGE>."

New WCAG checkpoint 1.3, cleaned up:

"1.3 Until user agents can produce a synthesized-speech auditory 
description from a text equivalent of a visual track, provide a prerecorded 
auditory description of the important information of the visual track of a 
multimedia presentation. [Priority 2] Synchronize the prerecorded auditory 
description with the audio track per checkpoint 1.4. Refer to checkpoint 
1.1 for information about text equivalents for non-text elements."

====
3. Add a checkpoint (WCAG checkpoint 1.3A).

The following new checkpoint makes clear that even after user agents are 
able to produce synthesized-speech auditory descriptions from text, a 
prerecorded auditory description may still improve access. By having a 
distinct checkpoint for prerecorded auditory descriptions that does _not_ 
have an "until user agents" clause, one can affirm the value of prerecorded 
auditory descriptions even _after_ they have been rendered partially 
obsolete by advances in user agent technology. Please note that once user 
agents are able to generate synthesized-speech auditory descriptions, 
failure to provide a prerecorded auditory description neither renders 
access "impossible" (i.e., it need not be Priority 1) nor causes a 
"significant barrier" (Priority 2); it should therefore be rated a 
Priority 3. Given this lower priority, I have also removed the reference to 
"important information", i.e., the checkpoint may as well apply more 
generally to both "important" and "unimportant" information.

"1.3A Provide a prerecorded auditory description of the visual track of a 
multimedia presentation. [Priority 3]"

====
4. Put checkpoint 1.4 before 1.3 and 1.3A.

Checkpoint 1.4 (synchronized alternatives) should come before 1.3 and 1.3A, 
since 1.4 addresses the general issue of synchronized alternatives and 1.3 
and 1.3A address special cases.
===
5. Fix the Techniques document.

The Techniques should be modified to reflect the new emphasis. Ideally, for 
example, the techniques could point to some approach or technology (e.g., 
SMIL [?]) that would allow the captions, text transcript, and 
synthesized-speech auditory description to be generated automatically from 
the collated text transcript and its synchronization information. I think 
that this is a very important change, even though it need not affect the 
WCAG document itself. Requirements for synchronization standards are a 
topic with which I am not well acquainted.

===
6. Make other minor adjustments in WCAG.

A few other minor adjustments in WCAG might be necessary. For example, one 
might wish to mention collated text transcripts in a note in checkpoint 
1.1.
===
7. Make adjustments in the UAAG and ATAG documents.

The UAAG and ATAG documents should be carefully examined to ensure that 
they properly reflect these changes.

For example, perhaps user agents and authoring tools should have a Priority 
1 requirement for handling both kinds of auditory descriptions -- 
prerecorded and synthesized speech -- even though prerecorded auditory 
descriptions would not be a Priority 1 WCAG requirement.

Another possible refinement would be to specify either "prerecorded 
auditory description" or "synthesized-speech auditory description" 
whenever only one of them is intended.
====
ANOTHER ISSUE

A note regarding "captions".

The UAAG working group is considering referring to "closed captions" where 
WCAG refers simply to "captions". I think that there ought to be 
consistency between the documents. I have mixed feelings about that 
possible change. At this moment, I lean in favor of keeping the word 
"captions" as it is throughout the three documents.
====


=============================
Eric G. Hansen, Ph.D.
Development Scientist
Educational Testing Service
ETS 12-R
Rosedale Road
Princeton, NJ 08541
(W) 609-734-5615
(Fax) 609-734-1090
E-mail: ehansen@ets.org
Received on Tuesday, 30 November 1999 16:14:11 UTC