- From: Joe Clark <joeclark@joeclark.org>
- Date: Mon, 25 Nov 2002 17:44:14 -0500
- To: WAI-GL <w3c-wai-gl@w3.org>
The teleconference minutes
<http://www.w3.org/WAI/GL/2002/11/14-minutes.html> say:
>pb i don't need captions to understand video content, but i have
>watched foreign movies and read the captions.
Subtitles, you mean. Subtitles are not captions. Subtitles are not an
accessibility feature *for people with disabilities*.
>pb generally, i can read captions and not have loss of
>understanding. even the cooking demo - is it necessary?
>
>pb i feel i could read captions and understand what was going on.
Well, here we go with the hypotheticals again.
The teleconference minutes list the participants. Could persons on
that list who watch captions every day please identify themselves?
How about those who have watched two hours of captions in the past
week? And have done the same for every week in the past two months?
Ah. I see.
As in so many other cases in the development of the WCAG,
accessibility advocates err in behaving as if they were equally expert
on all topics. Expertise can be cultivated-- I didn't know anything
about Web accessibility five years ago, and now I know enough to have
written a book about it. But the fact that you have an interest in
accessibility, or work at a university research centre, or have a day
job working on one specific aspect of accessibility, or are
self-taught in one or other of the various issues, does not in itself
qualify you to write guidelines on *a separate topic you don't
actually know anything about*. (You can advance opinions; they are
merely underinformed opinions. But writing actual guidelines requires
a higher threshold.)
The claim above-- "I feel I could read captions and understand what
was going on"-- is hypothetical. One person, using a mental example
rather than lived experience, speculates that an example would be OK
under certain vaguely-envisaged conditions. Captioning is all about
details, not vagueness. And from this and other ill-informed
contributions, an entire Web Content Accessibility Guideline may be
written with repercussions that will remain in effect for up to a
decade.
Shall we continue?
>gv for cooking shows, etc not usually a problem. only during a
>detailed training.
I watch a lot of cooking shows-- _Martha Stewart Living_, _New
Classics_, _Cook Like a Chef_, _Iron Chef_, _Nigella Bites_, but none
of that Jamie Oliver nonsense. Of those that are captioned, I have no
trouble watching the captions and following the show. I doubt anyone
else does, either. Wasn't the first open-captioned show on an
American network actually _The French Chef_? Do we not, in fact, have
enormous experience watching cooking shows and captions
simultaneously?
Is this not the worst possible example, and not merely a hypothetical
one? Does this example not prove the absurdity of the requirement?
>gv usually, if you click on them you can pause them.
>
>gv in a videotape you can pause.
You can't pause the captions independent of the video. Same with DVD.
Isn't that the airy-fairy goal the Initiative is grasping at?
>pb before getting into live streams, since that is a bit diff
>concept, is there evidence that it is better to have simultaneous
>caption and demo, or demo then caption.
You are essentially asking for a new medium to be developed, one that
brings the 19th-century usage of intertitles into the 21st century.
The goal here is apparently to force filmmakers to create segmented
animated slideshows that leapfrog caption tracks or that can be run
in tandem with captions only if you opt into such a display.
Could somebody out there give me five present-day examples of such a
cinematic form? Not hypothetical examples. I mean real products I
could buy and watch, or URLs I could surf to and watch. Five examples
of an actual film, video, TV show, or something else cinematic that
either (a) actually shows pausable captions and video, and is
expressly filmed and created for both of those, or (b) could be very
well adapted to such a style?
Anybody?
No?
>action pb do some follow-up research to answer question about
>producing captions (check with ncam, gallaudet, etc.) re: checkpoint
>1.2
NCAM are not the only experts, and neither is Gallaudet. I would be
careful about calling the usual suspects in such a case.
WAI is pretty much saying that, irrespective of over 20 years of
day-to-day usage of captions by hundreds of thousands of people,
present-day captioning does not work. Captioning viewers, unbeknownst
to themselves, are too stupid to be able to keep up with a picture
and words presented simultaneously.
Users of this accessibility technique have actually been inaccessible
all along! Fools! When will the world *learn*?
Now, could this be projection?
Isn't this the typical reaction of caption-naive hearing people when
presented with captioning for the first time? Especially if they're
over 40? "Oh, my God! You can't expect me to keep up with all that!"
Or is there a kind of overcompensation at work? Since Web
accessibility *mostly* means accessibility for the blind and
visually-impaired (the group that needs the biggest accommodation),
are people so conditioned to the needs of people who cannot see well
that they forget that other people can see just fine? that they have
perfectly adequate capacities for visual processing? that they have
many years of experience watching a moving picture and captions
simultaneously, with no significant problems whatsoever?
I believe there is an undercurrent here of accommodating people with
learning disabilities. As we saw before with the (now discredited)
plan to force all Web authors everywhere to add illustrations
("non-text content") to every single page, allegedly because a few
hypothetical dyslexics might be slightly less confused, there seems
to be a goal of destroying an existing medium (that would be cinema)
and one of its existing accessibility provisions (captioning) because
it is claimed that some learning-disabled people cannot keep up.
Where's the evidence of the problem?
Where is the evidence that the proposed solution will actually cure
the disease and not kill the patient?
Who is the primary audience for captioning? Is it deaf and
hard-of-hearing people or is it learning-disabled people?
Is the Web Accessibility Initiative really sure of itself here? Are
you very sure indeed that you want to destroy existing captioning
methods that *work* for the main disability group for which they are
intended in the half-baked desire to accommodate some hypothesized
subsection of an unrelated disability group?
Now maybe the claim will be made that the people we're trying to
accommodate are both deaf and learning-disabled. ("Whoops! Did we
forget to mention that?") Well, I'm gonna ask again: Where's the
evidence of the problem? Where is the evidence that the proposed
solution will actually fix the problem?
>pb to address my own concern, i will do some research. perhaps more
>appropriate to change the example.
>
>gv the example is definitely a problem.
>
>pb along w/the changed example, supply an example where this would not apply.
Let's go all the way. Let's cook us up a whole raft of examples.
Can an advocate of the proposed checkpoint tell me how it would apply
to the following cinematic forms, all of which I have watched with
captioning in the last month? I'm just going to assume that every
single cooking show qualifies. I'm just going to assume that. Let's
look at some other genres.
* Dramatic feature film
* Animated comedy
* Dramatic TV series
* Newscast
* Talk show
* Music video
* Documentary
* French-language film with English subtitles (yes, *and* captions)
* Porn (not the kind I particularly like, but I've channel-surfed
through it on the Movie Network)
* Infomercial
Could it possibly be true that the checkpoint addresses a
hypothetical problem for a hypothetical user base that isn't even the
primary audience for captioning and can be defended only through the
use of a hypothetical example?
Should I also mention that few existing online media players let you
independently pause and run captions and video? In fact, I don't know
of any-- at all. Perhaps Andrew W.K. knows of one. The point is
nonetheless made: The function that this checkpoint demands does not
presently exist-- for the simple reason that it is not needed. It is
contrary to the way captions are actually used, as anyone who really
uses captioning will confirm.
>db last week i did some research about the cognitive load. i found
>some that spoke to the amount of time someone's eyes might spend
>reading the captions.
>
>db i think it pretty clearly showed that captions take the majority
>of the time.
Yes? And?
You put captions on a video piece and people spend time reading them.
Astounding, isn't it?
But to the WAI, that isn't an ineluctable fact of the captioning
medium. It's a problem that needs fixing.
Really, requiring video to be playable with paused captions or
vice-versa is a lot like some producer calling up a captioning house
and saying "I need to have this program captioned, but can you make
it so there aren't any words on the picture? Because I don't really
like that."
I thought I'd take a look at the actual WCAG 2.0 working draft
<http://www.w3.org/TR/WCAG20/>. It's rather appalling, frankly.
>Checkpoint 1.2 Provide synchronized media equivalents for time-dependent
>presentations.
>
>Success criteria
>
>You will have successfully met Checkpoint 1.2 at the Minimum Level if:
>
> 1. an audio description is provided of all visual information in
> scenes, actions and events (that can't be perceived from the sound
> track).
No. "That can't be *understood* from the soundtrack" (one word, not
two). An unexplained bump may be audible, but its meaning may not be
clear. Or a character may talk to person X rather than person Y,
again unnoticeable through audio alone.
> + The audio description should include all significant visual
> information in scenes, actions and events (that can't be
> perceived from the sound track) to the extent possible given
> the constraints posed by the existing audio track (and
> constraints on freezing the audio/visual program to insert
> additional auditory description).
I don't get the last part. If you refer to the clumsily-named
E-description concept <http://ncam.wgbh.org/edescription/>, where the
viewer can pause the video to hear a very lengthy ("extended," or
"E") description that provides much more information than an
"interlude" description added during a pause in dialogue or other
appropriate real-time moment, then the last bit doesn't make any
sense. What's the "constraint"? Pausing the video to listen to the
extended description *removes* constraints.
> 2. all significant dialogue and sounds are captioned
> exception: if the Web content is real-time audio-only,
Audio-only feeds are not a "time-dependent presentation" according to
the definition:
>> A time-dependent presentation is a presentation which
>> * is composed of synchronized audio and visual tracks (e.g., a movie)
Thank heavens for this.
WAI did not quite understand that it was an inch away from requiring
that every Web audio feed in the world be real-time-captioned. (No,
radio stations in meatspace aren't captioned. They don't need to be;
they don't have a visual form. Music on compact discs should not be
captioned for the same reason; music videos *should* be captioned
because they *do* have a visual component. Web-based audio feeds
shouldn't have to be captioned, either. Oh, and has anyone realized
yet that, just as Napster was an Internet application and not a Web
application, Web radio is usually not a Web application either? This
is the Web Accessibility Initiative; please limit your feature creep.)
> 4. if the Web content is real-time video with audio, real-time
> captions are provided unless the content:
> + is a music program that is primarily non-vocal
Again, the WAI essentially condemns any online real-time videocaster
to caption all its material. Is the WAI aware of just how difficult
that is when using present-day software? It is *not* as easy as
adding signals to Line 21. There isn't anything remotely resembling a
standardized and reliable infrastructure set up for this task yet,
all usages of Java applets, ccIRT, or other software notwithstanding.
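For a sense of what "present-day software" entails: captioning a
Windows Media stream, for instance, means hand-authoring a SAMI file--
a format essentially no other player understands. A minimal sketch
(the class name, timings, and caption text are of course invented):

```html
<SAMI>
<HEAD>
<TITLE>Invented example</TITLE>
<STYLE TYPE="text/css"><!--
  P { font-family: sans-serif; color: white; background-color: black; }
  /* One language class per caption track */
  .ENUSCC { Name: "English captions"; lang: en-US; SAMIType: CC; }
--></STYLE>
</HEAD>
<BODY>
<SYNC Start=1000>
  <P Class=ENUSCC>First caption, one second into the clip.</P>
</SYNC>
<SYNC Start=4000>
  <!-- A non-breaking space clears the previous caption -->
  <P Class=ENUSCC>&nbsp;</P>
</SYNC>
</BODY>
</SAMI>
```

RealPlayer wants RealText instead, and QuickTime wants QTtext; none of
the three is readable by the others' players, which is rather the point.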
If the video feed also appears on TV, how do you propose to reroute
the TV captions to online format? Or are you actually suggesting that
each minute of video be separately captioned twice-- once for TV,
once online?
My previous point remains in place: A standalone video player does
not necessarily have anything to do with the Web, really. It's an
Internet application, not a Web application; the WAI has no scope or
authority over it. Unless of course you'd like to retroactively
redefine the mandate.
> 5. if the Web content is real-time non-interactive video (e.g. a
> Webcam of ambient conditions), an accessible alternative is
> provided that achieves the purpose of the video.
Really?
How's that gonna work?
If my Webcam exists to show the weather outside, how am I going to
provide captions or descriptions that impart the same information?
Or what if my Webcam is pointed at my aquarium, or my pet canaries,
or the three adorable little kittens I got the other week? If the
purpose of the Webcam is to let people *see* what's going on with the
fish, birds, or cats, how do I automatically convert that to an
accessible form? (Especially if there's a live microphone and the
canaries sing or the kittens mewl?)
Real-world example: Webcams in day-care centres so snoopy moms (and
even dads) can watch what caregivers and children are doing. How does
one automatically convert those ever-changing images to an accessible
form?
<http://www.parentwatch.com/content/press/display.asp?p=p_0008>
What if the Webcam's purpose is to tell people if I'm in the office
or not? They look at the picture; if they see me, I'm in, and if they
don't, I'm not. Are you saying I have to manually set a flag in some
software that will send along some kind of text equivalent? I bought
a Webcam to avoid having to do that. Webcams provide real-time
information; interpretation is left to the viewer, not the author.
There *is* no author in this scenario; it's an automated feed.
It would be fair, I'd say, to exempt nearly any Webcam that
attempts to display ambient conditions. Perhaps that is a somewhat
inadequate definition, but it beats what we've got now ("real-time
non-interactive video"). An equivalent similar to alt="Webcam of my
office" is clearly necessary if something like an <img> element is
used, but beyond that, isn't the Web Accessibility Initiative merely
engaging in yet more gassy hypothesizing? People with next to no
lived experience of captioning or description, and not much more
experience with Webcams, are writing down guidelines that, in one
imaginable turn of events, tens of thousands of Web sites would have
to follow, perhaps under government sanction?
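In markup, the entire reasonable accommodation for an ambient Webcam
amounts to an ordinary text equivalent-- a sketch, with an invented
filename:

```html
<!-- The alt text identifies the feed's purpose; interpreting the
     ever-changing picture is left to the viewer, as it is for
     everyone else. No "caption track" exists or is needed. -->
<img src="/webcam/current.jpg" width="320" height="240"
     alt="Webcam of my office">
```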
I believe I've used the term "half-baked" already. Perhaps
"half-arsed" is more in order.
> 6. if a pure audio or pure video presentation requires a user to
> respond interactively at specific times in the presentation, then
> a time-synchronized equivalent (audio, visual or text)
> presentation is provided.
Such presentations are not covered under the definition, nor,
arguably, should they be.
Let's work through this scenario posited above, shall we?
An audio presentation, which deaf people can't hear in the first
place, tells us something like "Make your selection now." The
checkpoint seems to require a balloon to pop up saying "Make your
selection now." Selection about what? What are you talking about? I
haven't heard anything!
A video presentation, which blind people can't see in the first
place, tells us something like "Pick a number from 1 to 10." The
checkpoint seems to require a voice to somehow be made manifest
saying "Pick a number from 1 to 10." Why pick a number? What are you
talking about? I haven't seen anything!
How is an all-audio device supposed to display something visually?
How is an all-video device expected to display something auditorily?
Has anyone spent even half a second thinking through these things?
Can proponents of this requirement-- indeed, of every single
requirement everywhere in WCAG 2.0-- provide five real-world examples
available right now to which the requirements would apply? And also
explain how to make it work, using today's tools?
>You will have successfully met Checkpoint 1.2 at Level 2 if:
>
> 1. the site has a statement asserting that the audio description has
> been reviewed and it is believed to include all significant visual
> information in scenes, actions and events (that can't be perceived
> from the sound track) is provided to the extent possible given the
> constraints posed by the existing audio track (and constraints on
> freezing the audio/visual program to insert additional auditory
> description).
Check the grammar there: "Is provided."
Now, what does this mean? Can somebody name five examples of audio
description that are not reviewed in some way?
Does this checkpoint, and the others like it elsewhere in WCAG 2.0,
not in fact countenance some kind of accessibility classification
board that would pre-screen descriptions, text equivalents, etc.
before the site would be permitted to go online?
Does it not assume that every page on the Web is lovingly crafted and
overseen by a conscientious, caring, sharing human being? Has no one
ever heard of dynamically-generated pages, like the tens of thousands
of mailing-list-archive pages at W3.org? Who's gonna manage all those?
> 2. captions and Audio descriptions are provided for all live
> broadcasts that are professionally produced.
What does professional production have to do with anything? How are
you defining that?
Why is "Audio descriptions" capitalized, and why has this error gone
unfixed through three or four WCAG 2.0 drafts? (I was wondering if
anyone would ever bother to read their own guidelines closely enough
to notice. Apparently not. Bit of a recurring problem there.)
> 3. if Web content is an interactive audio-only presentation, the user
> is provided with the ability to view only the captions, the
> captions with the audio, or both together.
I believe we have covered this in detail. Audio-only presentations
are not included in the definition. Independent control of picture
and captions (wait! now we want independent control of *sound*!) is
presently impossible in practice and is unneeded.
> You will have successfully met Checkpoint 1.2 at Level 3 if:
>
> 1. a text document (a "script") that includes all audio and visual
> information is provided.
I believe the term is "transcript," and it does not "require"
"quotation marks."
Now, how does one include "all... visual information"? I thought this
was a *text* document.
You do realize that the only possible way to satisfy a surface
reading of this requirement is to open-caption the video and print
out every frame of it?
> 2. captions and Audio descriptions are provided for all live
> broadcasts which provide the same information.
What does the word "which" have scope over in that sentence? Like so
many other clauses in WCAG documents, it appears only to have been
skimmed by the WAI and not actually read, let alone understood or
reality-checked.
>The following are additional ideas for enhancing a site along this
>particular dimension:
Please don't.
>Definitions (informative)
>[...]
> * captions are text equivalents of auditory information from speech,
> sound effects, and ambient sounds that are synchronized with the
> multimedia presentation.
Text equivalents?
You mean like alt texts, where the text equivalent must equate with
the *function* of the graphic?
So the captions must have the same *function* as the audio?
Don't you mean captions are *written words* that *transcribe* speech
and notate and render significant non-speech information?
> * audio descriptions are equivalents of visual information from
> actions, body language, graphics, and scene changes that are
> voiced (either by a human or a speech synthesizer) and
> synchronized with the multimedia presentation.
What is an "equivalent of visual information"?
Isn't a description a *description*, not an equivalent?
Why is "body language" specifically enumerated? In practice, body
language is difficult to describe and circumstances rarely make it
possible to give a description of it *first*. "Stands tensely at
attention" is less important than "stands at attention."
>Benefits (informative)
>
> * People who are deaf or have a hearing loss can access the auditory
> information through the captions.
> * People who are blind or have low vision as well as those with
> cognitive disabilities who have difficulty interpreting visually
> what is happening benefit from the audio descriptions of the
> visual information.
More horrid, strangulated grammar. WAI really needs to get a handle
on that; I've sat in meetings where people debated the surface
meaning of WCAG 1.0 provisions because they were so badly written.
There is *extremely* limited evidence that people with learning
disabilities genuinely benefit from descriptions.
> Note: Time-dependent presentations that require dual, simultaneous
> attention with a single sense can present significant barriers to some
> users.
But they are *inevitable* in accommodating people who have only *one*
sense to use.
A nondisabled person can watch and listen simultaneously. A deaf
person can only watch; a blind person can only listen. Whoa, big
surprise-- to render speech in visible text adds something else to
look at, and to render action in audible speech adds something else
to listen to.
Yes? And?
I know that caption- and description-naive nondisabled people have a
hard time keeping up at first, but really, do we want to codify such
inadequacies in an official guideline?
>Depending on the nature of the of presentation, it may be
> possible to avoid scenarios where, for example, a deaf user would be
> required to watch an action on the screen and read the captions at the
> same time.
Why? That is the nature of captioning.
Why is the Web Accessibility Initiative trying to define twenty years
of captioning practice out of existence?
--
Joe Clark | joeclark@joeclark.org
Accessibility <http://joeclark.org/access/>
Weblogs and articles <http://joeclark.org/weblogs/>
<http://joeclark.org/writing/> | <http://fawny.org/>
Received on Monday, 25 November 2002 17:44:27 UTC