
RE: video and long text descriptions / transcripts

From: John Foliot <john@foliot.ca>
Date: Tue, 10 Apr 2012 09:37:10 -0700
To: "'Silvia Pfeiffer'" <silviapfeiffer1@gmail.com>
Cc: "'David Singer'" <singer@apple.com>, "'HTML Accessibility Task Force'" <public-html-a11y@w3.org>
Message-ID: <002f01cd1738$2f0e0db0$8d2a2910$@ca>

Silvia Pfeiffer wrote:
> Interesting. I would have thought that the idea of the long
> description link is indeed to provide a link to such an "alt format".
> How else is a blind person to understand what's in an image? Or a
> deaf-blind user what's in a video? Is this really not what we want
> from the long description???

The goal of informing the user is correct. The mechanism is incorrect. A
Long Description was scoped and defined long ago:

	"An object's AccessibleDescription property provides a textual
description about an object's visual appearance. The description is
primarily used to provide greater context for low-vision or blind users, but
can also be used for context searching or other applications."

The transcript does more than provide a textual description of the visual
appearance: it is a textual alternative to the actual video itself, in its
entirety. It is not "contextual" information; it is a verbatim
transcription, including dialog and significant audio cues (e.g.
[clapping]). While not a perfect analogy, the video Long Description should
be more of a synopsis or summary. This confusion (BTW) is also at the root
of the @poster issue (where the non-sighted user wants a contextual
description of the *image*, not the video).

> I would have thought we would want to integrate such "alt formats" in
> the Web and that this is the whole idea of aria-describedat. What use
> is it to a Web user to not have the "alt format" at their fingertip
> when browsing the Web?

Not the "whole idea", but a significant part of it, yes.  At issue (and a
point that David and I seem to be in agreement on) is that there is a
difference between a contextual description of a video and a verbatim
transcription of said video, and the end user wants both. Whether we will
always get both from authors is not even at issue (we likely won't), but we
must ensure that a mechanism exists if and when we do. We currently do not
have such a mechanism.

> OK. This is shattering everything I've come to expect from
> accessibility of Web applications: why would we not want to offer an
> accessibility user an "alt format" directly on the Web? Why do we have
> to force them to find functional replacements outside the Web?

We - *I* - don't want that, but I also don't want to pretend that potatoes
are meat because protein is better than starch. I want a linking mechanism
for both the contextual description *and* the verbatim transcript, and have
been fairly consistent in that request. In fact I have been consistent about
all of the following:

  * Short name (AccName) for the video - perhaps @alt, or aria-labelledby
    or aria-label.
  * Long Description (a.k.a. contextual description) for the VIDEO - perhaps
    @longdesc, or aria-describedby or aria-describedat.
  * Long Description (a.k.a. contextual description) for the IMAGE (when
    specified by @poster) - I have proposed <firstframe> a.k.a. <poster> which
    was rejected; currently in discussion with the ARIA WG on an aria-poster or
    aria-background attribute.
  * A linking mechanism for the verbatim transcript - I have proposed
    @transcript and/or <track kind="transcript">.

Yes, it's a long menu, but that's the nature of the MULTI in multi-media,
and it accurately reflects what I understand the needs of disabled users
are, backed up in discussions with those end users. (Also please note that
in every instance I have proposed more than 1 possible solution, so I am not
entrenched on the method, only the outcome).
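
To make that menu concrete, here is a rough sketch of how three of the four
items might hang off a single <video>. It is illustrative only: the
transcript attribute is the proposal mentioned above and is not part of any
specification, and the file names, id and text are made up. The description
of the @poster image is left out on purpose, since no agreed hook for it
exists yet.

```html
<!-- Sketch only. "transcript" is a PROPOSED attribute, not specified HTML.
     File names, the id and the summary text are hypothetical. -->
<video src="keynote.webm"
       poster="keynote-title-card.png"
       aria-label="Conference keynote"
       aria-describedby="keynote-summary"
       transcript="keynote-transcript.html"
       controls>
</video>

<!-- Contextual description (synopsis) of the VIDEO, not a verbatim transcript -->
<p id="keynote-summary">
  The speaker stands at a podium and walks through a series of slides.
</p>
```

The short name and the contextual description use attributes that exist
today (aria-label, aria-describedby); only the verbatim transcript link
relies on something new.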

> The <track> element is an exception in this respect because we're
> allowing internationalization to kick in as a selector for a set
> number of time-aligned document types. This is a very different use
> case.

So I get that this is a different use-case (but is it?). We "accommodate"
based on language (i18n), so why shouldn't/can't we also accommodate based
on physical need (a11y)? Since <track> is already an exceptional element in
this regard, why not extend on that exceptionality? [sic] 

If we extend the selectors (@kind) to extend the functionality, it seems to
me trivial for the implementers to embrace this: it is one more choice in
the contextual menu they are already providing to activate closed-captions,
or described video (audio or textual), or sub-titles (in any of the numerous
languages that *may* be provided)... the point is, the list (menu) is
already potentially quite long, and is already mixing apples with oranges
(described text versus Spanish sub-titles).
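
As a sketch of what that menu would be built from, assuming the proposed
(and currently unspecified) kind="transcript" value were accepted alongside
the existing ones; the file names and labels are hypothetical:

```html
<video src="keynote.webm" controls>
  <!-- Existing @kind values the browser menu already has to offer -->
  <track kind="captions"     srclang="en" label="English captions"  src="captions-en.vtt">
  <track kind="descriptions" srclang="en" label="Text descriptions" src="descriptions-en.vtt">
  <track kind="subtitles"    srclang="es" label="Español"           src="subtitles-es.vtt">

  <!-- PROPOSED value, not in the specification: one more entry in the same menu -->
  <track kind="transcript"   srclang="en" label="Transcript"        src="transcript-en.html">
</video>
```

From the user's side this would be one more line in a list that already
mixes accessibility tracks with language choices.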

> My question was focused on where we are asking a blind user to select
> between different alternative text representations that would equally
> serve as a long description. Can you give me an example that is not
> the (very special-cased) <track> element?

No, but again I can't give you an example where we provide a contextual menu
for alternate language renderings (a.k.a. sub-titles) either. The closest
we've ever had to that was/is the @lang attribute, which is significantly

> For time-aligned documents you can use <track> with @kind=metadata.
> For non-time-aligned documents <track> makes no sense - just like
> publishing a table in a <canvas>,

Hmmm.... table data (as an alt format) can certainly be used to draw
<canvas> graphs - see:
(what's nice here is that screen readers can still access the table data)

> or an image in a <video>, 

Like @poster?...

> or a list of items in a <textbox> 

...because we have different, specialized <input> type(s) with <select> or
<checkbox> to meet that requirement. I would be OK with a specialized
element for the transcript too: I've already proposed @transcript, but could
live with a child element: <video><transcript src="path to

> An unlimited list of different @kind values is not reasonable, in
> particular if every different @kind requires the browsers to implement
> special rendering techniques.

OK, so a limited taxonomy would be fine - heck the addition of
@kind="transcript" would solve the 80% use-case today. (Although you *do*
know that I am not the first person to propose extensibility of the @kind

> "User choice" would be satisfied by providing a list of links under
> the video.

Herein lies the crux of the problem: programmatic association and visual
encumbrance, two known issues we have today with @longdesc. 

How do we allow users to access a transcript or longer textual description
when the video is a near-full screen or even a full-screen rendering? We
cannot force authors to provide these types of on-screen links any more than
we can insist that they add a "D" link or an on-screen link to the longer
textual description of an image - this is all ground beaten to death in the
@longdesc wars. 

We require an elegant, consistent, programmatic means of linking all of the
associated bits of stuff that is our video into one neatly authored package,
and suggesting that the author simply provide an on-screen link will miss
many instances, and is too simplistic a suggestion I'm afraid. The upshot is
that we require that the user-agent, the browser, provide a means for the
end user to access supplemental information about any given element when
provided, whether it is an @longdesc/aria-describedat textual description or
a video transcript.

Received on Tuesday, 10 April 2012 16:37:48 GMT
