
Re: video and long text descriptions / transcripts

From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Date: Wed, 11 Apr 2012 22:11:29 +1000
Message-ID: <CAHp8n2mpAqr=tZ7nKcc47WFCkPafuBm-+yY9Bs4nYKEJ4i5SnQ@mail.gmail.com>
To: John Foliot <john@foliot.ca>
Cc: David Singer <singer@apple.com>, HTML Accessibility Task Force <public-html-a11y@w3.org>
On Wed, Apr 11, 2012 at 2:37 AM, John Foliot <john@foliot.ca> wrote:
> Silvia Pfeiffer wrote:
>> Interesting. I would have thought that the idea of the long
>> description link is indeed to provide a link to such an "alt format".
>> How else is a blind person to understand what's in an image? Or a
>> deaf-blind user what's in a video? Is this really not what we want
>> from the long description???
> The goal of informing the user is correct. The mechanism is incorrect. A
> Long Description has been long ago scoped and defined:
>        "An object's AccessibleDescription property provides a textual
> description about an object's visual appearance. The description is
> primarily used to provide greater context for low-vision or blind users, but
> can also be used for context searching or other applications."
> [source:
> http://msdn.microsoft.com/en-us/library/system.windows.forms.control.accessibledescription.aspx]
> The transcript does more than provide a textual description about the visual
> appearance, it is a textual alternative to the actual video itself, in its
> entirety. It is not "contextual" information, it is a verbatim
> transcription, including dialog and significant audio clues (i.e.
> [clapping]). While not a perfect analogy, the video Long Description should
> be more of a synopsis or summary. This confusion (BTW) is also at the root
> of the @poster issue (where the non-sighted user wants a contextual
> description of the *image*, not the video).

I do not see a need for a long description of this type on video.
Every video that I have ever seen published online has had some text
associated with it: a summary of the video meant to tell the user what
to expect. This can be sufficiently provided through
@aria-describedby.
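
To illustrate (the file name and id are invented):

```html
<!-- The page already carries a summary; aria-describedby points the video at it -->
<p id="video-summary">A ten-minute interview with the lead engineer
about the new rendering pipeline.</p>
<video src="interview.webm" controls aria-describedby="video-summary"></video>
```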

In contrast, what we do *not* have is a textual alternative of the
video for deaf-blind users (and anyone else who prefers to read the
text over viewing the video).

> In fact I have been consistent about
> all of the following:
>  * Short name (AccName) for the video  - perhaps @alt, or aria-labelledby
> or aria-label.

Agreed: we can get that from @aria-label.

>  * Long Description (a.k.a. contextual description) for the VIDEO - perhaps
> @longdesc, or aria-describedby or aria-describedat.

We can get that from @aria-describedby.

>  * Long Description (a.k.a. contextual description) for the IMAGE (when
> specified by @poster) - I have proposed <firstframe> a.k.a. <poster> which
> was rejected; currently in discussion with the ARIA WG on an aria-poster or
> aria-background attribute.

We don't need anything new for this: it is part of @aria-label (for a
short description) and @aria-describedby (for a longer one).
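
A sketch of how the two would combine (names and files invented):

```html
<!-- aria-label gives the short name; aria-describedby the longer,
     contextual description, which can also cover the poster frame -->
<p id="video-desc">The poster frame shows the speaker at the podium;
the talk covers the history of web video.</p>
<video src="talk.webm" poster="podium.jpg" controls
       aria-label="Keynote: a history of web video"
       aria-describedby="video-desc"></video>
```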

>  * A linking mechanism for the verbatim transcript - I have proposed
> @transcript and/or <track kind="transcript">.

OK, I am proposing that this should be @aria-describedAt (or whatever
else we come up with for a replacement for the long description).
Further, I would suggest that we focus on providing such "alt formats"
through @aria-describedAt, since summaries can readily be provided
through @aria-describedBy.
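
So, roughly (note that aria-describedat is only a proposal at this
stage, not a specified attribute, and the file names are invented):

```html
<!-- aria-describedby for the summary on the page;
     aria-describedat (proposed) linking out to the full transcript -->
<p id="video-summary">A 40-minute keynote on web accessibility.</p>
<video src="keynote.webm" controls
       aria-describedby="video-summary"
       aria-describedat="keynote-transcript.html"></video>
```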

> Yes, it's a long menu, but that's the nature of the MULTI in multi-media,

Just to be accurate: video by itself is not multimedia. Audio by
itself is not multimedia. The Web page as a whole, with text, images,
video and audio, *is* multimedia.

> and it accurately reflects what I understand the needs of disabled users
> are, backed up in discussions with those end users. (Also please note that
> in every instance I have proposed more than 1 possible solution, so I am not
> entrenched on the method, only the outcome).

Good. It seems we already have most of what we need.

>> The <track> element is an exception in this respect because we're
>> allowing internationalization to kick in as a selector for a set
>> number of time-aligned document types. This is a very different use
>> case.
> So I get that this is a different use-case (but is it?)  We "accommodate"
> based on language (i18n), so why shouldn't/can't we also accommodate based
> on physical need (a11y)? Since <track> is already an exceptional element in
> this regard, why not extend on that exceptionality? [sic]

We are already accommodating physical need: captions, descriptions
and chapters support a11y needs. However, when somebody doesn't
want their access tied to the timeline of the video, because that
timeline means nothing to them, we need a different mechanism to
accommodate this.

> If we extend the selectors (@kind) to extend the functionality, it seems to
> me trivial for the implementers to embrace this: it is one more choice in
> the contextual menu they are already providing to activate closed-captions,
> or described video (audio or textual), or sub-titles (in any of the numerous
> languages that *may* be provided)... the point is, the list (menu) is
> already potentially quite long, and is already mixing apples with oranges
> (described text versus Spanish sub-titles).

They are not apples and oranges. They share the timeline. The
transcript shares nothing of that sort.
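
To make that concrete: every @kind that <track> supports is a set of
cues against the video's timeline (file names invented):

```html
<video src="talk.webm" controls>
  <!-- all of these are time-aligned cue files (WebVTT) -->
  <track kind="captions"     src="talk.en.vtt"   srclang="en" label="English captions">
  <track kind="subtitles"    src="talk.es.vtt"   srclang="es" label="Español">
  <track kind="descriptions" src="talk.desc.vtt" srclang="en" label="Text descriptions">
  <track kind="chapters"     src="talk.chap.vtt" srclang="en" label="Chapters">
</video>
```

A transcript is a single, non-time-aligned document and fits none of
these slots.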

>> For time-aligned documents you can use <track> with @kind=metadata.
>> For non-time-aligned documents <track> makes no sense - just like
>> publishing a table in a <canvas>,
> Hmmm.... table data (as an alt format) can certainly be used to draw
> <canvas> graphs - see:
> http://www.filamentgroup.com/lab/update_to_jquery_visualize_accessible_charts_with_html5_from_designing_with
> (what's nice here is that screen readers can still access the table data)

My point was the opposite: rendering the table as canvas pixel data.
It is a possible solution, but why would you do that? It doesn't make
use of the strengths of <canvas> and has negative implications for
accessibility.

>> or an image in a <video>,
> Like @poster?...

No, that is not an image - it has a play button and is therefore a
video. It has none of the features of an image and is just being
hidden away. :-)
But I know we can't agree on this (sigh).

>> or a list of items in a <textbox>
> ...because we have different, specialized <input> type(s) with <select> or
> <checkbox> to meet that requirement. I would be OK with a specialized
> element for the transcript too: I've already proposed @transcript, but could
> live with a child element: <video><transcript src="path to
> transcript"></video>

That's exactly the point I am trying to make in avoiding the use of
<track> for transcripts: transcripts have specialised needs that are
not met by <track>, and the features that <track> provides are not
useful to transcripts. Therefore, <track> is a poor solution for
transcripts.

Both of your suggestions, the attribute and the separate element, are
better approaches, though not the only ones. Edward (with some input
from me, if I may say so) has made a nice analysis of the different
possible approaches for associating transcripts with a video element,
see http://www.w3.org/html/wg/wiki/ISSUE-194/Research. It lists many
of the advantages and disadvantages of the different approaches.
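
For concreteness, the two proposals quoted above might look like this
(neither is part of HTML today; file names invented):

```html
<!-- Proposal 1: an attribute on the video element -->
<video src="talk.webm" transcript="talk-transcript.html" controls></video>

<!-- Proposal 2: a child element -->
<video src="talk.webm" controls>
  <transcript src="talk-transcript.html"></transcript>
</video>
```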

Received on Wednesday, 11 April 2012 12:12:24 UTC
