Re: video and long text descriptions / transcripts

From: David Singer <singer@apple.com>
Date: Thu, 05 Apr 2012 09:15:51 -0700
Cc: HTML Accessibility Task Force <public-html-a11y@w3.org>
Message-id: <938B1383-0A33-4745-9EAE-C45A193C208D@apple.com>
To: Silvia Pfeiffer <silviapfeiffer1@gmail.com>

On Apr 4, 2012, at 19:23 , Silvia Pfeiffer wrote:

> On Thu, Apr 5, 2012 at 5:16 AM, David Singer <singer@apple.com> wrote:
>> On Mar 30, 2012, at 14:52 , Silvia Pfeiffer wrote:
>>> Hi all,
>>> I would like to see a (civilized) discussion of a core question that
>>> relates to issues 194 [1] and 203 [2] for video.
>>> [1] http://www.w3.org/html/wg/tracker/issues/194
>>> [2] http://www.w3.org/html/wg/tracker/issues/203
>>> We keep talking about "long text descriptions for videos" and
>>> "transcripts" as separate things. There is an implied assumption that
>>> we need two different solutions for these, which I would like to
>>> challenge.
>> Ah.  I think that
>> a) they are semantically different: if I am looking specifically for the transcript, a long textual description may not meet the need, so they should be semantically distinct
> They are semantically different at a detail level.
> If you assume a sighted user and a list of text files that provide
> alternative descriptions to the video, then you can certainly list a
> large number of different "long text descriptions" that you can
> provide:
> * a full transcription of everything happening in the video, including
> a transcript of all dialogs and the important visual bits
> * the book/script that underlies the production
> * a pure transcript of what is being said
> * a list of major plot points in the video
> * a description of each one of the scenes of the video
> * a description of each one of the shots of the video
> * a summary of what is happening with enough meat to be different from
> a short description
> * etc etc etc
> But really: who is going to produce all these things?

I don't know; I don't think anyone will produce them all at once.  I do think we can differentiate "some sort of transcript" (in which time advances with the program material) and a description, which, while narrative, doesn't claim to match the sequence of the video.

> The point I am trying to make is not at the detail level. It is at the
> macro level. How useful is it for a deaf-blind user to be presented
> with a number of documents that they could read that provide some form
> of "long description" for them? How useful is it to be several docs
> rather than a single one? Preferably it should just be one document
> and the one chosen should be the most inclusive one, the one that has
> the best description of them all, and that one is what we have thus
> far discussed as "the transcript".

It's probably not.  But I think we can say "this is a transcript" and "that is a long description" to all users, and if (in the rare case) both are offered, they can choose, and if only one, they can decide if it meets their needs.

>> b) authors are unlikely to provide both, however
> Yes, that is one of the things on my mind, too. This is why I don't
> think it makes much sense to have both a @transcript and a @longdesc
> attribute on the video: if we have an actual transcript, it would be
> the same document behind both attributes, and if we don't have one, we'd
> have a URL behind the longdesc and none behind the transcript. In both
> these situations, the @transcript attribute is not useful.

I think it may well be worth differentiating these three cases:
a) the video has some sort of transcript (e.g. the script) but no description;
b) the video has some sort of description (e.g. a plot summary) but no transcript;
c) the video has both.

>> c) the transcript/description should be part of the 'normal DOM'
> The text itself? That's how it could be provided, but why is that a
> requirement? Why is it not acceptable that the text is in another Web
> resource?

I just mean it's something we mark up for everyone. These are not squirreled away for a small class of users (e.g. those needing accessibility).

>> d) the relationship should be discoverable by anyone, not just accessibility tools
> I agree. It definitely has to be exposed by the video element to AT.
> In addition, there needs to be a visual presentation. There are
> several ways to get this: one is a visual indication on the video
> element which is provided through the shadow DOM, another is through a
> visual indication somewhere else in the browser (e.g. the URL is
> exposed on mouseover and a CTRL+ENTER click can activate it), and the
> last is that it's a separate DOM element on the screen that is
> programmatically linked.

I like that last one, myself. We can encourage 'standard controllers' to expose the linkage (e.g. a popup in the controller that offers "Transcript | Long Description").

>> e) they should use a common mechanism to link the media to its transcript/description etc.
> I disagree with this requirement. A long description for the purposes
> of deaf-blind users has to be discoverable when focused upon the video
> element. Other related content such as interactive transcripts,
> scripts, and other video metadata only has to live nearby the video
> and be discoverable when moving around the page. I don't see a need
> for a programmatic association of those with the video other than what
> @describedBy already offers.

If describedBy does it, then it's discoverable.
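To make "discoverable" concrete: the @describedBy mentioned above corresponds to the ARIA attribute aria-describedby, whose value is a space-separated list of element ids. The sketch below, assuming a page where a video points at an on-page transcript and a summary that way, shows how any tool (not only AT) could follow that linkage using Python's standard html.parser. The names DescribedByResolver and described_by, and the sample markup, are hypothetical illustrations, not how any browser or screen reader actually implements it.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they must not be pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}

class DescribedByResolver(HTMLParser):
    """Hypothetical helper: collects a <video>'s aria-describedby ids and the
    text content of every element that carries an id."""

    def __init__(self):
        super().__init__()
        self.refs = []        # ids listed in a <video>'s aria-describedby
        self.text_by_id = {}  # id -> list of text chunks found inside it
        self._stack = []      # (tag, id-or-None) for each open non-void element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "video" and attrs.get("aria-describedby"):
            # The attribute value is a space-separated list of element ids.
            self.refs.extend(attrs["aria-describedby"].split())
        if tag in VOID:
            return
        el_id = attrs.get("id")
        if el_id is not None:
            self.text_by_id.setdefault(el_id, [])
        self._stack.append((tag, el_id))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i][0] == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        # Text belongs to every open ancestor that carries an id.
        for _, el_id in self._stack:
            if el_id is not None:
                self.text_by_id[el_id].append(data)

def described_by(html):
    """Return (id, normalized text) for each id a <video> is described by."""
    parser = DescribedByResolver()
    parser.feed(html)
    parser.close()
    result = []
    for ref in parser.refs:
        # Chunks are joined with spaces, so a word split across inline tags
        # would gain a space; acceptable for a sketch.
        text = " ".join(" ".join(parser.text_by_id.get(ref, [])).split())
        result.append((ref, text))
    return result

page = """
<video src="talk.mp4" controls aria-describedby="transcript summary"></video>
<section id="transcript"><h2>Transcript</h2><p>Speaker: Hello and welcome.</p></section>
<p id="summary">A short talk introducing the project.</p>
"""

for ref, text in described_by(page):
    print(ref, "->", text)
```

Because the transcript and summary are ordinary elements in the page, sighted users see them directly, while the id references give AT (or a 'standard controller') a programmatic way to find them from the video.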

David Singer
Multimedia and Software Standards, Apple Inc.
