Re: video and long text descriptions / transcripts from Silvia Pfeiffer on 2012-04-05 (public-html-a11y@w3.org from April 2012)

From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Date: Thu, 5 Apr 2012 12:23:54 +1000
To: David Singer <singer@apple.com>
Cc: HTML Accessibility Task Force <public-html-a11y@w3.org>
Message-ID: <CAHp8n2m81U-ycjk_ba4AD+yyud5O60HPekHcuGTZ5PbYCZHL0A@mail.gmail.com>
On Thu, Apr 5, 2012 at 5:16 AM, David Singer <singer@apple.com> wrote:
>
> On Mar 30, 2012, at 14:52 , Silvia Pfeiffer wrote:
>
>> Hi all,
>>
>> I would like to see a (civilized) discussion of a core question that
>> relates to issues 194 [1] and 203 [2] for video.
>>
>> [1] http://www.w3.org/html/wg/tracker/issues/194
>> [2] http://www.w3.org/html/wg/tracker/issues/203
>>
>> We keep talking about "long text descriptions for videos" and
>> "transcripts" as separate things. There is an implied assumption that
>> we need two different solutions for these, which I would like to
>> challenge.
>
> Ah.  I think that
> a) they are semantically different;  if I am looking for the transcript specifically, a long textual description may not meet the need, and so they should be semantically distinct


They are semantically different at a detail level.

If you assume a sighted user and a list of text files that provide
alternative descriptions to the video, then you can certainly list a
large number of different "long text descriptions" that you can
provide:
* a full transcription of everything happening in the video, including
a transcript of all dialogs and the important visual bits
* the book/script that underlies the production
* a pure transcript of what is being said
* a list of major plot points in the video
* a description of each one of the scenes of the video
* a description of each one of the shots of the video
* a summary of what is happening with enough meat to be different from
a short description
* etc etc etc

But really: who is going to produce all these things? And which one is
the best for a deaf-blind user to have? Certainly the answer is that a
full transcription of everything being said and all the scene
descriptions is the best that a deaf-blind user can have and also the
most complete text representation of the video. I therefore call this
"the optimal long description document". This is the best possible
text representation that I would like to give to AT. Lacking that, I
would accept any of the other listed text documents also as a
"replacement long description document" for AT.
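
The selection rule above can be sketched in a few lines of Python. This
is only an illustration under stated assumptions: the kind labels, the
function name and the ordering of the fallback documents are all
hypothetical. The only ordering the argument itself relies on is that
the full transcription comes first.

```python
# Hypothetical labels for the kinds of text alternatives listed above.
# Only the first entry's position follows from the argument; the order
# of the remaining fallbacks is an assumption made for illustration.
PREFERENCE = [
    "full_transcription",   # dialogue plus the important visual information
    "script",
    "dialogue_transcript",
    "scene_descriptions",
    "shot_descriptions",
    "plot_points",
    "summary",
]


def pick_long_description(available):
    """Return (kind, url) of the single document to expose to AT, or None.

    `available` maps a kind label to the URL of that document.
    """
    for kind in PREFERENCE:
        if kind in available:
            return kind, available[kind]
    return None
```

Whatever the author has produced, AT is handed exactly one document:
the full transcription if it exists, otherwise one of the replacements.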

The point I am trying to make is not at the detail level. It is at the
macro level. How useful is it for a deaf-blind user to be presented
with a number of documents that they could read that provide some form
of "long description" for them? How useful is it to have several docs
rather than a single one? Preferably it should be just one document,
and the one chosen should be the most inclusive one, the one that has
the best description of them all, and that one is what we have thus
far discussed as "the transcript".

Do you disagree?


> b) authors are unlikely to provide both, however

Yes, that is one of the things on my mind, too. This is why I don't
think it makes much sense to have both a @transcript and a @longdesc
attribute on the video: if we have an actual transcript, it would be
the same document behind both attributes, and if we don't have one, we'd
have a URL behind the longdesc and none behind the transcript. In both
these situations, the @transcript attribute is not useful.
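
To make the two cases concrete, here is a minimal sketch in Python. It
models only the reasoning above, not any browser behaviour, and the
function and parameter names are hypothetical.

```python
def attributes_for(full_transcript_url=None, other_description_url=None):
    """Model which URL would sit behind each of the two attributes.

    If a full transcript exists it is also the best long description,
    so it ends up behind both attributes. Otherwise only @longdesc
    gets a URL (if any other description exists at all).
    """
    longdesc = full_transcript_url or other_description_url
    transcript = full_transcript_url
    return {"longdesc": longdesc, "transcript": transcript}


def transcript_adds_information(attrs):
    """True only if @transcript points somewhere @longdesc does not."""
    return (attrs["transcript"] is not None
            and attrs["transcript"] != attrs["longdesc"])


# Case 1: a full transcript exists, so both attributes carry the same URL.
with_transcript = attributes_for(full_transcript_url="transcript.html")
# Case 2: no transcript, only some other long description.
without_transcript = attributes_for(other_description_url="summary.html")
```

Under these assumptions `transcript_adds_information` is false in both
cases, which is the sense in which the second attribute is redundant.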


> c) the transcript/description should be part of the 'normal DOM'

The text itself? That's how it could be provided, but why is that a
requirement? Why is it not acceptable that the text is in another Web
resource?


> d) the relationship should be discoverable by anyone, not just accessibility tools

I agree. It definitely has to be exposed by the video element to AT.
In addition, there needs to be a visual presentation. There are
several ways to get this: one is a visual indication on the video
element which is provided through the shadow DOM, another is through a
visual indication somewhere else in the browser (e.g. the URL is
exposed on mouseover and a CTRL+ENTER click can activate it), and the
last is that it's a separate DOM element on the screen that is
programmatically linked.

> e) they should use a common mechanism to link the media to its transcript/description etc.

I disagree with this requirement. A long description for the purposes
of deaf-blind users has to be discoverable when focused upon the video
element. Other related content such as interactive transcripts,
scripts, and other video metadata only has to live near the video
and be discoverable when moving around the page. I don't see a need
for a programmatic association of those with the video other than what
@describedBy already offers.


Cheers,
Silvia.
Received on Thursday, 5 April 2012 02:24:43 UTC