[media] progress on video transcript discussion

Hi all,

I wanted to share what I took away from today's phone conference on
transcripts. This includes both the conclusions that we reached in the
course of the discussion and a proposal for how to proceed. Some of
this is my personal interpretation, so chip in if you disagree
anywhere or have any other suggestions.

Points that we discussed:


1. Interactive Transcript

This was the simplest part of our discussion, so I want to get this
out of the way up front.

We discussed the need for interactive transcripts in HTML5. Everyone
agreed that it is a very important feature and there are many use
cases. Examples of current solutions include:
http://wistia.com/product/transcripts
http://www.3playmedia.com/interactive/
http://www.protranscript.com/
http://speakertext.com/captionbox
http://clickablecaptions.com/captioning-with-instant-search/
http://dotsub.com/labs/interactiveTranscript
http://www.subply.com/en/Products/InterActiveTranscript.htm

The main problem, however, is the timeline that we are given to
finalize HTML5. We do not want to start on a new feature that
requires many cycles to get right (e.g. the video element has taken
5 years to be sorted out with all its attributes and elements). So,
we want to move this feature to HTML.next and encourage Web developers
to build more interactive transcript examples in JavaScript, so we can
learn more about which features and attributes are needed. In
particular, we'd like to encourage Web developers to use TextTrack and
TextTrackCue objects as the source for their interactive transcripts,
since these are likely a good way to provide the information required
to render an interactive transcript in the browser. We'd also like to
start the design of a solution in HTML.next and continue the already
fruitful discussions.
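
To make the TextTrack idea a bit more concrete, here is a minimal
sketch in TypeScript of the data side of a script-based interactive
transcript. It is only an illustration, not a proposal: the CueLike
shape and the function names are my own assumptions and simply mirror
the start time, end time and text that a cue carries. No DOM access is
needed for this part.

```typescript
// Minimal cue shape for this sketch (assumption): it mirrors the start
// time, end time and text that a text track cue carries.
interface CueLike {
  startTime: number; // seconds
  endTime: number;   // seconds
  text: string;
}

// One line of the rendered transcript (hypothetical helper type).
interface TranscriptEntry extends CueLike {
  label: string; // human-readable start time, e.g. "1:05"
}

// Format a time in seconds as m:ss.
function formatTimestamp(seconds: number): string {
  const total = Math.floor(seconds);
  const minutes = Math.floor(total / 60);
  const secs = total % 60;
  return minutes + ":" + (secs < 10 ? "0" : "") + secs;
}

// Turn cues into transcript entries ordered by start time.
// The input array is copied, not modified.
function buildTranscript(cues: CueLike[]): TranscriptEntry[] {
  return cues
    .slice()
    .sort((a, b) => a.startTime - b.startTime)
    .map((cue) => ({
      startTime: cue.startTime,
      endTime: cue.endTime,
      text: cue.text,
      label: formatTimestamp(cue.startTime),
    }));
}

// Index of the first entry that covers currentTime (start inclusive,
// end exclusive), or -1 if no entry is active at that time.
function activeEntryIndex(
  entries: TranscriptEntry[],
  currentTime: number
): number {
  for (let i = 0; i < entries.length; i++) {
    const entry = entries[i];
    if (currentTime >= entry.startTime && currentTime < entry.endTime) {
      return i;
    }
  }
  return -1;
}
```

In a page, the cues would come from the cues list of one of the
video's textTracks, and activeEntryIndex would be called with the
video's currentTime on each timeupdate event to highlight the matching
line. Clicking a line would set currentTime to that entry's startTime.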

For HTML5, however, we're shelving this issue.


2. Programmatic linkage of video and transcript

The bigger issue, which we haven't resolved yet but have made good
progress on, is whether we need a programmatic linkage between the
video and its transcript.

Our discussion started around options of having the transcript tightly
coupled with the video element, i.e. how to make it part of the video
element.

We achieved general agreement that an actual rendering of the
transcript inside the boundaries of the video element (e.g. as an
overlay or an alternative to the video) doesn't make much sense. The
transcript itself needs to be rendered outside the video element, e.g.
in a different region on the page or on a different page.

The second use case for tightly coupling the transcript to the video
element was about the possibility of "activating" the transcript from
the video element's controls. The idea here is that we would have a
button on the video controls that when clicked might take us to
another Web page with the transcript on it, or when clicked might
unveil a section on the Web page near the video that shows the
transcript, or when clicked might scroll us down to a page offset
where the transcript is displayed. This is a nice theory, but is this
actually being used this way?

I personally have been unable to see transcripts used in this way in
any kind of video player on the Web. Where transcripts are displayed,
I have only seen links or text below the video. The only player that
was different is the video player of the Rachel Maddow show, see
http://www.msnbc.msn.com/id/26315908/vp/47654476#47654476 . This
player uses the transcript button to reveal/hide an interactive
transcript. Interactive transcripts are already tightly linked with
the video through their synchronized timeline, so having a button to
turn an interactive transcript on/off makes some sense. However, we
have already said that we are moving interactive transcripts to
HTML.next, so this is not something we need to solve now.

So, I actually believe that we need neither a solution that renders
the transcript inside the video element nor a solution that creates a
transcript button of some sort on the video element.

So, what do we actually need a programmatic linkage between the video
and the transcript for?


In my mind, we have been mixing up two actual use cases: the first
one, where the transcript or a transcript activation button is visible
further down on the page, and the second one, where no transcript is
visible on the page with the video and the transcript is actually on a
different page (possibly because the video was embedded).

The situation where the transcript is on the page but completely
invisible (with not even a button to unhide it) is not one that I want
to entertain here, since it can be satisfied by putting the transcript
on a separate Web page and thus falls into the second use case.
(Also, I think it is pretty bad practice and should be discouraged.)
Note that having a button on the page that can hide/unhide the
transcript text makes this fall into the first use case; it is not a
different use case.


2.1. First use case: Linkage of video and transcript where the
transcript is visible on the page

Sighted users need no programmatic linkage in this case, since they
can clearly see the transcript (or transcript link) further down on
the page.

However, blind users may find it difficult to discover that a
transcript is available when they directly go to the video element.
So, we need to make sure the availability of the transcript is
announced when reaching the video element, and possibly provide a
means to directly jump to the transcript (to avoid dealing with other
DOM clutter).

(Note: at one stage I thought @aria-describedby would solve this
problem, but when it refers to an <a> element, it just renders
flattened text, and when it reads a full text transcript, the text
can't be navigated since it's all just a single AccDescription of the
element. So, I'm dropping that suggestion; here are some new
thoughts.)

The announcement can likely be made using @aria-label.

The linkage could be made by introducing a new @role attribute value
of "transcript" so that the transcript can be reached as a navigation
landmark. This role would then go onto the element that contains the
transcript, which could be an <a>, <div>, <p>, <iframe> or anything
else really (including <button> or <textarea>).

Many people on this list are knowledgeable about ARIA, so maybe you
can chip in here.


2.2. Second use case: Linkage of video and transcript where the
transcript is not visible on the page

At first sight this is a non-usecase: if we can't see the transcript,
why would we need to link to it from the video, in particular since we
don't want to put buttons into the video?

It took me a while to understand this myself: when you publish a video
on your page, you have control and you can put the transcript (or
transcript link) below the video and announce it in the screen reader
etc. But once this video is embedded somewhere else, you need a link
to discover the transcript's availability and jump back to the
original page. (This is actually the same for full-screen video, but
it's easy to exit fullscreen to get back to the transcript.)

One solution that needs no new markup is: the Web developer creates a
snippet of embed text (in an iframe) that includes a visible link back
to the original page and gives it a @role=transcript so it's
discoverable by blind users. That reduces this use case back to the
previous use case (transcript is visible). However, the problem with
this is that the page that embeds the video may not want to show such
a link and may only want the video, thus removing the extra link.

This is a problem both for sighted and non-sighted users. They may
find themselves on a page that has a video and there is a transcript
available for this video, but they have no means of discovering the
availability of the transcript or navigating to it. Sighted users may
never find out that a transcript actually exists. For non-sighted
users it's even worse - they could be misled into believing there is a
"transcript available below" when the @aria-label attribute has been
copied, but then search for it in vain.

So, my suggestion here would be to introduce a @transcript attribute
on the video that requires a full URL (we should discourage the use of
relative URLs since they don't work when embedding, which is the main
use case here). Browsers would display this URL in the context menu of
the video element, so sighted users can discover it. Web developers
can be clever, create their own video controls and include the link
there if they like. And blind users will be told that there is a
"transcript available" and have some means of activating the linkage.

This would be the markup:

<video src="video.mpg" aria-label="video with transcript below"
transcript="@this_page#transcript"></video>
<div id="transcript" role="transcript">
<h4>This is a transcript</h4>
<p>blah blah blah</p>
</div>

And the embed would have:

<video src="video.mpg" aria-label="video with transcript below"
transcript="@this_page#transcript"></video>

Here is another markup example, this time with an off-page link:

<video src="video.mpg" aria-label="video with transcript below"
transcript="@this_page#transcript"></video>
<a id="transcript" role="transcript" href="transcript.html">Transcript</a>

Now, it is possible that publishers may want to shorten this to

<video src="video.mpg" aria-label="video with transcript below"
transcript="@base_url/transcript.html"></video>
<a id="transcript" role="transcript" href="transcript.html">Transcript</a>

and this would essentially mean duplicating the link. I don't see that
as a problem, however, since the two links satisfy different use cases.
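
On the requirement that @transcript carries a full URL: a publishing
tool could check this, and resolve relative references, before the
markup is handed out for embedding. Here is a minimal sketch in
TypeScript of how that might look. The function names and the
example.com addresses are hypothetical, and restricting "full URL" to
http and https is my own assumption, not part of the suggestion above.

```typescript
// True only for absolute http(s) URLs. Without a base URL, the URL
// constructor throws on relative references such as "#transcript"
// or "transcript.html", which is what we rely on here.
function isFullUrl(value: string): boolean {
  try {
    const url = new URL(value);
    // Assumption: only http and https count as usable transcript links.
    return url.protocol === "http:" || url.protocol === "https:";
  } catch (_err) {
    return false;
  }
}

// Resolve a possibly relative transcript reference against the URL of
// the page that originally published the video, so that the result
// still works once the video markup is embedded elsewhere.
function toFullTranscriptUrl(value: string, publisherPageUrl: string): string {
  return new URL(value, publisherPageUrl).href;
}

// Hypothetical publisher page used for illustration only.
const publisherPage = "http://example.com/videos/page.html";
const onPage = toFullTranscriptUrl("#transcript", publisherPage);
const offPage = toFullTranscriptUrl("transcript.html", publisherPage);
```

Here onPage stands in for the "@this_page#transcript" value in the
examples above, and offPage for the "@base_url/transcript.html" one.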


I'm fully aware that this is somewhat related to the @longdesc
discussion. This use case could, therefore, also be solved by using
@aria-describedAt containing a link to the transcript (as in fact I
have suggested before). From what I've learnt in previous discussions,
we want the extra semantics of it being a transcript and not just a
random long description, so introducing a new attribute for this seems
right. If we can make it always have a full URL! ;-)


Hope this all helps to make progress.

Cheers,
Silvia.

Received on Wednesday, 6 June 2012 02:27:55 UTC