Fwd: Progress on video accessibility

Hi everybody,

I sent the email below to the WHATWG today. It has more details than
what John forwarded the other day. I thought I should share it with
this mailing list, too, so that everyone has a chance to comment.

Best Regards,
Silvia.

---------- Forwarded message ----------
From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Date: Fri, Jul 31, 2009 at 1:36 PM
Subject: Progress on video accessibility
To: WHAT Working Group <whatwg@lists.whatwg.org>


Hi,

Several proposals have been made on this list in the past on how to
approach accessibility for the HTML5 <video> element.

I think the best way to make progress is to implement a spec, discuss
it, improve the spec, rinse and repeat, which IIUC is the process
WHATWG is using anyway.

So, in this spirit, I would like to contribute a specification and
implementation for how to attach out-of-band time-aligned text data to
HTML5 <video> (and <audio>) elements.

What I mean by out-of-band is that the text that is associated with
the <video> is not available inside the binary video stream, but as a
separate Web resource and needs to be retrieved before it can be
displayed. This is a common use case and should be supported from
within HTML5 in addition to supporting in-band time-aligned text.

BTW: in-band time-aligned text for Ogg is something I want to
experiment with next, since I would like us to get to an API that
supports both, in-band and out-of-band, in the same way.


But let me get straight to this current experiment:
 -  the demo is at http://www.annodex.net/~silvia/itext/ .
 -  the specification is at
https://wiki.mozilla.org/Accessibility/HTML5_captions
 -  a description and a first set of feedback that I have gathered is
at https://wiki.mozilla.org/Accessibility/Experiment1_feedback


Let me list some of the thoughts behind the proposal:

* I can see a need for a multitude of different categories of
time-aligned text that either already exist or will be developed in
the future. The list that I can currently grasp is mentioned in the
specification. While these text categories are rather diverse (e.g.
karaoke text, ticker text, chapter markers, captions), they all share
common properties and can be handled in fundamentally the same way by
a browser. I therefore propose a common "itext" element (for "included
text") to deal with associating such time-aligned text resources with
<video> resources.

* While the demo only shows how to apply <itext> to <video>, I believe
it should also be possible to associate such text tracks with <audio>. An
implementation experiment is necessary to examine the differences,
which I believe to be mostly about display mechanisms.

* I can also see a need for internationalisation of each text
category. I.e. each text resource will come with an associated
language for which it is valid and alternative language resources will
be made available. This is why I am suggesting the @lang attribute.

* Together, the @category and @lang attributes create a list of text
tracks for the <video> for different display mechanisms. Assuming
differing @lang tracks of the same @category are alternatives, while
all @category tracks are allowed to appear at the same time, I
developed a DVD-like menu for time-aligned text. You will find it in
the demo under the "text bubble" button.

* It is unclear which of the given alternative text tracks in
different languages should be displayed by default when loading an
<itext> resource. A @default attribute has been added to the <itext>
elements to allow for the Web content author to tell the browser which
<itext> tracks he/she expects to be displayed by default. If the Web
author does not specify such tracks, the display depends on the user
agent (UA - generally the Web browser): for accessibility reasons,
there should be a field that allows users to always turn display of
certain <itext> categories on. Further, the UA is set to a default
language and it is this default language that should be used to select
which <itext> track should be displayed.

* Since there is not a single file format that satisfies all
categories of time-aligned text, I can see a need for <itext> to be
able to link to several different text formats. The only one used in
the demo is SRT. I will also be looking at LRC and DFXP. I believe
ultimately we will want to state which format a browser must support
as baseline, but I also believe we need to experiment with them a bit
more. I am not intending to define another new format at this stage.
However, I have added a @type attribute to <itext> so we can specify
which file format is to be expected at the end of the @src link. This
is similar to the @type attribute of the <video> element.

* Several of the current de facto standard formats of time-aligned text
are rather simple (including SRT and LRC) and do not include
information about the charset that they are encoded in. For that
reason, a @charset attribute was added to the <itext> specification.

* Another typical feature of time-aligned text files is that they may
be out of sync with the actual video file. For that purpose, a @delay
attribute was suggested as an addition to the <itext> element. This
has not been implemented in the demo. In the feedback on this
proposal, a further "stretch" or "drift" attribute was suggested.

* The idea for the display of the text categories is that we use
existing browser display capabilities to do the display. Thus, I have
defined for each text category a default display mechanism, i.e. a div
in the DOM into which it gets rendered and a default CSS styling for
the div and the text inside it. This also enables a Web developer to
make changes to the default display simply through their own CSS
styling.

* The demo includes a textual audio description track, which allows
visually impaired people to experience the video through use of their
screenreader. The text is rendered into a div that has the @aria-live
attribute set and thus generally works. I have used it successfully on
my Mac with Firefox and the firevox plugin. I have heard from others
who have used JAWS and NVDA successfully with it, though with some
bugs, which are being looked into.

* The demo generally works in all browsers that support the <video>
tag, including Safari when XiphQT is installed.
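
To make the track selection and @delay points above a little more
concrete, here is a minimal sketch in Python. It is not the demo's
implementation and not part of the specification. The class name, the
function names, the category strings, the "always_on" user setting and
the reading of "stretch" as a multiplier on cue times are all
assumptions made for illustration. It also assumes that @default wins
whenever the author has set it on any track, that at most one track per
@category is shown, and that a positive @delay shows text later.

```python
from dataclasses import dataclass


@dataclass
class ITextTrack:
    """One <itext> element; field names mirror the proposed attributes."""
    category: str          # hypothetical values, e.g. "captions"
    lang: str              # language of the text resource
    src: str               # link to the text resource
    default: bool = False  # author asked for this track to be shown
    delay: float = 0.0     # assumed to be seconds, positive = later


def select_tracks(tracks, ua_lang, always_on=frozenset()):
    """Pick at most one track per category to display on load.

    Assumed rule: if the author marked any track with @default, use
    only those. Otherwise fall back to the user agent: show categories
    the user has switched on, in the UA's default language.
    """
    authored = [t for t in tracks if t.default]
    if authored:
        candidates = authored
    else:
        candidates = [
            t for t in tracks
            if t.category in always_on and t.lang == ua_lang
        ]
    chosen = {}
    for track in candidates:
        # Tracks of the same category are alternatives: keep the first.
        chosen.setdefault(track.category, track)
    return chosen


def adjust_time(seconds, delay=0.0, stretch=1.0):
    """Map a cue time from the text file onto the video timeline.

    "stretch" was only suggested in feedback; treating it as a simple
    multiplier applied before the delay is an assumption.
    """
    return seconds * stretch + delay
```

With no @default set by the author, a UA language of "de" and captions
switched on by the user, select_tracks would return only the German
caption track. If the user has switched nothing on, it returns nothing.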


I am curious about comments on this proposal and suggestions for improvement.

I have not yet developed an improved specification, but instead have
collected feedback at
https://wiki.mozilla.org/Accessibility/Experiment1_feedback#Thoughts_.2F_Feedback
.
Feel free to comment on the feedback, too - either here on the mailing
list or in the wiki.

Feedback has generally been encouraging, so I believe we are on the right track.


Regards,
Silvia.

P.S. I may not have reached everyone who should know about this
proposal, so feel free to forward the email to those people and invite
them to contribute. Thanks.

Received on Friday, 31 July 2009 07:17:26 UTC