HTML5 video accessibility

From: Laura Carlson <laura.lee.carlson@gmail.com> · Date: Tue, 9 Dec 2008 06:58:32 -0600

For WAI-XTECH and PF's information, the following is a message on
HTML5 video accessibility from Silvia Pfeiffer. It may be of interest
in formulating a response to
http://lists.w3.org/Archives/Public/public-html/2008Sep/0421.html

Silvia wrote:

> For the last 2 months, I have been investigating means of satisfying
> video accessibility needs through Ogg in Mozilla/Firefox for HTML5.
>
> You will find a lot of information about our work at
> https://wiki.mozilla.org/Accessibility/Video_Accessibility and in the
> archives of the Ogg accessibility mailing list at
> http://lists.xiph.org/mailman/listinfo/accessibility .
>
> I wanted to give some feedback here on our findings, since some of
> them will have an impact on the HTML5 specification.
>
>
> What are we talking about
> -----------------------------------
> When I say "video accessibility", I'm actually only talking about
> time-aligned text formats and not e.g. captions as bitmaps or audio
> annotations as wave files.
> Since we analysed how to attach time-aligned text formats to video
> in a Web browser, we also did not want to restrict ourselves to only
> closed captions and subtitles.
> It made sense to extend this to any type of time-aligned text one can
> think about, including textual audio annotations (to be consumed by
> the blind through a screenreader or braille output), karaoke, speech
> bubbles, hyperlinked text annotations, and others. There is a list at
> http://wiki.xiph.org/index.php/OggText#Categories_of_Text_Codecs which
> gives you a more complete picture.
>
>
> How is it currently done
> -------------------------------
> When looking at the existing situation around time-aligned text for
> video, I found a very diverse set of formats and means of doing it.
>
> First of all, most media players allow you to load a video file and a
> caption/subtitle file for it in two separate steps. The reason is that
> most subtitles are produced by people other than the creators of the
> original content, and this lets the player synchronise the two. This is
> particularly the case with the vast majority of SRT and SUB subtitle
> files, but is also the case for SMIL- and DFXP-based subtitle files.
>
> From a media file format POV, some formats have a means of
> multiplexing time-aligned text into the format, e.g. QuickTime has
> QTText and Flash has cuepoints. Others prefer to use external
> references, e.g. WindowsMedia and SAMI or SMIL files, RealMedia and
> SMIL files.
>
> For mobile applications, a subset of DFXP has been defined in 3GPP
> TimedText, which is actually being encapsulated into QuickTime QTText
> using some extensions, and can be encapsulated into MP4 using the
> MPEG-4 TTXT specification.
>
> As can be seen, the current situation is such that time-aligned text
> is being handled both in-stream and out-of-band and there are indeed
> requirements for both situations.
>
>
> Requirements
> -------------------
> Not to go into much detail here, but I have seen extensive arguments
> made both for and against in-stream text tracks.
> One particular argument for in-stream text is that of downloading the
> video from some place and keeping all its information together in one
> file such that when it is distributed again, it retains that
> information.
> One particular argument for out-of-band text is the ability to add
> text tracks at a later stage, from another site, and even from a web
> service (e.g. a translation web service that uses an existing caption
> file and translates it into another language).
> In view of these requirements, I strongly believe we need to enable
> people to do both: provide time-aligned text through
> external/out-of-band resources and through in-stream, where the
> container format allows this.
>
>
> Proposal for out-of-band approach
> ----------------------------------------------
> I'd like to stimulate a discussion here about how we can support
> out-of-band time-aligned text for video in HTML5.
> I have seen previous proposals, such as the "track" element at
> http://esw.w3.org/topic/HTML/MultimediaAccessibilty#head-a83ba3666e7a437bf966c6bb210cec392dc6ca53
> and would like to propose the following specification.
>
> Take this as an example:
>
> <video src="http://example.com/video.ogv" controls>
>   <text category="CC" lang="en" type="text/x-srt"
>         src="caption.srt"></text>
>   <text category="SUB" lang="de" type="application/ttaf+xml"
>         src="german.dfxp"></text>
>   <text category="SUB" lang="ja" type="application/smil"
>         src="japanese.smil"></text>
>   <text category="SUB" lang="fr" type="text/x-srt"
>         src="translation_webservice/fr/caption.srt"></text>
> </video>
>
> * "text" elements are subelements of the "video" element and therefore
> clearly related to one video (even if it comes in different formats).
> [BTW: I'm happy to rename this to textarea or whatever else people
> prefer to call it].
>
> * the "category" attribute (could also be renamed "role" if we prefer)
> allows us to specify what text category we are dealing with and allows
> the web browser to determine how to display it (there would be a default
> display for each category, and CSS would allow these to be
> overridden).
>
> * the "lang" attribute would allow the specification of alternative
> resources based on language, which allows the browser to select one by
> default based on browser preferences, and also to turn those tracks on
> by default that a particular user requires (e.g. because they are
> blind and have preset the browser accordingly).
>
> * the "type" attribute allows specification of what actual time-aligned
> text format is being used in this instance; again, it will allow the
> browser to determine whether it is able to decode the file and thus
> make it available through an interface or not.
>
> * the "src" attribute obviously points to the time-aligned text
> resource. This could be a file, a script that extracts data from a
> database, or even a web service that dynamically creates the data
> based on some input.
>
> This provides for a lot of flexibility and is somewhat independent of
> the media file format, while still enabling the Web browser to deal
> with the text (as long as it can decode it).
>
> What do people think?
>
> Regards,
> Silvia.
>
> BTW: In parallel, we are working on getting time-aligned text support
> into Ogg - see the spec at http://wiki.xiph.org/index.php/OggText . It
> will provide a similarly flexible approach for any kind of text format
> as this element does. This means that mapping into the DOM would work
> in a similar way from within Ogg as it would from a "text" element as
> defined above.

Silvia's message is archived at:
http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2008-December/017732.html
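
To make the proposed selection behaviour more concrete, here is a minimal
Python sketch of how a browser might pick one "text" resource using the
"category", "lang" and "type" attributes described above. It is only an
illustration under stated assumptions: the names TextTrack and select_track,
the order in which the filters are applied, and the idea of an ordered list of
preferred languages are all hypothetical and are not part of Silvia's proposal
or of any browser implementation.

```python
from dataclasses import dataclass
from typing import AbstractSet, Optional, Sequence


@dataclass(frozen=True)
class TextTrack:
    """Hypothetical model of one proposed <text> child of <video>."""
    category: str  # e.g. "CC" (closed captions) or "SUB" (subtitles)
    lang: str      # language of the text, e.g. "en"
    type: str      # MIME type of the time-aligned text format
    src: str       # URL of the text resource


def select_track(
    tracks: Sequence[TextTrack],
    preferred_langs: Sequence[str],
    wanted_category: str,
    supported_types: AbstractSet[str],
) -> Optional[TextTrack]:
    """Return the first decodable track matching the user's preferences.

    Assumption: tracks in an undecodable format or the wrong category are
    discarded first, then languages are tried in preference order.
    """
    candidates = [
        t for t in tracks
        if t.type in supported_types and t.category == wanted_category
    ]
    for lang in preferred_langs:
        for track in candidates:
            if track.lang == lang:
                return track
    return None  # nothing suitable; the browser would show no text track


# Some of the resources from the markup example in the quoted message.
TRACKS = [
    TextTrack("CC", "en", "text/x-srt", "caption.srt"),
    TextTrack("SUB", "de", "application/ttaf+xml", "german.dfxp"),
    TextTrack("SUB", "fr", "text/x-srt",
              "translation_webservice/fr/caption.srt"),
]

if __name__ == "__main__":
    # A user who prefers French, then German, in a browser that can only
    # decode SRT files.
    chosen = select_track(TRACKS, ["fr", "de"], "SUB", {"text/x-srt"})
    print(chosen.src if chosen else "no matching text track")
```

With these assumptions, a browser that decodes only SRT would skip the German
DFXP track even for a user who prefers German, which is the kind of decision
the "type" attribute is meant to allow before any file is fetched.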

Best Regards,
Laura
-- 
Laura L. Carlson