- From: Laura Carlson <laura.lee.carlson@gmail.com>
- Date: Tue, 9 Dec 2008 06:58:32 -0600
- To: "W3C WAI-XTECH" <wai-xtech@w3.org>
- Cc: "Silvia Pfeiffer" <silviapfeiffer1@gmail.com>
For WAI-XTECH and PF's information, the following is a message on HTML5
video accessibility from Silvia Pfeiffer. It may be of interest in
formulating a response to
http://lists.w3.org/Archives/Public/public-html/2008Sep/0421.html

Silvia wrote:

> For the last two months, I have been investigating means of satisfying
> video accessibility needs through Ogg in Mozilla/Firefox for HTML5.
>
> You will find a lot of information about our work at
> https://wiki.mozilla.org/Accessibility/Video_Accessibility and in the
> archives of the Ogg accessibility mailing list at
> http://lists.xiph.org/mailman/listinfo/accessibility .
>
> I wanted to give some feedback here on our findings, since some of
> them will have an impact on the HTML5 specification.
>
>
> What are we talking about
> -----------------------------------
> When I say "video accessibility", I am actually only talking about
> time-aligned text formats, not e.g. captions as bitmaps or audio
> annotations as wave files.
> Since we analysed how to attach time-aligned text formats to video in
> a web browser, we also did not want to restrict ourselves to closed
> captions and subtitles only.
> It made sense to extend this to any type of time-aligned text one can
> think of, including textual audio annotations (to be consumed by the
> blind through a screen reader or braille output), karaoke, speech
> bubbles, hyperlinked text annotations, and others. The list at
> http://wiki.xiph.org/index.php/OggText#Categories_of_Text_Codecs
> gives a more complete picture.
>
>
> How is it currently done
> -------------------------------
> When looking at the existing situation around time-aligned text for
> video, I found a very diverse set of formats and means of doing it.
>
> First of all, most media players allow you to load a video file and a
> caption/subtitle file for it in two separate steps.
> The reason is that most subtitles are produced by people other than
> the original content producers, and this two-step approach allows the
> player to synchronise them. This is particularly the case with the
> vast majority of SRT and SUB subtitle files, but is also the case for
> SMIL- and DFXP-based subtitle files.
>
> From a media file format POV, some formats have a means of
> multiplexing time-aligned text into the format, e.g. QuickTime has
> QTText and Flash has cuepoints. Others prefer to use external
> references, e.g. WindowsMedia and SAMI or SMIL files, RealMedia and
> SMIL files.
>
> For mobile applications, a subset of DFXP has been defined in 3GPP
> TimedText, which is actually being encapsulated into QuickTime QTText
> using some extensions, and can be encapsulated into MP4 using the
> MPEG-4 TTXT specification.
>
> As can be seen, in the current situation time-aligned text is being
> handled both in-stream and out-of-band, and there are indeed
> requirements for both situations.
>
>
> Requirements
> -------------------
> Without going into much detail here, I have seen extensive arguments
> made on both sides, for and against in-stream text tracks.
> One particular argument for in-stream text is that of downloading the
> video from some place and keeping all its information together in one
> file, such that when it is distributed again, it retains that
> information.
> One particular argument for out-of-band text is the ability to add
> text tracks at a later stage, from another site, and even from a web
> service (e.g. a translation web service that uses an existing caption
> file and translates it into another language).
> In view of these requirements, I strongly believe we need to enable
> people to do both: provide time-aligned text through
> external/out-of-band resources and through in-stream, where the
> container format allows this.
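[To illustrate what a time-aligned text resource actually contains, here is a minimal Python sketch (an editorial illustration, not part of the quoted proposal) that parses the SRT format mentioned above into (start, end, text) cues; real-world SRT files vary, and this handles only the common case:]

```python
import re

# One SRT timestamp: HH:MM:SS,mmm
TIME = r"(\d{2}):(\d{2}):(\d{2}),(\d{3})"

def to_seconds(h, m, s, ms):
    """Convert an SRT timestamp's fields into seconds as a float."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0

def parse_srt(data):
    """Return a list of (start_seconds, end_seconds, text) cues."""
    cues = []
    # SRT cues are separated by blank lines; each cue is an index line,
    # a "start --> end" timing line, then one or more text lines.
    for block in re.split(r"\n\s*\n", data.strip()):
        lines = block.splitlines()
        if len(lines) < 2:
            continue
        m = re.match(TIME + r" --> " + TIME, lines[1])
        if not m:
            continue
        start = to_seconds(*m.groups()[:4])
        end = to_seconds(*m.groups()[4:])
        cues.append((start, end, "\n".join(lines[2:])))
    return cues

example = """\
1
00:00:01,000 --> 00:00:04,000
Hello, world.

2
00:00:05,500 --> 00:00:07,250
A second caption,
on two lines.
"""

print(parse_srt(example))
# [(1.0, 4.0, 'Hello, world.'), (5.5, 7.25, 'A second caption,\non two lines.')]
```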
>
>
> Proposal for out-of-band approach
> ----------------------------------------------
> I'd like to stimulate a discussion here about how we can support
> out-of-band time-aligned text for video in HTML5.
> I have seen previous proposals, such as the "track" element at
> http://esw.w3.org/topic/HTML/MultimediaAccessibilty#head-a83ba3666e7a437bf966c6bb210cec392dc6ca53
> and would like to propose the following specification.
>
> Take this as an example:
>
> <video src="http://example.com/video.ogv" controls>
>   <text category="CC" lang="en" type="text/x-srt"
>         src="caption.srt"></text>
>   <text category="SUB" lang="de" type="application/ttaf+xml"
>         src="german.dfxp"></text>
>   <text category="SUB" lang="ja" type="application/smil"
>         src="japanese.smil"></text>
>   <text category="SUB" lang="fr" type="text/x-srt"
>         src="translation_webservice/fr/caption.srt"></text>
> </video>
>
> * "text" elements are subelements of the "video" element and are
> therefore clearly related to one video (even if it comes in different
> formats). [BTW: I'm happy to rename this to textarea or whatever else
> people prefer to call it.]
>
> * the "category" attribute (could also be renamed "role" if we
> prefer) allows us to specify which text category we are dealing with
> and allows the web browser to determine how to display it (there
> would be a default display for the different categories, and CSS
> would allow overriding these).
>
> * the "lang" attribute allows the specification of alternative
> resources based on language, which allows the browser to select one
> by default based on browser preferences, and also to turn on by
> default those tracks that a particular user requires (e.g.
> because they are blind and have preset the browser accordingly).
>
> * the "type" attribute allows specification of the actual
> time-aligned text format being used in this instance; again, it
> allows the browser to determine whether it is able to decode the file
> and thus make it available through an interface or not.
>
> * the "src" attribute obviously points to the time-aligned text
> resource. This could be a file, a script that extracts data from a
> database, or even a web service that dynamically creates the data
> based on some input.
>
> This provides for a lot of flexibility and is somewhat independent of
> the media file format, while still enabling the web browser to deal
> with the text (as long as it can decode it).
>
> What do people think?
>
> Regards,
> Silvia.
>
> BTW: We are in parallel working on getting time-aligned text support
> into Ogg - see the spec at http://wiki.xiph.org/index.php/OggText .
> It will provide a similarly flexible approach for any kind of text
> format, as this element does. This means that mapping into the DOM
> would work in a similar way from within Ogg as it would from a "text"
> element as defined above.

Silvia's message is archived at:
http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2008-December/017732.html

Best Regards,
Laura

--
Laura L. Carlson
Received on Tuesday, 9 December 2008 12:59:08 UTC