- From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
- Date: Tue, 9 Dec 2008 12:27:08 +1100
Hi everybody,

For the last 2 months, I have been investigating means of satisfying video accessibility needs through Ogg in Mozilla/Firefox for HTML5. You will find a lot of information about our work at https://wiki.mozilla.org/Accessibility/Video_Accessibility and in the archives of the Ogg accessibility mailing list at http://lists.xiph.org/mailman/listinfo/accessibility . I wanted to give some feedback here on our findings, since some of them will have an impact on the HTML5 specification.

What are we talking about
-----------------------------------
When I say "video accessibility", I am only talking about time-aligned text formats, not e.g. captions as bitmaps or audio annotations as wave files. Since we analysed how to attach time-aligned text formats to video in a Web browser, we also did not want to restrict ourselves to closed captions and subtitles. It made sense to extend this to any type of time-aligned text one can think of, including textual audio annotations (to be consumed by the blind through a screen reader or braille output), karaoke, speech bubbles, hyperlinked text annotations, and others. There is a list at http://wiki.xiph.org/index.php/OggText#Categories_of_Text_Codecs which gives you a more complete picture.

How is it currently done
-------------------------------
When looking at the existing situation around time-aligned text for video, I found a very diverse set of formats and means of doing it.

First of all, most media players allow you to load a video file and a caption/subtitle file for it in two separate steps. The reason is that most subtitles are produced by people other than the original content creators, and loading them separately allows the player to synchronise the two. This is particularly the case with the vast majority of SRT and SUB subtitle files, but is also the case for SMIL- and DFXP-based subtitle files. (A minimal SRT example is given further below.)

From a media file format POV, some formats have a means of multiplexing time-aligned text into the format, e.g. QuickTime has QTText and Flash has cuepoints. Others prefer to use external references, e.g. Windows Media and SAMI or SMIL files, RealMedia and SMIL files. For mobile applications, a subset of DFXP has been defined in 3GPP TimedText, which is actually being encapsulated into QuickTime QTText using some extensions, and can be encapsulated into MP4 using the MPEG-4 TTXT specification.

As can be seen, the current situation is such that time-aligned text is being handled both in-stream and out-of-band, and there are indeed requirements for both situations.

Requirements
-------------------
Not to go into much detail here, but I have seen extensive arguments made for and against in-stream text tracks. One particular argument for in-stream text is that of downloading the video from some place and keeping all its information together in one file, such that when it is distributed again, it retains that information. One particular argument for out-of-band text is the ability to add text tracks at a later stage, from another site, and even from a web service (e.g. a translation web service that uses an existing caption file and translates it into another language).

In view of these requirements, I strongly believe we need to enable people to do both: provide time-aligned text through external/out-of-band resources and through in-stream, where the container format allows this.
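As an aside, before moving to the proposal: for those not familiar with these subtitle formats, this is roughly what a minimal SRT file looks like - a cue number, a start and end timestamp, and the text itself, separated by blank lines (the content here is of course made up purely for illustration):

  1
  00:00:01,000 --> 00:00:04,000
  Hello and welcome to this video.

  2
  00:00:04,500 --> 00:00:07,200
  Captions like these are what I mean by time-aligned text.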
Proposal for out-of-band approach
----------------------------------------------
I'd like to stimulate a discussion here about how we can support out-of-band time-aligned text for video in HTML5. I have seen previous proposals, such as the "track" element at http://esw.w3.org/topic/HTML/MultimediaAccessibilty#head-a83ba3666e7a437bf966c6bb210cec392dc6ca53 , and would like to propose the following specification. Take this as an example:

<video src="http://example.com/video.ogv" controls>
  <text category="CC" lang="en" type="text/x-srt" src="caption.srt"></text>
  <text category="SUB" lang="de" type="application/ttaf+xml" src="german.dfxp"></text>
  <text category="SUB" lang="ja" type="application/smil" src="japanese.smil"></text>
  <text category="SUB" lang="fr" type="text/x-srt" src="translation_webservice/fr/caption.srt"></text>
</video>

* "text" elements are subelements of the "video" element and are therefore clearly related to one video (even if it comes in different formats). [BTW: I'm happy to rename this to textarea or whatever else people prefer to call it.]

* the "category" attribute (could also be renamed "role" if we prefer) allows us to specify what text category we are dealing with and allows the web browser to determine how to display it (there would be default display styles for the different categories, and CSS would allow authors to override them).

* the "lang" attribute allows the specification of alternative resources based on language, which allows the browser to select one by default based on browser preferences, and also to turn on by default those tracks that a particular user requires (e.g. because they are blind and have preset the browser accordingly).

* the "type" attribute allows specification of what actual time-aligned text format is being used in this instance; again, it will allow the browser to determine whether it is able to decode the file and thus make it available through an interface or not.

* the "src" attribute obviously points to the time-aligned text resource. This could be a file, a script that extracts data from a database, or even a web service that dynamically creates the data based on some input.

This provides for a lot of flexibility and is somewhat independent of the media file format, while still enabling the Web browser to deal with the text (as long as it can decode it).

What do people think?

Regards,
Silvia.

BTW: We are in parallel working on getting time-aligned text support into Ogg - see the spec at http://wiki.xiph.org/index.php/OggText . It will provide a similarly flexible approach to any kind of text format, just as this element does. This means that mapping into the DOM would work in a similar way from within Ogg as it would from a "text" element as defined above.
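Just to illustrate that last point, here is a rough sketch of how a script could already get at such "text" elements through the existing DOM, assuming they are exposed like any other children of the "video" element and using only standard DOM calls (the actual API for switching tracks on and off is of course not specified anywhere yet, so treat this as a sketch rather than part of the proposal):

  // find the first video element and enumerate its proposed "text" children
  var video  = document.getElementsByTagName("video")[0];
  var tracks = video.getElementsByTagName("text");
  for (var i = 0; i < tracks.length; i++) {
    var t = tracks[i];
    // e.g. pick out the German subtitle track and report where it lives
    if (t.getAttribute("category") == "SUB" && t.getAttribute("lang") == "de") {
      alert("German subtitles: " + t.getAttribute("src"));
    }
  }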