- From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
- Date: Mon, 14 Feb 2011 07:39:32 +1100
- To: public-html <public-html@w3.org>
- Cc: Jeroen Wijering <jeroen@longtailvideo.com>
Here is some feedback that I received privately from Jeroen Wijering (JW Player & Longtail Video) on the Multitrack API. He is trying to get subscribed to public-html and is already on the W3C Web and TV Interest Group (see http://www.w3.org/2011/webtv/), which recently had a workshop in Berlin (see http://www.w3.org/2010/11/web-and-tv/). Among the key discussion points at that workshop were multitrack media, adaptive HTTP streaming, and the MPEG manifest format for these, called DASH.

Here is what Jeroen writes:

The use case is spot on; this is an issue that blocks HTML5 video from being chosen over a solution like Flash. An elaborate list of tracks is important, to correctly scope the conditions / resolutions:

1. Tracks targeting device capabilities:
 * Different containers / codecs / profiles
 * Multiview (3D) or surround sound
 * Playback rights and/or decryption possibilities

2. Tracks targeting content customization:
 * Alternate viewing angles or alternate music scores
 * Director's comments or storyboard video

3. Tracks targeting accessibility:
 * Dubbed audio or text subtitles
 * Audio descriptions or closed captions
 * Tracks cleared of cursing / nudity / violence

4. Tracks targeting the interface:
 * Chapter lists, bookmarks, timed annotations, midroll hints ..
 * .. and any other type of scripting cues

----

Note I included the HTML5 "text tracks". I believe there are four kinds of tracks, all an inherent part of a media presentation. These types designate the output of the track, not its encoded representation:

 * audio (producing sound)
 * metadata (producing scripting cues)
 * text (producing rendered text)
 * video (producing images)

In this taxonomy, the HTML5 "subtitles" and "captions" <track> kinds are text, the "descriptions" kind is audio, and the "chapters" and "metadata" kinds are metadata.

----

The requirements are spot on too. Do note they span beyond HTML5.
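The kind-to-output taxonomy above can be sketched mechanically; a minimal JavaScript sketch (the function name is hypothetical, only the <track> kind strings come from the HTML5 draft):

{{{
// Map an HTML5 <track> kind onto the output type it produces in the
// four-way taxonomy (audio / metadata / text / video).
function outputTypeForKind(kind) {
  switch (kind) {
    case "subtitles":
    case "captions":
      return "text";     // producing rendered text
    case "descriptions":
      return "audio";    // producing sound
    case "chapters":
    case "metadata":
      return "metadata"; // producing scripting cues
    default:
      return "metadata"; // treat unknown kinds as scripting cues here
  }
}
}}}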
Everything that plays back audio/video needs multitrack support:

 * Broad- and narrowcasting playback devices of any kind
 * Native desktop, mobile and set-top applications/apps
 * Devices that play media standalone (media players, picture frames, "airplay")

Also, on e.g. the iPhone and Android devices, playback of video is triggered by HTML5, but subsequently detached from it. Think about the custom fullscreen controls, the obscuring of all HTML, and events/cues that are deliberately ignored or not sent (such as play() in iOS). With this in mind, I think an additional requirement is that there should be a full solution outside the scope of HTML5. HTML5 has unique capabilities like customization of the layout (CSS) and interaction (JavaScript), but it must not be required.

----

On the side conditions, I'm not sure about the relative volume of audio or the positioning of video. Automation by default might work better and requires no parameters. For audio, blending can be done through a ducking mechanism (like the JW Player does). For video, blending can be done through an alpha channel. At a later stage, an API/heuristics for PIP support and gain control can be added.

----

In terms of solutions, I lean much towards the manifest approach. All other approaches are variations on the theme of adding more elements to HTML5, which:

 * won't work for situations outside of HTML5;
 * clash with the addition of manifests.

Without a manifest, there will probably be no adaptive streaming, which renders HTML5 video much less useful. At the same time, standardization around manifests (DASH) is largely wrapping up.

----

Here's an update on the manifest approach.
First the HTML5 side:

{{{
<video id="v1" poster="video.png" controls>
  <source src="manifest.xml" type="video/mpeg-dash">
</video>
}}}

Second the manifest side:

{{{
<MPD mediaPresentationDuration="PT645S" type="OnDemand">
  <BaseURL>http://cdn.example.com/myVideo/</BaseURL>
  <Period>
    <Group mimeType="video/webm" lang="en">
      <Representation sourceURL="video-1600.webm" />
    </Group>
    <Group mimeType="video/mp4; codecs=avc1.42E00C,mp4a.40.2" lang="en">
      <Representation sourceURL="video-1600.mp4" />
    </Group>
    <Group mimeType="text/vtt" lang="en">
      <Accessibility type="CC" />
      <Representation sourceURL="captions.vtt" />
    </Group>
  </Period>
</MPD>
}}}

(I will look more into accessibility parameters, but there is support for signalling captions, audio descriptions, sign language etc.)

Note that this approach moves the text track outside of HTML5, making it accessible to other clients as well. Both codecs are also in the manifest; this is just one of the device capability selectors of DASH clients. With the manifest issue "resolved", two disadvantages remain.

----

The CSS styling issue can be fixed by making a conceptual change to CSS and text tracks. Instead of styling text tracks, a single "text rendering area" for each video element can be exposed and styled. Any text tracks that are enabled push data into it, which is automatically styled according to the video.textStyle/etc. rules.

----

Discoverability is indeed an issue, but this can be fixed by defining a track API, which is a merger of the currently drafted text track API and the proposed media track API:

{{{
interface Track {
  readonly attribute DOMString kind;
  readonly attribute DOMString label;
  readonly attribute DOMString language;

  const unsigned short OFF = 0;
  const unsigned short HIDDEN = 1;
  const unsigned short SHOWING = 2;
  attribute unsigned short mode;
};

interface HTMLMediaElement : HTMLElement {
  [...]
  readonly attribute Track[] tracks;
};
}}}

On Thu, Feb 10, 2011 at 10:56 AM, Silvia Pfeiffer
<silviapfeiffer1@gmail.com> wrote:
> Everyone,
>
> Your input on this is requested.
>
> Issue-152 is asking for change proposals for a solution for media
> resources that have more than just one audio and one video track
> associated with them. The spec addresses this need for text tracks
> such as captions and subtitles only [1]. But we haven't solved this
> problem for additional audio and video tracks such as audio
> descriptions, sign language video, and dubbed audio tracks.
>
> In the accessibility task force we have discussed different options
> over the last months. However, the number of people that provide
> technical input on issues related to media in the TF is fairly
> limited, so we have decided to use the available time until a change
> proposal for issue-152 is due (21st February [2]) to open the
> discussion to the larger HTML working group with the hope of hearing
> more opinions.
>
> Past accessibility task force discussions [3][4] have exposed a number
> of possible markup/API solutions.
>
> The different approaches are listed at
> http://www.w3.org/WAI/PF/HTML/wiki/Media_Multitrack_Media_API . This
> may be an incomplete list, but it's a start. If you have any better
> ideas, do speak up.
>
> Which approach do people favor and why?
>
> Cheers,
> Silvia.
>
> [1] http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#the-track-element
> [2] http://lists.w3.org/Archives/Public/public-html/2011Jan/0198.html
> [3] http://lists.w3.org/Archives/Public/public-html-a11y/2010Oct/0520.html
> [4] http://lists.w3.org/Archives/Public/public-html-a11y/2011Feb/0057.html
>
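P.S. A usage sketch for the merged track API proposed above, in JavaScript. Plain objects stand in for the Track interface; the mode constants and attribute names follow the draft IDL, while the helper function and its selection logic are hypothetical:

{{{
// Mode constants as in the draft Track interface.
const OFF = 0, HIDDEN = 1, SHOWING = 2;

// Show the first track matching the given kind and language, and
// switch off any other currently showing track of that kind.
function showTrack(tracks, kind, language) {
  let shown = null;
  for (const track of tracks) {
    if (track.kind !== kind) continue;
    if (!shown && track.language === language) {
      track.mode = SHOWING;
      shown = track;
    } else if (track.mode === SHOWING) {
      track.mode = OFF;
    }
  }
  return shown; // null if no matching track was found
}
}}}

With a combined tracks list this works uniformly across audio, video, text and metadata tracks, which is the discoverability point above: one enumeration surface instead of per-kind APIs.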
Received on Sunday, 13 February 2011 20:40:26 UTC