Re: Tech Discussions on the Multitrack Media (issue-152) from Silvia Pfeiffer on 2011-02-13 (public-html@w3.org from February 2011)

From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Date: Mon, 14 Feb 2011 07:39:32 +1100
To: public-html <public-html@w3.org>
Cc: Jeroen Wijering <jeroen@longtailvideo.com>
Message-ID: <AANLkTikX7P49Xi-enafTgbWxrhGcG0Cu9r1YQTo08ZSF@mail.gmail.com>
Here is some feedback that I received privately from Jeroen Wijering
(JWPlayer & Longtail Video) on the Multitrack API. He is trying to get
subscribed to public-html and is already on the W3C Web and TV
Interest Group (see http://www.w3.org/2011/webtv/), which recently had
a workshop in Berlin (see http://www.w3.org/2010/11/web-and-tv/). The
key discussion points at that workshop were multitrack media,
adaptive HTTP streaming, and the MPEG solution for a manifest format
for these, called DASH.


Here is what Jeroen writes:

The use case is spot on; this is an issue that blocks HTML5 video from
being chosen over a solution like Flash. An elaborate list of tracks
is important, to correctly scope the conditions / resolutions:

1. Tracks targeting device capabilities:
   * Different containers / codecs / profiles
   * Multiview (3D) or surround sound
   * Playback rights and/or decryption possibilities
2. Tracks targeting content customization:
   * Alternate viewing angles or alternate music scores
   * Director's comments or storyboard video
3. Tracks targeting accessibility:
   * Dubbed audio or text subtitles
   * Audio descriptions or closed captions
   * Tracks cleared of cursing / nudity / violence
4. Tracks targeting the interface:
   * Chapter lists, bookmarks, timed annotations, midroll hints ...
   * ... and any other type of scripting cues


----


Note that I included the HTML5 "text tracks". I believe there are four
kinds of tracks, all an inherent part of a media presentation. These
types designate the output of the track, not its encoded
representation:

* audio (producing sound)
* metadata (producing scripting cues)
* text (producing rendered text)
* video (producing images)

In this taxonomy, the HTML5 "subtitles" and "captions" <track> kinds
are text, the "descriptions" kind is audio and the "chapters" and
"metadata" kinds are metadata.

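The mapping in the paragraph above can be written down as a small lookup
table. This is only an illustrative sketch: the type and function names
(OutputType, TrackKind, outputTypeOf) are hypothetical and not part of
any spec or proposal.

```typescript
// The four output types from the taxonomy above.
type OutputType = "audio" | "metadata" | "text" | "video";

// The HTML5 <track> kinds named in the text.
type TrackKind = "subtitles" | "captions" | "descriptions" | "chapters" | "metadata";

// Hypothetical lookup table: each <track> kind mapped to the output it produces.
const OUTPUT_BY_KIND: Record<TrackKind, OutputType> = {
  subtitles: "text",
  captions: "text",
  descriptions: "audio",
  chapters: "metadata",
  metadata: "metadata",
};

// Returns the output type for a given <track> kind.
function outputTypeOf(kind: TrackKind): OutputType {
  return OUTPUT_BY_KIND[kind];
}
```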

----


The requirements are spot on too. Do note they span beyond HTML5.
Everything that plays back audio/video needs multitrack support:

* Broad- and narrowcasting playback devices of any kind
* Native desktop, mobile and settop applications/apps
* Devices that play media standalone (mediaplayers, pictureframes, "airplay")

Also, on e.g. the iPhone and Android devices, playback of video is
triggered by HTML5, but subsequently detached from it. Think about the
custom fullscreen controls, the obscuring of all HTML and
events/cues that are deliberately ignored or not sent (such as
play() in iOS).

With this in mind, I think an additional requirement is that there
should be a full solution outside the scope of HTML5. HTML5 has unique
capabilities like customization of the layout (CSS) and interaction
(JavaScript), but it must not be required.


-----


In the side conditions, I'm not sure about the relative volume of audio
or positioning of video. Automation by default might work better and
requires no parameters. For audio, blending can be done through a
ducking mechanism (like the JW Player does). For video, blending can
be done through an alpha channel. At a later stage, an API/heuristics
for PIP support and gain control can be added.
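
To make the ducking idea concrete, here is a minimal sketch of an
automatic rule that needs no author-supplied parameters. It is not the
JW Player implementation; the function names, the default ducked gain
of 0.3 and the ramp step are all assumptions chosen for illustration.

```typescript
// Target gain for the main audio track: full volume normally, reduced
// ("ducked") while an overlay track such as an audio description is audible.
// The default ducked level of 0.3 is an assumed value.
function targetMainGain(overlayActive: boolean, duckedGain: number = 0.3): number {
  return overlayActive ? duckedGain : 1.0;
}

// Moves the current gain towards the target by at most maxStep per call,
// so the volume ramps smoothly instead of jumping.
function stepGain(current: number, target: number, maxStep: number): number {
  const delta = target - current;
  if (Math.abs(delta) <= maxStep) {
    return target;
  }
  return current + (delta > 0 ? maxStep : -maxStep);
}
```

A player would call stepGain() on each timer tick or audio callback
until the main track reaches the target gain.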


----


In terms of solutions, I lean strongly towards the manifest approach. All
other approaches are variations on the theme of adding more elements
to HTML5, which:

* Won't work for situations outside of HTML5.
* Clash with the addition of manifests.

Without a manifest, there'll probably be no adaptive streaming, which
renders HTML5 video much less useful. At the same time,
standardization around manifests (DASH) is largely wrapping up.


----


Here's an update on the manifest approach. First the HTML5 side:

{{{
<video id="v1" poster="video.png" controls>
  <source src="manifest.xml" type="video/mpeg-dash">
</video>
}}}

Second the manifest side:

{{{
<MPD mediaPresentationDuration="PT645S" type="OnDemand">
    <BaseURL>http://cdn.example.com/myVideo/</BaseURL>
    <Period>

        <Group mimeType="video/webm" lang="en">
            <Representation sourceURL="video-1600.webm" />
        </Group>

        <Group mimeType="video/mp4; codecs=avc1.42E00C,mp4a.40.2" lang="en">
            <Representation sourceURL="video-1600.mp4" />
        </Group>

        <Group mimeType="text/vtt" lang="en">
            <Accessibility type="CC" />
            <Representation sourceURL="captions.vtt" />
        </Group>

    </Period>
</MPD>
}}}

(I will look more into accessibility parameters, but there is support
for signalling captions, audio descriptions, sign language etc.)

Note that this approach moves the text track outside of HTML5, making
it accessible to other clients as well. Both codecs are also in the
manifest; this is just one of the device capability selectors of DASH
clients.
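
As a rough sketch of how a client might apply the codec selector, the
following picks the first video group whose MIME type it can play. The
Group shape, the pickVideoGroup() name, the sample file names and the
canPlay callback are hypothetical; a real DASH client works on the
parsed manifest and has many more selectors.

```typescript
// Simplified, assumed view of one <Group> from the manifest.
interface Group {
  mimeType: string;
  lang: string;
  sourceURL: string;
}

// Returns the first video group the client reports it can play,
// or undefined if none is playable.
function pickVideoGroup(
  groups: Group[],
  canPlay: (mimeType: string) => boolean
): Group | undefined {
  for (const group of groups) {
    const isVideo = group.mimeType.indexOf("video/") === 0;
    if (isVideo && canPlay(group.mimeType)) {
      return group;
    }
  }
  return undefined;
}

// Illustrative data only; the file names are made up.
const groups: Group[] = [
  { mimeType: "video/webm", lang: "en", sourceURL: "video.webm" },
  { mimeType: "video/mp4; codecs=avc1.42E00C,mp4a.40.2", lang: "en", sourceURL: "video.mp4" },
  { mimeType: "text/vtt", lang: "en", sourceURL: "captions.vtt" },
];

// A stand-in for a device that only decodes MP4.
const mp4Only = (mimeType: string): boolean => mimeType.indexOf("video/mp4") === 0;
const chosen = pickVideoGroup(groups, mp4Only);
```

In a browser, the canPlay callback could wrap
HTMLMediaElement.canPlayType(), which returns "", "maybe" or "probably".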

With the manifest issue "resolved", two disadvantages remain.

----


The CSS styling issue can be fixed by making a conceptual change to
CSS and text tracks. Instead of styling text tracks, a single "text
rendering area" for each video element can be exposed and styled. Any
text tracks that are enabled push data in it, which is automatically
styled according to the video.textStyle/etc rules.


----


Discoverability is indeed an issue, but this can be fixed by defining
a track API, which is a merger of the currently drafted text track API
and the proposed media track API:

{{{
interface Track {
  readonly attribute DOMString kind;
  readonly attribute DOMString label;
  readonly attribute DOMString language;

  const unsigned short OFF = 0;
  const unsigned short HIDDEN = 1;
  const unsigned short SHOWING = 2;
  attribute unsigned short mode;

};
interface HTMLMediaElement : HTMLElement {
  [...]
  readonly attribute Track[] tracks;
};
}}}
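
A script could use such an API roughly as follows. This is a sketch
under assumptions: the Track interface is modelled here as a plain
TypeScript type that mirrors the IDL above, and showTrack() is a
hypothetical helper, not part of the proposal.

```typescript
// Mode constants, mirroring the IDL above.
const OFF = 0;
const HIDDEN = 1;
const SHOWING = 2;

// Plain TypeScript model of the proposed Track interface.
interface Track {
  readonly kind: string;
  readonly label: string;
  readonly language: string;
  mode: number;
}

// Shows the first track of the given kind and language and turns every
// other track of that kind off. Tracks of other kinds are left untouched.
// Returns the track that was shown, or undefined if none matched (in which
// case all tracks of that kind end up off).
function showTrack(tracks: Track[], kind: string, language: string): Track | undefined {
  let chosen: Track | undefined = undefined;
  for (const track of tracks) {
    if (track.kind !== kind) {
      continue;
    }
    if (chosen === undefined && track.language === language) {
      track.mode = SHOWING;
      chosen = track;
    } else {
      track.mode = OFF;
    }
  }
  return chosen;
}
```

With the proposed API, the call would look like
showTrack(video.tracks, "captions", "en").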




On Thu, Feb 10, 2011 at 10:56 AM, Silvia Pfeiffer
<silviapfeiffer1@gmail.com> wrote:
> Everyone,
>
> Your input on this is requested.
>
> Issue-152 is asking for change proposals for a solution for media
> resources that have more than just one audio and one video track
> associated with them. The spec addresses this need for text tracks
> such as captions and subtitles only [1]. But we haven't solved this
> problem for additional audio and video tracks such as audio
> descriptions, sign language video, and dubbed audio tracks.
>
> In the accessibility task force we have discussed different options
> over the last months. However, the number of people that provide
> technical input on issues related to media in the TF is fairly
> limited, so we have decided to use the available time until a change
> proposal for issue-152 is due (21st February [2]) to open the
> discussion to the larger HTML working group with the hope of hearing
> more opinions.
>
> Past accessibility task force discussions [3][4] have exposed a number
> of possible markup/API solutions.
>
> The different approaches are listed at
> http://www.w3.org/WAI/PF/HTML/wiki/Media_Multitrack_Media_API . This
> may be an incomplete list, but it's a start. If you have any better
> ideas, do speak up.
>
> Which approach do people favor and why?
>
> Cheers,
> Silvia.
>
> [1] http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#the-track-element
> [2] http://lists.w3.org/Archives/Public/public-html/2011Jan/0198.html
> [3] http://lists.w3.org/Archives/Public/public-html-a11y/2010Oct/0520.html
> [4] http://lists.w3.org/Archives/Public/public-html-a11y/2011Feb/0057.html
>
Received on Sunday, 13 February 2011 20:40:26 UTC