Re: Tech Discussions on the Multitrack Media (issue-152)

I have several comments on the proposal alternatives on the wiki (which have been very informative). As a first-time poster, let me introduce myself: I represent CableLabs, where we've been analyzing commercial video service provider requirements and how HTML5, including timed text tracks and multimedia tracks, can be used to meet those requirements.

Overloading the existing track element, which represents Timed Text Tracks, to also carry media tracks would mix two fundamentally different models. Timed Text Tracks have cues, whose semantics differ substantially from those of continuous media tracks. Side condition 8 notes this. I think it's a good idea to keep Timed Text Tracks separate from continuous audio and video tracks. This would seem to rule out alternatives 1), 2) and 7).

We've been experimenting with using @kind=metadata timed text tracks for a variety of applications and it would be helpful to be able to distinguish between different @kind=metadata types. This is important to keep the in-band and out-of-band markup the same. Having a <track> @type attribute would permit this.

The additional vs alternate semantics for media tracks is interesting. It seems that more than one video track implies that the second video track is playing in addition to the primary. Each should be in a separate window, but there should be only one set of controls (associated with the primary video). How the two windows are displayed should be up to the application because, in general, the user agent won't have enough information, e.g. should the signing be superimposed in the bottom right corner of the primary or placed off screen, at what size, etc. (It might be possible for the user agent to be told how to position multiple video windows, but a general solution to this is TBD.) Audio might be merged by the user agent into a single stream, or an alternate audio track might replace the primary audio, for example a Spanish track replacing the English one.
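
To make the audio case concrete, here is a rough sketch of how the "replace" and "merge" behaviours might be told apart. The function name resolveAudio and the rule that kind "alternate" replaces the primary while every other kind is mixed in are assumptions made for illustration, not something any of the proposals defines.

```javascript
// Decide which audio sources play together.
// Assumption: a track of kind "alternate" replaces the primary audio, while
// any other enabled kind (e.g. "descriptions") is mixed in addition to it.
function resolveAudio(primary, enabledTracks) {
  const alternate = enabledTracks.find((t) => t.kind === "alternate");
  const additional = enabledTracks.filter((t) => t.kind !== "alternate");
  return [alternate || primary, ...additional];
}

// Hypothetical stand-in objects; real tracks would come from the media element.
const primaryAudio = { name: "primary", kind: "main" };
const spanishMix = resolveAudio(primaryAudio, [
  { name: "spaudio", kind: "alternate" },
  { name: "spdescription", kind: "descriptions" },
]);
const englishMix = resolveAudio(primaryAudio, [
  { name: "description", kind: "descriptions" },
]);
```

Here spanishMix holds the Spanish audio plus the Spanish descriptions, and englishMix holds the primary audio plus the English descriptions.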

Here's an alternative that supports the above use cases by merging the containment model of alternative 3 with the application access to the audio/media objects of alternative 6:

<video id="v1" poster="video.png" controls>
    <source src="video.webm" type="video/webm"> <!-- primary content -->
    <source src="video.mp4" type="video/mp4"> <!-- primary content -->
    <track kind="captions" srclang="en" src="captions.vtt">

    <audio kind="descriptions" srclang="en"> <!-- pre-recorded audio descriptions -->
        <source src="description.ogg" type="audio/ogg" label="English Audio Description">
        <source src="description.mp3" type="audio/mp3">
    </audio>

    <audio kind="alternate" srclang="sp"> <!-- Spanish alternative audio -->
        <source src="spaudio.ogg" type="audio/ogg" label="Spanish audio">
        <source src="spaudio.mp3" type="audio/mp3">
    </audio>

    <audio kind="descriptions" srclang="sp"> <!-- pre-recorded audio descriptions in Spanish -->
        <source src="spdescription.ogg" type="audio/ogg" label="Spanish Audio Description">
        <source src="spdescription.mp3" type="audio/mp3">
    </audio>

    <video kind="signings" srclang="asl" label="American Sign Language"> <!-- sign language overlay -->
        <source src="signing.webm" type="video/webm">
        <source src="signing.mp4" type="video/mp4">
    </video>

    <video kind="alternate" label="Alternate Camera 1">
        <source src="alternate-camera-1.webm" type="video/webm">
        <source src="alternate-camera-1.mp4" type="video/mp4">
    </video>
</video>


English audio descriptions would be enabled like this:

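// "tracks" is a placeholder name for the list of media tracks exposed by v1.
for (var i in tracks) {
  if (tracks[i].kind == "descriptions" && tracks[i].language == "en") { tracks[i].mode = SHOWING; }
}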

Spanish audio track with audio descriptions would be enabled like this:


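// "tracks" is a placeholder name for the list of media tracks exposed by v1.
for (var i in tracks) {
  if (tracks[i].kind == "alternate" && tracks[i].language == "sp") { tracks[i].mode = SHOWING; }
}

for (var i in tracks) {
  if (tracks[i].kind == "descriptions" && tracks[i].language == "sp") { tracks[i].mode = SHOWING; }
}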
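
The loops above all follow one pattern, so they could be folded into a single helper. This is only a sketch: the name enableTracks, the numeric values of the mode constants, and the plain-object track list are assumptions for illustration and are not part of any proposal on the wiki.

```javascript
// Hypothetical mode constants; the numeric values are assumed for illustration.
const DISABLED = 0;
const SHOWING = 2;

// Set mode = SHOWING on every track that matches both kind and language.
// `tracks` is any array-like of objects with kind, language and mode fields.
// Returns the tracks that were enabled.
function enableTracks(tracks, kind, language) {
  const enabled = [];
  for (let i = 0; i < tracks.length; i++) {
    if (tracks[i].kind === kind && tracks[i].language === language) {
      tracks[i].mode = SHOWING;
      enabled.push(tracks[i]);
    }
  }
  return enabled;
}

// Stand-in data mirroring the audio tracks in the markup above.
const tracks = [
  { kind: "descriptions", language: "en", mode: DISABLED },
  { kind: "alternate", language: "sp", mode: DISABLED },
  { kind: "descriptions", language: "sp", mode: DISABLED },
];

// Spanish audio plus Spanish audio descriptions.
enableTracks(tracks, "alternate", "sp");
enableTracks(tracks, "descriptions", "sp");
```

After the two calls, both Spanish tracks are in the SHOWING mode and the English descriptions track is left untouched.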

Last, many existing commercial video providers will offer live streaming services. We expect that the presence of timed text tracks, and of alternate audio and video tracks, will vary over time depending on the content in the stream. Therefore, the tracks will be discovered in-band, and tracks that were present can also be expected to disappear.
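
For the live case, an application would need to notice tracks coming and going. A rough sketch, assuming the application can take successive snapshots of the track list and that each track carries a stable, unique id; both the snapshot approach and the id field are assumptions made for illustration.

```javascript
// Compare two snapshots of a track list and report what appeared or disappeared.
// Each track is assumed to carry a stable, unique `id` (hypothetical field).
function diffTracks(previous, current) {
  const prevIds = new Set(previous.map((t) => t.id));
  const currIds = new Set(current.map((t) => t.id));
  return {
    added: current.filter((t) => !prevIds.has(t.id)),
    removed: previous.filter((t) => !currIds.has(t.id)),
  };
}

// Stand-in snapshots: the Spanish audio goes away, English descriptions arrive.
const before = [
  { id: "a1", kind: "alternate", language: "sp" },
  { id: "t1", kind: "captions", language: "en" },
];
const after = [
  { id: "t1", kind: "captions", language: "en" },
  { id: "d1", kind: "descriptions", language: "en" },
];
const changes = diffTracks(before, after);
```

An application could use the added and removed lists to update its own track-selection UI whenever the stream's content changes.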

Bob Lund

Received on Thursday, 17 February 2011 21:14:23 UTC