Proposal for Audio and Video Track Selection and Synchronisation for Media Elements


This is a change proposal for ISSUE-152, introducing an API for media 
elements to allow Web authors to provide alternate modes of presentation 
for media resources with multiple tracks, including allowing the user to 
select between them, and allowing multiple tracks to be played 
simultaneously and in a synchronised fashion.


These are the use cases that drive this proposal:

 * Enabling authors to control the rendering of sign-language tracks
   embedded in a video file or provided as independent resources.

 * Enabling authors to control the rendering of recorded audio
   description tracks embedded in a video file or provided as
   independent resources.

 * Enabling authors to control the rendering of alternative audio
   (director's commentary) tracks embedded in a video file or provided
   as independent resources.

 * Enabling authors to provide features such as
   with synchronisation, including for cases where the two media
   resources are to be synchronised with different starting offsets or
   different playback rates.

 * Enabling authors to have short loops (e.g. a metronome sound) play
   over a longer track (e.g. a song), keeping the two tightly
   synchronised even if the longer file stalls while playing (due
   to network congestion).

 * Allowing authors to select specific dubbed audio tracks based on
   the language of the track.

 * Enabling the user to make use of "pause", "fast-forward", "rewind",
   "seek", "volume", and "mute" features in the above cases.

 * Allowing authors to use CSS for presentation control, e.g. to
   control where multiple video channels are to be placed relative to
   each other.

 * Allowing authors to both individually control the volume of each
   track, and control the overall volume while preserving the relative
   values.
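Two of the use cases above (selecting a dubbed audio track by language, and scaling the overall volume while keeping per-track ratios) can be sketched in script. This is only an illustration under stated assumptions: the track objects are plain objects with hypothetical `language`, `enabled`, and `volume` fields, and the function names are invented for this example; none of them are taken from the proposal itself.

```javascript
// Hypothetical track shape: { language: "en", enabled: false, volume: 1.0 }.
// These field names are assumptions made for illustration only.

// Enable the first track whose language matches (case-insensitively) and
// disable all others. Returns the index of the enabled track, or -1 if
// nothing matched, in which case the tracks are left untouched.
function selectDubbedTrack(tracks, language) {
  const wanted = language.toLowerCase();
  const index = tracks.findIndex((t) => t.language.toLowerCase() === wanted);
  if (index === -1) return -1;
  tracks.forEach((t, i) => {
    t.enabled = i === index;
  });
  return index;
}

// Treat the overall volume as a multiplier applied to each track's own
// volume, so the ratios between tracks stay the same.
function effectiveVolumes(tracks, overallVolume) {
  const master = Math.min(1, Math.max(0, overallVolume)); // clamp to 0..1
  return tracks.map((t) => t.volume * master);
}

const tracks = [
  { language: "en", enabled: true, volume: 1.0 },
  { language: "fr", enabled: false, volume: 0.5 },
];
const chosen = selectDubbedTrack(tracks, "FR");
const volumes = effectiveVolumes(tracks, 0.5);
console.log(chosen, volumes);
```

Modelling the overall volume as a multiplier is one simple way to keep the relative values intact; a real user agent may combine the values differently.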


  This shows how an author could declaratively render a file with
  embedded sign-language tracks with no script (note that the file is
  only fetched from the network once, due to the pooled-downloads
  feature of the 'fetch' algorithm):

   <div>
    <style scoped>
     div { margin: 1em auto; position: relative; width: 400px; height: 300px; }
     video { position: absolute; bottom: 0; right: 0; }
     video:first-of-type { width: 100%; height: 100%; }
     video:last-child { width: 30%; }
    </style>
    <video src="movie.vid#track=Video&amp;track=English" autoplay controls mediagroup=movie></video>
    <video src="movie.vid#track=sign" autoplay mediagroup=movie></video>
   </div>
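The src values in the example above select tracks through a "#track=" fragment. As a rough sketch of how such URLs might be composed from a list of track names, the helper below builds the fragment and escapes it for use in an HTML attribute. The function names are hypothetical, and percent-encoding the track names is an assumption based on the Media Fragments URI syntax rather than something this proposal states.

```javascript
// Build a URL that selects the named tracks via "#track=..." fragments,
// e.g. "movie.vid#track=Video&track=English".
// Percent-encoding of names is an assumption (see the lead-in).
function withTracks(resource, trackNames) {
  const fragment = trackNames
    .map((name) => "track=" + encodeURIComponent(name))
    .join("&");
  return fragment ? resource + "#" + fragment : resource;
}

// When the URL is written into a quoted HTML attribute, "&" must be
// escaped as "&amp;" (and a double quote as "&quot;").
function escapeForAttribute(url) {
  return url.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

const mainSrc = withTracks("movie.vid", ["Video", "English"]);
const signSrc = withTracks("movie.vid", ["sign"]);
console.log(escapeForAttribute(mainSrc));
console.log(escapeForAttribute(signSrc));
```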


In addition to a number of other features, this change proposal
includes all the features described in

...with more or less the same API. The differences between that
proposal and the equivalent parts of this one are only intended to
help make the API more consistent in its handling of text, audio, and
video tracks.

This change proposal covers all the use cases and the majority of the
requirements and "side conditions" listed in:

It also reflects the suggestions made in this bug:

...with one exception, namely it does not explicitly suggest that user
agents allow authors to enable or disable tracks in the manner
described in this proposal (though the specification would still
recommend that user agents allow users to enable and disable tracks,
as it does now). This difference is because exposing this feature
conflicts with the requirements listed in the earlier cited document
(namely to have the same API for embedded and external resources).


Implement the changes in this diff that are currently marked out with
<!--CONTROLLER--> comments:

The bulk of the text can easily be seen in the WHATWG HTML spec:

The remainder consists of changes to the processing models described
in the media element section to make them work with the features
defined above.
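To give a feel for the kind of bookkeeping such a processing model involves for synchronised resources, the sketch below maps a position on a shared timeline to a position inside one grouped resource, allowing for a starting offset, a different playback rate, and looping. It is an illustrative model only, not the algorithm in the diff; the function name and all parameter names are hypothetical.

```javascript
// Map a position on a shared timeline (seconds) to a position within one
// resource. All names here are hypothetical, for illustration only.
//   startOffset: shared-timeline time at which this resource starts playing
//   rate:        playback rate of this resource relative to the shared clock
//   duration:    length of the resource in seconds
//   loop:        whether the resource restarts when it reaches its end
function positionInResource(sharedTime, options) {
  const { startOffset = 0, rate = 1, duration, loop = false } = options;
  const local = (sharedTime - startOffset) * rate;
  if (local < 0) return 0; // resource has not started yet
  if (loop) return local % duration; // e.g. a metronome under a song
  return Math.min(local, duration); // hold at the end once finished
}

// Seven seconds into the shared timeline:
const songPos = positionInResource(7, { duration: 180 });
const metronomePos = positionInResource(7, { duration: 2, loop: true });
const offsetPos = positionInResource(7, { startOffset: 3, rate: 2, duration: 60 });
console.log(songPos, metronomePos, offsetPos);
```

Because every position is derived from the one shared clock, pausing that clock while any resource stalls keeps the others aligned, which is the behaviour the metronome use case asks for.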


 * Enables a whole raft of use cases with a minimal API.

 * Adds a new feature, which means an increase in spec complexity.

 * Impacts any script-enabled interactive user agent.

 * It's possible that it is still too early for us to be adding any
   kind of multi-track feature given the current implementation
   priorities of user agents.

Ian Hickson               U+1047E                )\._.,--....,'``.    fL
                          U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Monday, 21 March 2011 11:01:14 UTC