Audio Track Selection for Media Element

Summary

This is a change proposal for Issue 152. It introduces a JavaScript API for HTML5 media elements that lets Web authors provide alternate modes of presentation for a media resource and lets the end user select between them.

It is a minimal extension to the existing API in that it does not provide detailed access to the media tracks themselves, but merely provides a means of indicating their presence and a means of selecting between the presentation modes.

Rationale

The HTML5 video element specification right now states (bold parts added):

To make video and audio content accessible to people who are blind, deaf, or have other physical or cognitive disabilities, authors are expected to provide alternative media streams and/or to embed accessibility aids (such as caption or subtitle tracks, audio description tracks, or sign-language overlays) into their media streams.
User agents should provide controls to enable or disable the display of closed captions, audio description tracks, and other additional data associated with the video stream, though such features should, again, not interfere with the page's normal rendering.
If the controls attribute is present, or if scripting is disabled for the media element, then the user agent should expose a user interface to the user. This user interface should include features to begin playback, pause playback, seek to an arbitrary position in the content (if the content supports arbitrary seeking), change the volume, change the display of closed captions or embedded sign-language tracks, select different audio tracks or turn on audio descriptions, and show the media content in manners more suitable to the user (e.g. full-screen video or in an independent resizable window). Other controls may also be made available.

The specification currently includes mechanisms to access text-based track resources, either external to the media resource or internal to it. There is only experimental support for internal text tracks, and no released browser implements the external mechanism yet, but there is consensus that this is an appropriate approach.

No released browser that supports the media elements, however, exposes controls to the page author for media-based accessibility data contained inside the media resource. Perhaps the ideal approach to additional media would be to extend the text track mechanism. This is problematic for external media tracks, however: the level of synchronisation it would imply between the external track and the internal track would require a sophisticated media engine, and no browser vendor has indicated a desire to adopt such a system. While it would be possible to have the extensions work only on tracks internal to the media (where synchronisation is readily achievable using the existing media frameworks), this would be confusing to authors.

This proposal introduces a simpler approach based on an extension to the existing HTMLMediaElement API. The extension lets a page determine whether a media resource contains alternate audio tracks, based on the intended mode of presentation of those tracks (e.g. audio descriptions, alternative language tracks), and lets page authors publish the corresponding media content and provide the means to select between the audio tracks.

It also encourages browser developers to expose UI in the default players to select alternate tracks in multitrack media resources.

Draft Proposal

The HTMLMediaElement currently has the following Interface:

 interface HTMLMediaElement : HTMLElement {

  // error state
  readonly attribute MediaError error;

  // network state
           attribute DOMString src;
  readonly attribute DOMString currentSrc;
  const unsigned short NETWORK_EMPTY = 0;
  const unsigned short NETWORK_IDLE = 1;
  const unsigned short NETWORK_LOADING = 2;
  const unsigned short NETWORK_NO_SOURCE = 3;
  readonly attribute unsigned short networkState;
           attribute DOMString preload;
  readonly attribute TimeRanges buffered;
  void load();
  DOMString canPlayType(in DOMString type);

  // ready state
  const unsigned short HAVE_NOTHING = 0;
  const unsigned short HAVE_METADATA = 1;
  const unsigned short HAVE_CURRENT_DATA = 2;
  const unsigned short HAVE_FUTURE_DATA = 3;
  const unsigned short HAVE_ENOUGH_DATA = 4;
  readonly attribute unsigned short readyState;
  readonly attribute boolean seeking;

  // playback state
           attribute double currentTime;
  readonly attribute double initialTime;
  readonly attribute double duration;
  readonly attribute Date startOffsetTime;
  readonly attribute boolean paused;
           attribute double defaultPlaybackRate;
           attribute double playbackRate;
  readonly attribute TimeRanges played;
  readonly attribute TimeRanges seekable;
  readonly attribute boolean ended;
           attribute boolean autoplay;
           attribute boolean loop;
  void play();
  void pause();

  // controls
           attribute boolean controls;
           attribute double volume;
           attribute boolean muted;

  // text tracks
  readonly attribute TextTrack[] tracks;
  MutableTextTrack addTrack(in DOMString kind, in optional DOMString label, in optional DOMString language);

};

The change proposal is to extend this interface in the following manner:

 interface HTMLMediaElement : HTMLElement {

  // error state
  readonly attribute MediaError error;

  // network state
           attribute DOMString src;
  readonly attribute DOMString currentSrc;
  const unsigned short NETWORK_EMPTY = 0;
  const unsigned short NETWORK_IDLE = 1;
  const unsigned short NETWORK_LOADING = 2;
  const unsigned short NETWORK_NO_SOURCE = 3;
  readonly attribute unsigned short networkState;
           attribute DOMString preload;
  readonly attribute TimeRanges buffered;
  void load();
  DOMString canPlayType(in DOMString type);

  // ready state
  const unsigned short HAVE_NOTHING = 0;
  const unsigned short HAVE_METADATA = 1;
  const unsigned short HAVE_CURRENT_DATA = 2;
  const unsigned short HAVE_FUTURE_DATA = 3;
  const unsigned short HAVE_ENOUGH_DATA = 4;
  readonly attribute unsigned short readyState;
  readonly attribute boolean seeking;

  // playback state
           attribute double currentTime;
  readonly attribute double initialTime;
  readonly attribute double duration;
  readonly attribute Date startOffsetTime;
  readonly attribute boolean paused;
           attribute double defaultPlaybackRate;
           attribute double playbackRate;
  readonly attribute TimeRanges played;
  readonly attribute TimeRanges seekable;
  readonly attribute boolean ended;
           attribute boolean autoplay;
           attribute boolean loop;
  void play();
  void pause();

  // controls
           attribute boolean controls;
           attribute double volume;
           attribute boolean muted;

  // text tracks
  readonly attribute TextTrack[] tracks;
  MutableTextTrack addTrack(in DOMString kind, in optional DOMString label, in optional DOMString language);

  // audio tracks
  readonly attribute unsigned long audioTrackCount;
  readonly attribute DOMString[] audioTrackLanguage;
           attribute unsigned long currentAudioTrack;

};

The audioTrackCount attribute represents the number of audio tracks embedded in the media resource assigned to the media element. The audioTrackLanguage attribute represents the language [BCP47] of each of the audio tracks, addressed by a zero-based index. The currentAudioTrack attribute represents the index of the audio track currently selected for the media element.

When the currentAudioTrack selection is changed, the user agent must queue a task to fire a simple event named "audiotrackchange".
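A page script could react to this event roughly as follows. This is a minimal sketch, assuming the proposed "audiotrackchange" event is fired at the media element itself and that currentAudioTrack and audioTrackLanguage behave as described above; watchAudioTrack is a hypothetical helper name and not part of the proposal.

```javascript
// Hypothetical helper (not part of the proposal): calls
// onChange(index, language) each time the proposed "audiotrackchange"
// event fires on the given media element.
function watchAudioTrack(media, onChange) {
  media.addEventListener("audiotrackchange", function () {
    // Read the newly selected index and look up its language tag.
    var index = media.currentAudioTrack;
    onChange(index, media.audioTrackLanguage[index]);
  }, false);
}
```

An author-provided track menu could use such a callback to keep its highlighted entry in sync when the selection is changed through the user agent's own controls.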

If the author selects an index outside the range allowed by the audioTrackCount attribute, the UA throws an INDEX_SIZE_ERR exception.
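Authors may prefer to guard against this exception rather than let it propagate. The following is a sketch only, assuming the attributes behave as proposed; trySelectAudioTrack is a hypothetical helper name, and the try/catch is a defensive assumption for the case where the track count changes between the check and the assignment.

```javascript
// Hypothetical helper (not part of the proposal): selects the audio
// track at `index` if it is a valid zero-based index.
// Returns true if the selection was made, false otherwise.
function trySelectAudioTrack(media, index) {
  // Reject negative, out-of-range, and non-integer indices up front.
  if (index < 0 || index >= media.audioTrackCount ||
      Math.floor(index) !== index) {
    return false;
  }
  try {
    media.currentAudioTrack = index;
    return true;
  } catch (e) {
    // Under the proposal, an out-of-range index raises INDEX_SIZE_ERR.
    return false;
  }
}
```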

Example use

  // select the audio track tagged as English as used in the United Kingdom
  // (the BCP 47 region subtag for the United Kingdom is "GB", not "UK")

  for(var i = 0; i < videoElement.audioTrackCount; i++){
      if(videoElement.audioTrackLanguage[i] == "en-GB"){
          videoElement.currentAudioTrack = i;
          break;
      }
  }
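The same three attributes are enough for a page to build its own track selection menu. The sketch below assumes the proposed attributes behave as described; listAudioTracks is a hypothetical helper name, and the shape of the returned objects is chosen purely for illustration.

```javascript
// Hypothetical helper (not part of the proposal): returns one entry
// per audio track, suitable for populating an author-provided menu.
function listAudioTracks(media) {
  var tracks = [];
  for (var i = 0; i < media.audioTrackCount; i++) {
    tracks.push({
      index: i,                                 // zero-based track index
      language: media.audioTrackLanguage[i],    // BCP 47 language tag
      selected: i === media.currentAudioTrack   // true for the active track
    });
  }
  return tracks;
}
```

Each entry's index can then be assigned to currentAudioTrack when the user picks that menu item.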

Impact

This is a minimal extension to the existing media element which allows support for media accessibility without undue burden on the browser vendors. This mechanism does not preclude a richer API being defined in the future, for example based on the track mechanisms being adopted for text tracks.

Positive Effects

Negative Effects

References

HTML Working Group Issue

HTML ISSUE-152 Handling of additional tracks of a multitrack audio/video resource

Related Bugs

Bug 9452: Handling of additional tracks of a multitrack audio/video resource

Bug 8659: Media events to indicate captions and audio descriptions

Bug 5758: Insufficient accessibility fallback for <audio> or <video>