Re: Tech Discussions on the Multitrack Media (issue-152) from David Singer on 2011-03-03 (public-html@w3.org from March 2011)

From: David Singer <singer@apple.com>
Date: Thu, 3 Mar 2011 15:26:50 -0800
To: Sean Hayes <Sean.Hayes@microsoft.com>
Cc: Silvia Pfeiffer <silviapfeiffer1@gmail.com>, Eric Carlson <eric.carlson@apple.com>, public-html <public-html@w3.org>
Message-Id: <F474BA32-6888-4C1F-8ADD-C1AF9451C0A2@apple.com>
On Mar 3, 2011, at 6:36 , Sean Hayes wrote:

> Replacement is really the only practical method of achieving something like http://www.w3.org/WAI/PF/HTML/wiki/File:BBC_iPlayer.PNG, at least until video frame manipulation through canvas becomes fast enough, or CSS supports quality chromakey blending of video. Using a separate rectangular video area hides too much of the source if layered over the main video, and a side-by-side arrangement takes up a lot of space and doesn't work for full-screen video. Also the main source of audio-described video is likely to be from TV/DVD, where extended descriptions are not used and pre-mixed audio is the norm. In addition, for mobile devices client-side mixing of streams is a ways off. So yes, I think replacement is definitely on the 80% use case side.
> 


OK, so here is a strawman stab at how to handle content re-authored to meet a need.

I am (obviously) open to refinement and critique.


OVERVIEW

We achieve replacement through three (minor) changes:

a) allowing accessibility tags on the <source>s of the primary audio/video tag
b) changing the source selection algorithm slightly (and this only has an effect for the accessibility-desiring user)
c) [not really directly related, but a consequence] make the tagging of a source/track be permitted to be a list, not just a single keyword, and test for membership rather than equality.

Here's the justification, the changes, and some comments.


BACKGROUND

Some kinds of accessibility adaptation necessarily involve re-authoring the primary resource, not adding something optional:
* repetitive stimulus avoidance
* contrast adjustment (to high or low) of video or audio (e.g. high-contrast audio has the background music and effects turned down, to enhance clarity of the speech etc.)
* color-blindness compensation

Some kinds may be much more easily met by re-authoring than by side-files:
* audio description of video, where the original audio and/or the timing need adjustment to allow room for the descriptive audio
* sign-language overlays, especially where the signer is alpha-blended and/or moves around, on the video

Sometimes one is forced into an alternative primary resource:
* burned-in captions are the only ones available

Sometimes content management etc. is easier:
* when there is 'one file for the resource' (e.g. with an optional track inside the media container for the accessibility need).


EXTENDED ACCESSIBILITY TAGGING

For a while I have felt we can easily address this by allowing:
a) that the <source> elements (or the <video> element itself, if it has only one source or they all have the same accessibility characteristics) can be labelled as 'can meet these accessibility needs' (in today's terms, allow 'kind' tags on <video>, <audio>, and <source> in video and audio)

b) that the same kind of labelling (of kinds) on the tracks *inside* the media container be allowed/used, so we can work out that there is an optional track in the multiplex that meets a certain need


SOURCE SELECTION ALGORITHM

We add to the source selection algorithm as follows, using the existing rules on whether a source is basically suitable at all. 

If the user has any accessibility preferences, either
a) of the preference form "I prefer if need X is met" or
b) of the requirement form "I can only watch material where need X is met",

then the suitable sources are examined, accumulating the best match so far, until either (a) the end of the list is reached or (b) a perfect match is found.  A source matches if all requirements are met. A match is better than another match if it meets more preferences; it is perfect if it meets all preferences (as well as the requirements). Tags on the sources about which the user has expressed no preference are ignored.  (Note that this therefore reduces to today's algorithm in the case that the user has an empty preference list.)
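
A minimal sketch of the selection loop described above, in JavaScript. The function name selectSource, the shape of the source objects ({src, kinds}) and the shape of the user settings ({requirements, preferences}) are assumptions made for illustration, not existing API. The list is assumed to hold only sources that already pass the existing suitability rules.

```javascript
// Hypothetical sketch: pick the best source for a user's accessibility settings.
// `sources` is assumed to be pre-filtered by the existing suitability rules
// and to be in document order.
function selectSource(sources, user) {
  var requirements = user.requirements || [];
  var preferences = user.preferences || [];
  var best = null;
  var bestScore = -1;

  for (var i = 0; i < sources.length; i++) {
    var kinds = sources[i].kinds || [];

    // A source matches only if every requirement is met.
    var matches = requirements.every(function (need) {
      return kinds.indexOf(need) !== -1;
    });
    if (!matches) continue;

    // Score by the number of preferences met; tags the user
    // expressed no preference about are simply never counted.
    var score = preferences.filter(function (need) {
      return kinds.indexOf(need) !== -1;
    }).length;

    // Strictly-greater comparison keeps the earliest source on ties.
    if (score > bestScore) {
      best = sources[i];
      bestScore = score;
    }

    // Perfect match: all preferences (and requirements) met, so stop early.
    if (score === preferences.length) break;
  }
  return best; // null if no source meets the requirements
}
```

With empty requirement and preference lists the first suitable source is a perfect match with score zero, so the loop stops at once and the behaviour is the same as today's algorithm.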


'KIND' LISTS

Since some sources will meet more than one need (e.g. a source that both carries captions and is safe from repetitive-stimulus), the kind might have more than one keyword.  So the matching test should not be track.kind == "<keyword>" but, using an appropriate idiom for matching a keyword in a list, contains(track.kind, "<keyword>").
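
One possible shape for such a membership test, as a sketch only. It assumes the kind list is a space-separated string of keywords (as in kind="captions signings"); the separator and the helper itself are assumptions, not part of any specification.

```javascript
// Hypothetical helper: true if `keyword` is one of the space-separated
// keywords in `kind`, e.g. "captions repetitive-stimulus-safe".
function contains(kind, keyword) {
  if (!kind) return false;
  // Split on runs of whitespace and compare whole keywords,
  // so that "caption" does not match "captions".
  return kind.trim().split(/\s+/).indexOf(keyword) !== -1;
}
```

Comparing whole keywords rather than substrings avoids false matches between keywords that share a prefix.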


TRACK ITERATION

We make sure that the APIs that iterate over tracks can (or maybe always do) iterate over the union of the external tracks and the tracks inside any multiplex files.  So if this is written
<video src="mymovie.mp4">
  <track src="mycaptions.vtt" srclang="en">
</video>

and mymovie.mp4 has an audio and video track and a video track tagged with "signings", then the track iteration over the tracks in the video will see the main video, main audio, signings video, and the captions from mycaptions.vtt.  It can then enable/disable them as desired and needed.
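
As an illustration of that union, here is a sketch using plain objects rather than any real media API. The function unionTracks, the origin field and the "main" kind keyword are all assumptions introduced for this example.

```javascript
// Hypothetical sketch: one list covering the tracks inside the media
// container followed by the external <track> side-files.
function unionTracks(inBand, external) {
  function tag(origin) {
    return function (t) {
      return {
        mediaType: t.mediaType,
        kind: t.kind,
        language: t.language || "",
        origin: origin // "in-band" or "external" (illustrative only)
      };
    };
  }
  return inBand.map(tag("in-band")).concat(external.map(tag("external")));
}

// The mymovie.mp4 example: main video, main audio, signings video,
// plus the captions side-file.
var tracks = unionTracks(
  [
    { mediaType: "video", kind: "main" },
    { mediaType: "audio", kind: "main" },
    { mediaType: "video", kind: "signings" }
  ],
  [
    { mediaType: "text", kind: "captions", language: "en" }
  ]
);
```

A script iterating over the result sees all four tracks in one pass and does not need to know which ones came from the container.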


EXAMPLES

So one might see
<video id="v1" poster="video.png" controls>
     <!-- primary content -->
     <source src="video.webm" type="video/webm">
     <source src="video.mp4" type="video/mp4">
     <source src="video.mp4" type="video/mp4" kind="captions">

     <!-- sign language overlay -->
     <track kind="signings" srclang="asl" label="American Sign Language">
         <source src="signlang.webm" type="video/webm">
         <source src="signlang.mp4" type="video/mp4">
     </track>

 </video>

and then this code should 'just work':
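var video = document.getElementById("v1");
// Index loop: for...in would yield the keys, not the track objects.
for (var i = 0; i < video.tracks.length; i++) {
  var track = video.tracks[i];
  // contains() is the keyword-in-list test from 'KIND' LISTS above
  if (track.mediaType == "video" && contains(track.kind, "signings") && track.language == "asl") {
      track.mode = SHOWING;
      break;
  }
}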

(I know, this example doesn't show a caption possibility for the WebM viewer, but it is just a short example.)


DISCUSSION OF HTTP STREAMING

An MPEG/3G DASH manifest can include optional 'tracks' (called representations there) which could satisfy needs, so if all the material is HTTP-based-streaming delivered, one might see an HTML declaration

<video src="manifest.mpd" kind="captions signings audio-desc repetitive-stimulus-safe" />

and have the JavaScript and DASH engine do the configuring of the representations in the manifest.  The declaration at the HTML5 level then serves to notify the HTML5 engine that the manifest is suitable, e.g. for someone who requires repetitive-stimulus avoidance.
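
A rough sketch of the check the HTML5 engine could make before handing the manifest to the DASH engine. The function name declaredKindsCover and the space-separated form of the kind attribute are assumptions for illustration.

```javascript
// Hypothetical sketch: does the kind list declared on the element
// cover every accessibility requirement the user has?
function declaredKindsCover(kindAttr, requirements) {
  var declared = (kindAttr || "").split(/\s+/).filter(Boolean);
  return requirements.every(function (need) {
    return declared.indexOf(need) !== -1;
  });
}
```

Which representations actually get enabled is left to the script and the DASH engine; this check only decides whether the manifest is a candidate at all.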

(I think we would all like to see DASH manifests, and other places carrying optional tracks for accessibility needs, use the same tagging system).


David Singer
Multimedia and Software Standards, Apple Inc.
Received on Thursday, 3 March 2011 23:27:25 UTC