Re: [media] handling multitrack audio / video

On Sat, Dec 4, 2010 at 8:35 AM, Maciej Stachowiak <mjs@apple.com> wrote:
>
> On Dec 3, 2010, at 12:31 PM, Silvia Pfeiffer wrote:
>
>>
>> Now there's only one other major use case that we haven't considered
>> yet: the case where we have completely alternative resources to the
>> main audio/video that contain signing/descriptions/captions/subtitles.
>> So we have, for example:
>>
>> * v_main.{ogv,mp4,webm} which has main audio & main video
>> * v_main_sign.{ogv,mp4,webm} which has main audio & main video & sign
>> language video & captions - being particularly targeted at the HoH
>> * v_main_desc.{ogv,mp4,webm} which has main audio & main video & audio
>> description - being particularly targeted at the VI
>> * plus all of this in different languages
>>
>> I think this is also a major use case that we will find, in particular
>> where the auxiliary accessibility content has been burnt-in.
>>
>> I almost think that in this case we have to provide completely
>> alternative <video> elements where only one of them is allowed to be
>> active. This is where a manifest may come in handy.
>
> This case is challenging, because you need to find a supported format, but want best-fit, not first-fit, on accessibility and language aspects.
>
> Just thinking through how that would work with <source> elements and a kind-like attribute, I could imagine this:
>
> - Source selection algorithm becomes two-pass.
> - First pass stable sorts the <source> elements according to a best-fit metric of accessibility affordances, relative to user preferences (this metric would have to be defined). It's a stable sort so that elements are not reordered if they are equally good fits.
> - The second pass is the usual source-selection algorithm.
>
> This will always result in getting the best accessibility fit out of sources with a supported format.
>
> We would also have to be able to distinguish optional accommodations vs. burned-in ones. Burned-in accommodations (e.g. burned-in captions or a sign language translation combined with the main video track) should score negatively for users who don't want that accommodation, but optional ones (e.g. built-in but optional closed caption track, or built-in but separate descriptive audio track) do not score positively or negatively.
>
> Whether this is viable depends on whether we can come up with a good scoring rule for matching available accommodations to user needs.
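
The two-pass idea quoted above could be sketched roughly as follows. This is only an illustration, not a proposed spec algorithm: the Source class, the fit_score rule and the can_play callback are all hypothetical names, and the scoring (+1 for each wanted accommodation present, -1 for each burned-in accommodation the user did not ask for, 0 for unwanted optional ones) is just one guess at the metric that would still have to be defined.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """Hypothetical stand-in for a <source> element with a kind-like attribute."""
    url: str
    mime: str
    burned_in: frozenset = frozenset()  # accommodations that cannot be turned off
    optional: frozenset = frozenset()   # accommodations the user can switch on


def fit_score(src, wanted):
    """Assumed scoring rule: reward wanted accommodations, penalise unwanted burned-in ones."""
    score = 0
    for acc in wanted:
        if acc in src.burned_in or acc in src.optional:
            score += 1
    for acc in src.burned_in:
        if acc not in wanted:
            score -= 1
    # Unwanted optional accommodations deliberately contribute nothing.
    return score


def select_source(sources, wanted, can_play):
    # Pass 1: stable sort, best accessibility fit first. Python's sorted() is
    # stable, so equally good fits keep their document order.
    ranked = sorted(sources, key=lambda s: -fit_score(s, wanted))
    # Pass 2: the usual first-fit selection on supported format.
    for src in ranked:
        if can_play(src.mime):
            return src
    return None
```

With the file names from the example earlier in the thread, a user who wants sign language and captions and whose browser only plays MP4 would get v_main_sign.mp4, while a user with no preferences would get v_main.mp4.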


Just to throw another dimension into this: HTTP adaptive streaming
will have the same issue. You might want to refer to the thoughts at
http://wiki.whatwg.org/wiki/Adaptive_Streaming for some background on
this. There will be a number of media resources that are alternatives
to each other based on bitrate, framerate and possibly width/height
and other attributes (Apple's Live Streaming already deals with this
through m3u8). Those other attributes for the media resource
alternatives could IMO include the accessibility tracks. Then we don't
need to invent a new means for switching complete resources - they are
all part of the same selection algorithm. The adaptation just has to
also take into account the tracks available in a resource, rather than
just the quality of service metrics.

It essentially ends up doing exactly what you are suggesting: it
creates a two-step source selection algorithm. However, the first
choice - the choice of supported encoding format - is one that is not
dynamic. The second choice - the one of quality of file and available
accessibility tracks - needs to be dynamic, in particular if a user
decides half-way through watching a resource that they want to turn on
available caption tracks. We would have an m3u8-type file for each file
format (mp4/ogv/webm), then the first choice can continue to be done
in the browser as before.

In this case, the browser would go through the list of available
resources in an m3u8-type manifest file to determine which alternative
resources with which accessibility tracks are available, and include
them in the user menu. It would then start downloading the most
appropriate one given the current user preference settings and quality
of service parameters.
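
To make that a bit more concrete, here is a rough sketch of the dynamic step, assuming the manifest for one file format has already been parsed into a list of entries. The Variant class, menu_tracks and pick_variant are hypothetical names, and the ranking (most wanted tracks first, then highest bitrate that fits the measured bandwidth) is only an assumption about what "most appropriate" could mean.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical parsed entry of an m3u8-type manifest."""
    url: str
    bandwidth: int                   # bits per second needed to play this variant
    tracks: frozenset = frozenset()  # accessibility tracks the resource contains


def menu_tracks(variants):
    """All accessibility tracks offered by any variant, for the user menu."""
    return sorted(set().union(*(v.tracks for v in variants)))


def pick_variant(variants, wanted, available_bps):
    """Pick the variant to download for the current preferences and bandwidth."""
    if not variants:
        return None
    affordable = [v for v in variants if v.bandwidth <= available_bps]
    # If nothing fits the measured bandwidth, fall back to the cheapest variant.
    pool = affordable or [min(variants, key=lambda v: v.bandwidth)]
    # Prefer more wanted tracks, then higher bitrate.
    return max(pool, key=lambda v: (len(wanted & v.tracks), v.bandwidth))
```

Calling pick_variant again whenever the user toggles a track in the menu, or the bandwidth estimate changes, would give the dynamic behaviour described above, while the choice of file format stays with the existing <source> selection.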

Cheers,
Silvia.

Received on Saturday, 4 December 2010 01:51:53 UTC