Re: Track kinds


I updated the wiki page based on our meeting on Wednesday.

The three proposed additional kinds are: captions, subtitles and clearaudio. Please everyone check the definitions on the page. The next decision is whether to re-open the previous "kinds" bug or create a new one.

I also have a procedural question: do we consider that we have received the liaison from 3GPP (mentioned on the above page)? Are we going to answer it? One question they ask is whether we will define a URN to identify the space of kind values defined by W3C. One advantage of doing that is that these kinds are then immediately supported in 3GPP and MPEG adaptive streaming manifests, which means that there *is* a media container supporting those kinds (perhaps addressing one of the editor's concerns about new kind values).

Finally, we discussed the "commentary" kind here at Netflix and in the end we are happy to have it dealt with simply as "alternative". I do think though that in principle there could be other (UI-related) reasons for exposing a new track kind than triggering default behavior or application of user preferences. This is certainly the case for accessibility use-cases where the UI to enable/disable a particular track could usefully be tailored to the intended users of that track (for example, enabling/disabling tracks intended for the blind or those with low vision should ideally not involve complex visual UI elements).


On May 3, 2011, at 10:12 PM, Silvia Pfeiffer wrote:

I understand the problem of additive/alternative tracks, too, and have
tried to approach it with markup before. However, I think this is
making something that is supposedly simple much too difficult. The
ultimate choice of active tracks has got to be left to the user. For
this reason, I think @kind (or getKind()) should only ever expose what
content is available in the track, but there should not be an
automatic choice made by the browser. It's up to the user to
activate/deactivate the correct tracks.

Before we dive into anything more complex, we should get some
experience with an implementation of multitrack and the roles. I don't
think we will have much to go by for making a decision beforehand.


On Wed, May 4, 2011 at 2:02 PM, Mark Watson wrote:
So, if we are looking for a generic approach, where a track can have multiple "roles", then I think the correct logic is indeed to pick the smallest number of tracks which fulfill the intersection of the desired roles and the available roles such that no role is fulfilled more than once. You need a priority list of roles to drop from the desired list if that isn't possible (which would mean some badly authored content, but it has to be dealt with). It may be a mouthful, but I think it would be reasonably straightforward to implement.
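
The selection rule described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not a proposed algorithm for the spec: the function names, the dict-of-sets track representation, and the rule that a track carrying any role outside the target set is not a candidate are all hypothetical choices made for the example. It uses a brute-force search, which is fine for the handful of tracks a presentation typically has.

```python
from itertools import combinations


def _smallest_exact_selection(tracks, target):
    """Return the smallest list of track names whose roles cover `target`
    exactly once each, or None if no such selection exists."""
    # Assumption: a track with a role outside the target set is not a candidate.
    candidates = [name for name, roles in tracks.items()
                  if roles and roles <= target]
    for size in range(len(candidates) + 1):
        for combo in combinations(candidates, size):
            covered = [role for name in combo for role in tracks[name]]
            no_duplicates = len(covered) == len(set(covered))
            if no_duplicates and set(covered) == target:
                return list(combo)
    return None


def select_tracks(tracks, desired, drop_order):
    """tracks: dict mapping track name -> iterable of roles.
    desired: roles the user wants.
    drop_order: roles to give up, first entry dropped first, when the
    content is authored so that no exact selection exists."""
    tracks = {name: set(roles) for name, roles in tracks.items()}
    available = set().union(*tracks.values())
    target = set(desired) & available
    remaining_drops = [role for role in drop_order if role in target]
    while True:
        chosen = _smallest_exact_selection(tracks, target)
        if chosen is not None:
            return chosen
        if not remaining_drops:
            return []  # nothing left to drop; give up
        target.discard(remaining_drops.pop(0))
```

With tracks labelled "main", "captions" and "main + captions", a user who wants both roles gets the single combined track; with only "main" and "description" tracks, both are selected.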

However, I'm still not sure a generic approach is necessary. A "simpler" approach is to say every track has a single role. But for some applications (like audio descriptions) there are two distinct role values defined - an additive one and an alternative one. The problem is addressed at a semantic level - i.e. people implement support for audio descriptions - and they know what these are and how to handle them - rather than trying for a generic descriptor matching algorithm.

Regarding Repetitive Stimulus Safe, I guess that since most content is unfortunately not labeled one way or the other, the default assumption has to be up to the users themselves, i.e. user preferences associated with this aspect should support required, preferred and don't care. In a really generic approach, every role may have a status from { require, prefer, don't care, prefer not, require not }.
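
To make the five-status idea concrete, here is a minimal sketch of how a UA might evaluate one track against per-role preferences. Everything in it is an assumption for illustration: the `Pref` enum, the `score_track` function, the simple plus/minus one scoring, and the role label "rss-safe" are hypothetical and not taken from any spec.

```python
from enum import Enum


class Pref(Enum):
    """The five per-role statuses listed above."""
    REQUIRE = "require"
    PREFER = "prefer"
    DONT_CARE = "don't care"
    PREFER_NOT = "prefer not"
    REQUIRE_NOT = "require not"


def score_track(roles, prefs):
    """Return None if the track is ruled out by a hard constraint,
    otherwise an integer score (higher is a better match).

    roles: iterable of role labels the track carries.
    prefs: dict mapping role label -> Pref.
    """
    roles = set(roles)
    score = 0
    for role, pref in prefs.items():
        present = role in roles
        if pref is Pref.REQUIRE and not present:
            return None  # a required role is missing
        if pref is Pref.REQUIRE_NOT and present:
            return None  # a forbidden role is present
        if pref is Pref.PREFER and present:
            score += 1
        if pref is Pref.PREFER_NOT and present:
            score -= 1
    return score
```

Under this sketch, unlabelled content is rejected only for a user who sets the safety role to "require"; a user who sets it to "prefer" still gets the content, just ranked lower.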

Again, this suggests that a generic approach might be over-ambitious - who says some new role doesn't come along next week with a sixth user-preference status of "required unless role Y present" or similar ... I think maybe the UA needs to understand what these things are and act appropriately.


On May 3, 2011, at 11:55 AM, David Singer wrote:

On May 2, 2011, at 16:55 , Mark Watson wrote:

I think it's evidence that there is something to be solved.

I'd prefer a solution where adding a track to an existing presentation didn't require me to change the properties of existing tracks, though, since there is an error waiting to happen in that case.

Yes.  This idea made some sense when it was the tracks in a multiplex (e.g. MP4 file), perhaps makes sense when all the tracks are annotated in the markup (e.g. in HTML5 or DASH MPD) but makes much less sense when some tracks are in a multiplex and some are added in the markup - a track added in the markup might need the annotations in a multiplex changed, ugh.

So, thinking out loud here.

Assume the user has a set of roles that they would kinda like to experience.  The default is 'main, supplementary', I think, or something like that.

Now, we have a set of tracks, each of which satisfies some roles.  Let's ignore tracks we have discarded because they are the wrong mime type, codec, language, etc., and focus just on this selection mechanism.  What is the right simple way to get the set of tracks?

It's easy to 'go overboard' and treat this as a very general problem of finding the minimal set of tracks that will span a set of desired roles.  I don't think anyone will author *for the same language*:

(1) track - main
(2) track - captions
(3) track - main + captions

so an algorithm designed to pick only (3) instead of (1 + 2) for the main+captions desiring user is probably overkill.

'enable the tracks whose roles are a subset of the desired roles, and disable the rest' may be too simple, unless tracks are ordered from the most-labelled to the least-labelled.

So, audio-description replacing the main audio:
track - main + description
track - main

Audio description adding to the main audio:
track - main
track - description

The same works for all the adaptations that might require re-authoring or might be achievable with an additional track (captions, burned in or separate, for example).
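
The ordered subset rule and the two audio-description cases above can be sketched like this. It is one possible reading, with assumptions stated plainly: the function name `enable_tracks` and the track names are hypothetical, and the extra check that skips a track whose role is already fulfilled by an earlier track is an interpretation of why the most-labelled-first ordering matters, not something spelled out in the text.

```python
def enable_tracks(tracks, desired):
    """Return the names of the tracks to enable.

    tracks: list of (name, roles) pairs.
    desired: roles the user wants, e.g. {"main", "supplementary"}.
    """
    desired = set(desired)
    # Most-labelled tracks first; sorted() is stable, so authoring
    # order is kept among tracks with the same number of roles.
    ordered = sorted(tracks, key=lambda t: len(t[1]), reverse=True)
    fulfilled = set()
    enabled = []
    for name, roles in ordered:
        roles = set(roles)
        # Enable only if every role is wanted and none is already covered
        # (the "already covered" test is an assumption of this sketch).
        if roles and roles <= desired and not (roles & fulfilled):
            enabled.append(name)
            fulfilled |= roles
    return enabled


# Replacement case: a described mix alongside the plain main audio.
replacing = [("described-mix", {"main", "description"}), ("plain", {"main"})]
# Additive case: a description track played on top of the main audio.
additive = [("plain", {"main"}), ("description-only", {"description"})]
```

A user who wants only "main" gets the plain track in both cases; a user who also wants "description" gets the described mix alone in the first case and both tracks in the second.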

Where this fails is when the 'base content' is good enough for both the plain user and the user who desires more roles.  The obvious case here (Mark will laugh) is repetitive-stimulus-safeness;  we have to assume unlabelled content is unsafe, but much content is naturally safe and can be labelled as such.

David Singer
Multimedia and Software Standards, Apple Inc.

Received on Friday, 6 May 2011 19:23:19 UTC