Re: [MSE] Questions about setting track language & kind (Bug 17006) from Mark Watson on 2012-09-24 (public-html-media@w3.org from September 2012)

From: Mark Watson <watsonm@netflix.com>
Date: Mon, 24 Sep 2012 17:15:04 +0000
To: Aaron Colwell <acolwell@google.com>
CC: "<public-html-media@w3.org>" <public-html-media@w3.org>
Message-ID: <8132BBEA-C5BC-4151-8E06-D7C4E903D3B1@netflix.com>

Oh, I do so love multiplexed representations ;-)

See answers below.

On Sep 21, 2012, at 11:09 AM, Aaron Colwell wrote:

Hi,

On one of the calls several weeks ago I said I'd start a thread about several questions I had about Bug 17006<https://www.w3.org/Bugs/Public/show_bug.cgi?id=17006>. Here it is. :) The goal of this bug is to provide a way to reflect the role & language specified in a DASH manifest in the {Audio | Video | Text}Track objects.

I've spent some time trying to understand the DASH spec and have come up with these questions:

1. Do people still want this feature? I believe it was one of the open issues our friends at Microsoft asked to be included in the original proposal.

Yes. Whether the information comes from a DASH manifest or elsewhere, it may not be in the media itself, and it would be good to expose it on the video element in a source-independent way.

2. Why would it be better to put this information in the manifest instead of the initialization segments?

A good question, but ...

Don't they have role & language information encoded in them?

… no. An audio file is just an audio file and is not necessarily annotated in the file with content metadata such as language and purpose (commentary etc.). I guess DASH could have required those annotations to be in band in the file, but it doesn't.

3. It looks like language & role can be specified at the AdaptationSet & ContentComponent level. How should these be treated differently in the Media Source context?

For the unmultiplexed case, all the Representations in an AdaptationSet have the same language and kind, because the meaning of an AdaptationSet is that they can be automatically switched for bitrate adaptation. So for the unmultiplexed case the language and kind on the AdaptationSet apply to all the Representations.

For the multiplexed case, similarly, every Representation must contain the same set of multiplexed tracks, because the player could switch between Representations. The ContentComponent elements describe the tracks in the multiplex, including their language and kind.

If you have language and kind at the AdaptationSet level and there are also ContentComponents, then the AdaptationSet-level annotations (if they are allowed at all; I am not sure) serve only as defaults for any ContentComponent that doesn't explicitly include them.
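
As a rough illustration of the defaulting rule described above, here is a minimal TypeScript sketch. The interfaces (`Annotations`, `ContentComponent`, `AdaptationSet`, `TrackInfo`) and the function `resolveTrackInfo` are hypothetical names invented for this example; they are not taken from the DASH or Media Source specifications, and the sketch assumes AdaptationSet-level annotations are permitted alongside ContentComponents, which the message itself leaves open.

```typescript
// Hypothetical shapes for illustration only; not DASH or MSE types.
interface Annotations {
  lang?: string;
  kind?: string;
}

interface ContentComponent extends Annotations {
  id: string;
}

interface AdaptationSet extends Annotations {
  contentComponents: ContentComponent[];
}

interface TrackInfo {
  id: string;
  lang: string | undefined;
  kind: string | undefined;
}

// Resolve per-track language and kind. ContentComponent values win;
// AdaptationSet values only fill in what a component leaves out.
function resolveTrackInfo(set: AdaptationSet): TrackInfo[] {
  if (set.contentComponents.length === 0) {
    // Unmultiplexed case: the AdaptationSet annotations apply directly.
    // The id "default" is a placeholder chosen for this sketch.
    return [{ id: "default", lang: set.lang, kind: set.kind }];
  }
  // Multiplexed case: one entry per ContentComponent.
  return set.contentComponents.map((c) => ({
    id: c.id,
    lang: c.lang ?? set.lang,
    kind: c.kind ?? set.kind,
  }));
}
```

Because every Representation in the AdaptationSet carries the same tracks, the resolved list would apply to all of them.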

4. In the context of this bug, are we assuming a 1:1 mapping between AdaptationSets and SourceBuffers? (ie Representations from different AdaptationSets won't be mixed)

Representations from the same AdaptationSet should certainly be fed into the same SourceBuffer.

I think things are clearer and simpler if different AdaptationSets map to different SourceBuffers, so we might want to require or assume that.
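
One way an application might enforce a one-to-one mapping between AdaptationSets and SourceBuffers is sketched below. The `BufferHost` interface and the `AdaptationSetBuffers` class are hypothetical names for this example. The only real API assumed is `MediaSource.addSourceBuffer(type)`, which a browser `MediaSource` provides; the host is passed in so the sketch does not depend on a browser environment.

```typescript
// Hypothetical minimal interface. In a browser, a MediaSource object
// satisfies it through its addSourceBuffer(type) method.
interface BufferHost<B> {
  addSourceBuffer(type: string): B;
}

// Keeps exactly one buffer per AdaptationSet id.
class AdaptationSetBuffers<B> {
  private readonly buffers = new Map<string, B>();
  private readonly host: BufferHost<B>;

  constructor(host: BufferHost<B>) {
    this.host = host;
  }

  // Return the buffer for an AdaptationSet, creating it on first use.
  // Representations from the same AdaptationSet share one buffer;
  // different AdaptationSets never share.
  bufferFor(adaptationSetId: string, mimeType: string): B {
    let buffer = this.buffers.get(adaptationSetId);
    if (buffer === undefined) {
      buffer = this.host.addSourceBuffer(mimeType);
      this.buffers.set(adaptationSetId, buffer);
    }
    return buffer;
  }
}
```

In a page this could be constructed as `new AdaptationSetBuffers<SourceBuffer>(mediaSource)`, with the AdaptationSet ids supplied by whatever manifest parser the application uses.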

As things stand now, it's possible to do language switching by appending the new language audio into the same SourceBuffer as the previous one. I'm not sure why you would want to do that instead of creating a new SourceBuffer. Supporting language and kind mapping in this model would require mapping a single SourceBuffer to multiple HTML <video> tracks and I don't see an easy way to do this without in-band indications.

I think in that model, the application is taking more control of what is appended where (it has to know, for example, which parts of the SourceBuffer are in which language so that it can properly handle seeking). It would be reasonable to not support language/kind mapping in that case.

5. Are contentComponent ids specified in SubRepresentations required to have the same language and role across all Representations in an AdaptationSet? If not, I believe this could mean the language for tracks could change if the application switches representations in an adaptation set.

I'm struggling a little to remember exactly how SubRepresentations work with multiplexed content, but I think what you say makes sense: the SubRepresentations must have consistent content across the Representations of an AdaptationSet otherwise automatic switching would not be appropriate.
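
The consistency requirement could be checked along the following lines. This is only a sketch: `ComponentDescriptor` and `componentsConsistent` are hypothetical names, and it assumes each Representation's components have already been extracted from the manifest into plain objects carrying a contentComponent id, a language, and a role.

```typescript
// Hypothetical descriptor for one multiplexed component of a Representation.
interface ComponentDescriptor {
  contentComponentId: string;
  lang: string;
  role: string;
}

// Returns true when every Representation describes the same set of
// components with the same language and role, regardless of order.
function componentsConsistent(representations: ComponentDescriptor[][]): boolean {
  // Build an order-independent signature for one Representation.
  const signature = (components: ComponentDescriptor[]): string =>
    JSON.stringify(
      components
        .map((c) => JSON.stringify([c.contentComponentId, c.lang, c.role]))
        .sort(),
    );

  const first = representations[0];
  if (first === undefined) {
    return true; // Nothing to compare.
  }
  const expected = signature(first);
  return representations.every((r) => signature(r) === expected);
}
```

An application could run such a check before treating the Representations as automatically switchable.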

6. There don't appear to be trackIDs mentioned in the manifest. Is it safe to assume that role & language apply to all tracks within the representation? If so, how are alternate language tracks represented within an AdaptationSet?

If the Representations contain multiple multiplexed tracks, then the language and kind of the tracks are given by the ContentComponent elements. Every Representation must contain the same tracks. I am not exactly sure how you detect which track is which, though ...

7. What is the expected behavior if the language of a track suddenly changes? Say I have 2 audio tracks. Track 1 is English and track 2 is French. My preferred language is English so track 1 is selected. I then append a new initialization segment that indicates track 1 has French and track 2 is English along with a few media segments.
a. Should the UA switch to track 2 at the appropriate time so that English audio is always output?
b. Should this kind of language change be rejected?

I don't think you should change language in the middle of a track.

Aaron

Received on Monday, 24 September 2012 17:15:33 UTC