Re: [MSE] Questions about setting track language & kind (Bug 17006)

Hi Aaron,

I think that the DASH MPD can provide this information instead of it being in the initialization segment. Maybe I'm wrong about this. If not, then in MSE we need to figure out how the MPD, and MPD updates, can provide the UA with the information normally derived from the initialization segment.

Thanks,
Bob

From: Aaron Colwell <acolwell@google.com>
Date: Monday, September 24, 2012 12:10 PM
To: Mark Watson <watsonm@netflix.com>
Cc: <public-html-media@w3.org>
Subject: Re: [MSE] Questions about setting track language & kind (Bug 17006)
Resent-From: <public-html-media@w3.org>
Resent-Date: Monday, September 24, 2012 12:11 PM

Hi Mark,

Thanks for the response. I think I'm seeing a common theme in Bob's response and yours: it seems like you both want track type to be expanded to include language and possibly role.

This seems like it could be tricky, though, since these fields are not required in the initialization segments. If the language is present in some init segments but not in others, what is the UA supposed to do? If the web application explicitly specifies the language, then the UA can completely ignore what is in the init segments, but in the implicit case it isn't clear. The UA could just pick the values specified in the first init segment and make sure the following init segments have the same info. This is basically what happens with the codec type. If the first init segment doesn't have language info, then that track isn't considered to be a specific language, and any language info that appears in later init segments is ignored and won't cause an error.
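Sketched as code, the rule I'm proposing would look something like this (the function name and the per-track record are illustrative, not anything from the spec):

```javascript
// Sketch (not spec text) of the proposed behavior: the first
// initialization segment seen for a track fixes the track's language;
// later init segments with different language info are ignored rather
// than treated as errors, mirroring how the codec type is handled.
// `trackState` is a hypothetical per-track record kept by the UA.

function applyInitSegmentLanguage(trackState, initSegmentLanguage) {
  if (trackState.language === undefined) {
    // First init segment decides. An absent language means the track
    // isn't considered to be in any specific language.
    trackState.language = initSegmentLanguage ?? null;
  }
  // Mismatching language info in later init segments is ignored.
  return trackState.language;
}
```

So for a track whose first init segment says 'en', a later init segment claiming 'fr' would leave the track's language at 'en' rather than raising an error.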

Does this sound reasonable?

Aaron


On Mon, Sep 24, 2012 at 10:15 AM, Mark Watson <watsonm@netflix.com> wrote:
Oh, I do so love multiplexed representations ;-)

See answers below.

On Sep 21, 2012, at 11:09 AM, Aaron Colwell wrote:

Hi,

On one of the calls several weeks ago I said I'd start a thread about several questions I had about Bug 17006 <https://www.w3.org/Bugs/Public/show_bug.cgi?id=17006>. Here it is. :) The goal of this bug is to provide a way to reflect the role & language specified in a DASH manifest in the {Audio | Video | Text}Track objects.

I've spent some time trying to understand the DASH spec and have come up with these questions:

1. Do people still want this feature? I believe it was one of the open issues our friends at Microsoft asked to be included in the original proposal.

Yes. Whether the information comes from a DASH manifest or elsewhere, it may not be in the media itself, and it would be good to expose it on the video element in a source-independent way.


2. Why would it be better to put this information in the manifest instead of the initialization segments?

A good question, but ...

Don't they have role & language information encoded in them?

… no. An audio file is just an audio file and is not necessarily annotated in the file with content metadata such as language and purpose (commentary etc.). I guess DASH could have required those annotations to be in band in the file, but it doesn't.


3. It looks like language & role can be specified at the AdaptationSet & ContentComponent level. How should these be treated differently in the Media Source context?

For the unmultiplexed case, all the Representations in an AdaptationSet have the same language and kind, because the meaning of an AdaptationSet is that its Representations can be automatically switched for bitrate adaptation. So for the unmultiplexed case the language and kind on the AdaptationSet apply to all the Representations.

For the multiplexed case, similarly, every Representation must contain the same set of multiplexed tracks, because the player could switch between Representations. The ContentComponent elements describe the tracks in the multiplex, including their language and kind.

If you have language and kind on the AdaptationSet level and there are also ContentComponents, then the AdaptationSet level annotations (if they are allowed at all, I am not sure), serve only as defaults for any ContentComponent which doesn't explicitly include them.
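To make that concrete, here is a hand-written MPD fragment (attribute values invented for illustration, not taken from any real manifest) showing a multiplexed AdaptationSet where an AdaptationSet-level lang serves as a default and each ContentComponent carries its own language and role:

```xml
<!-- Illustrative only: a multiplexed AdaptationSet with two audio
     components. The AdaptationSet-level lang acts at most as a default
     for components that don't specify their own. -->
<AdaptationSet mimeType="video/mp2t" lang="en">
  <ContentComponent id="1" contentType="video"/>
  <ContentComponent id="2" contentType="audio" lang="en">
    <Role schemeIdUri="urn:mpeg:dash:role:2011" value="main"/>
  </ContentComponent>
  <ContentComponent id="3" contentType="audio" lang="en">
    <Role schemeIdUri="urn:mpeg:dash:role:2011" value="commentary"/>
  </ContentComponent>
  <Representation id="mux-low" bandwidth="1000000"/>
  <Representation id="mux-high" bandwidth="3000000"/>
</AdaptationSet>
```

Both Representations are expected to carry the same three components, which is what makes automatic switching between them safe.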


4. In the context of this bug, are we assuming a 1:1 mapping between AdaptationSets and SourceBuffers? (i.e., Representations from different AdaptationSets won't be mixed)

Representations from the same AdaptationSet should certainly be fed into the same SourceBuffer.

I think things are clearer and simpler if different AdaptationSets map to different SourceBuffers, so we might want to require or assume that.

As things stand now, it's possible to do language switching by appending the new language audio into the same SourceBuffer as the previous one. I'm not sure why you would want to do that instead of creating a new SourceBuffer. Supporting language and kind mapping in this model would require mapping a single SourceBuffer to multiple HTML <video> tracks and I don't see an easy way to do this without in-band indications.

I think in that model, the application is taking more control of what is appended where (it has to know, for example, which parts of the SourceBuffer are in which language so that it can properly handle seeking). It would be reasonable to not support language/kind mapping in that case.
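A rough sketch of the 1:1 mapping, with the AdaptationSet descriptors reduced to the MIME type and codecs string an application would hand to MediaSource.addSourceBuffer() (the descriptor shape here is hypothetical):

```javascript
// One SourceBuffer per AdaptationSet: all Representations inside the
// same AdaptationSet feed the SourceBuffer created for that set.
// Returns the type strings to pass to MediaSource.addSourceBuffer().

function sourceBufferTypesFor(adaptationSets) {
  return adaptationSets.map(
    (as) => `${as.mimeType}; codecs="${as.codecs}"`
  );
}

// In a browser the application would then do, for each type:
//   const sb = mediaSource.addSourceBuffer(type);
// and append media segments from whichever Representation of that
// AdaptationSet it is currently playing:
//   sb.appendBuffer(segmentBytes);
```

Switching audio language under this model means tearing down (or adding) a SourceBuffer for the other language's AdaptationSet, rather than appending the new language into the existing one.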


5. Are ContentComponent ids specified in SubRepresentations required to have the same language and role across all Representations in an AdaptationSet? If not, I believe this could mean the language for tracks could change if the application switches Representations in an AdaptationSet.

I'm struggling a little to remember exactly how SubRepresentations work with multiplexed content, but I think what you say makes sense: the SubRepresentations must have consistent content across the Representations of an AdaptationSet otherwise automatic switching would not be appropriate.


6. There don't appear to be trackIDs mentioned in the manifest. Is it safe to assume that role & language apply to all tracks within the representation? If so, how are alternate language tracks represented within an AdaptationSet?

If the Representations contain multiple multiplexed tracks, then the language and kind of the tracks are given by the ContentComponent elements. Every Representation must contain the same tracks. I am not exactly sure how you detect which track is which, though ...


7. What is the expected behavior if the language of a track suddenly changes? Say I have 2 audio tracks. Track 1 is English and track 2 is French. My preferred language is English, so track 1 is selected. I then append a new initialization segment that indicates track 1 is now French and track 2 is English, along with a few media segments.
  a. Should the UA switch to track 2 at the appropriate time so that English audio is always output?
  b. Should this kind of language change be rejected?

I don't think you should change language in the middle of a track.



Aaron

Received on Monday, 24 September 2012 19:46:47 UTC