Re: [MSE] Questions about setting track language & kind (Bug 17006) from Bob Lund on 2012-09-24 (public-html-media@w3.org from September 2012)

From: Bob Lund <B.Lund@CableLabs.com>
Date: Mon, 24 Sep 2012 13:34:30 -0600
To: Aaron Colwell <acolwell@google.com>
CC: "<public-html-media@w3.org>" <public-html-media@w3.org>
Message-ID: <CC860BD2.20DBE%b.lund@cablelabs.com>
Hi Aaron,

Answers to your questions inline.

Bob

From: Aaron Colwell <acolwell@google.com>
Date: Monday, September 24, 2012 11:41 AM
To: Bob Lund <b.lund@cablelabs.com>
Cc: "<public-html-media@w3.org>" <public-html-media@w3.org>
Subject: Re: [MSE] Questions about setting track language & kind (Bug 17006)

Hi Bob,

Thanks for your responses. Comments inline...


On Fri, Sep 21, 2012 at 2:36 PM, Bob Lund <B.Lund@cablelabs.com> wrote:


From: Aaron Colwell <acolwell@google.com>
Date: Friday, September 21, 2012 12:09 PM
To: "<public-html-media@w3.org>" <public-html-media@w3.org>
Subject: [MSE] Questions about setting track language & kind (Bug 17006)
Resent-From: <public-html-media@w3.org>
Resent-Date: Friday, September 21, 2012 12:10 PM

Hi,

On one of the calls several weeks ago I said I'd start a thread about several questions I had about Bug 17006 <https://www.w3.org/Bugs/Public/show_bug.cgi?id=17006>. Here it is. :) The goal of this bug is to provide a way to reflect the role & language specified in a DASH manifest in the {Audio | Video | Text}Track objects.

I've spent some time trying to understand the DASH spec and have come up with these questions:

1. Do people still want this feature? I believe it was one of the open issues our friends at Microsoft asked to be included in the original proposal.

There are specifications that define the use of DASH MPD descriptors for conveying video, audio and text track metadata used for setting the equivalent attributes in HTML5 objects. I think this feature is required.

[acolwell] Ok.



2. Why would it be better to put this information in the manifest instead of the initialization segments? Don't they have role & language information encoded in them?

Couple of reasons. Putting the information in the manifest allows for a common representation that can be independent of which DASH profile (ISO BMFF, MPEG-2 TS or some future profile (WebM?)) is used. Also, as cited earlier, there are specifications that define use of the manifest for this information.


[acolwell] Ok.  So the intent is for the information in the manifest to always override whatever information is in the media bytestream. In other words, if this info is set by external means, nothing appended to a SourceBuffer should change it.

[Bob] I don't know what DASH says about duplicated or inconsistent manifest and in-band track metadata. Maybe Mark Watson knows the answer to this.


3. It looks like language & role can be specified at the AdaptationSet & ContentComponent level. How should these be treated differently in the Media Source context?

Not sure what the question is, but when these attributes are used in the ContentComponent they override their use in the AdaptationSet.


[acolwell] Ok.
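
A minimal sketch of the override rule from the answer to question 3, assuming a simplified in-memory model of the manifest. The class names mirror the DASH element names, but the fields and the helper `effective_track_metadata` are hypothetical and only illustrate the precedence: a value on a ContentComponent wins, otherwise the AdaptationSet value is inherited. In a real MPD the role is carried by a descriptor element rather than a plain attribute; it is flattened to a string here for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ContentComponent:
    # Hypothetical, simplified view of a DASH ContentComponent.
    id: str
    lang: Optional[str] = None
    role: Optional[str] = None


@dataclass
class AdaptationSet:
    # Hypothetical, simplified view of a DASH AdaptationSet.
    lang: Optional[str] = None
    role: Optional[str] = None
    components: List[ContentComponent] = field(default_factory=list)


def effective_track_metadata(
    aset: AdaptationSet,
) -> List[Tuple[Optional[str], Optional[str], Optional[str]]]:
    """Return one (component id, language, role) tuple per described track."""
    if not aset.components:
        # No ContentComponents: the set itself describes a single track.
        return [(None, aset.lang, aset.role)]
    return [
        (
            c.id,
            # ContentComponent values override AdaptationSet values.
            c.lang if c.lang is not None else aset.lang,
            c.role if c.role is not None else aset.role,
        )
        for c in aset.components
    ]
```

For example, a set declared with language "en" that contains one component overriding the language to "fr" and one component with no language would yield "fr" for the first track and "en" for the second.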

4. In the context of this bug, are we assuming a 1:1 mapping between AdaptationSets and SourceBuffers?

I think that is the implication of MSE spec section 2.4 bullet 1.

[acolwell] I guess I didn't really think a change of language or role was considered a different type of track? Should it be? If so, I think this should be more explicit in the spec text. Being able to splice content with different languages together in a single SourceBuffer seems like it should be ok, but I'm not sure if allowing role changes makes sense.

(ie Representations from different AdaptationSets won't be mixed)

5. Are contentComponent ids specified in SubRepresentations required to have the same language and role across all Representations in an AdaptationSet?

This is an AdaptationSet requirement.

[acolwell] ok.


If not, I believe this could mean the language for tracks could change if the application switches representations in an adaptation set.

6. There don't appear to be trackIDs mentioned in the manifest. Is it safe to assume that role & language apply to all tracks within the representation?
If so, how are alternate language tracks represented within an AdaptationSet?

An AdaptationSet (or ContentComponent in an AdaptationSet) is equivalent to an audio, video or text track, so role and language apply at that level. Alternate language tracks are different AdaptationSets in a Period. Section G.1 in the DASH spec shows an example of this.


[acolwell] I thought AdaptationSets could contain multiplexed content as well. It seems like ContentComponents exist to handle this, but I don't see any information that maps ContentComponents to the trackIDs in the media. It looks like the id attribute could be used for this, but the description just says it needs to be a unique ID and does not say that it maps to the internal trackIDs of the media. Am I overlooking something here?

[Bob] Yes, in this case the AdaptationSet definition will contain ContentComponents, one for each track in the multiplex. And yes, the DASH spec is silent on how the ID can be used to correlate the ContentComponent with a specific multiplexed track. The DASH spec requires this ID to be unique WRT the AdaptationSet. So in the case of MPEG-2 TS multiplexes, the ID could be the PID of the MPEG-2 TS elementary stream represented by the ID. If memory serves, Ogg, ISO BMFF and WebM all have a similar unique track descriptor that could be used as the ContentComponent ID. This would need to be specified somewhere.
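
A rough sketch of the correlation suggested above, under the assumption (which, as noted, the DASH spec does not currently state) that a ContentComponent id is chosen to equal the container-level track identifier, e.g. the PID for MPEG-2 TS. The function `correlate_components` and its argument shapes are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Meta = Tuple[str, str]  # (language, role)


def correlate_components(
    component_meta: Dict[str, Meta],
    container_track_ids: Iterable[int],
) -> Tuple[Dict[int, Meta], List[int]]:
    """Match manifest ContentComponent ids against container track IDs.

    ASSUMPTION: the manifest author set each ContentComponent id to the
    numeric track identifier used in the media (PID, track ID, ...).
    Returns (matched metadata by track ID, track IDs with no manifest entry).
    """
    by_numeric_id: Dict[int, Meta] = {}
    for cid, meta in component_meta.items():
        try:
            by_numeric_id[int(cid)] = meta
        except ValueError:
            # Non-numeric ids cannot be correlated under this assumption.
            continue

    matched: Dict[int, Meta] = {}
    unmatched: List[int] = []
    for tid in container_track_ids:
        if tid in by_numeric_id:
            matched[tid] = by_numeric_id[tid]
        else:
            unmatched.append(tid)
    return matched, unmatched
```

Tracks that end up in the unmatched list would have to fall back to whatever in-band metadata the bytestream carries, which is the open question raised earlier in the thread.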


7. What is the expected behavior if the language of a track suddenly changes?

I think this would equate to a new Period (with different AdaptationSets). The MSE spec section 2.4 bullet 1 states that the number and type of tracks needs to be constant in a source buffer. So wouldn't this scenario result in a new source buffer?

[acolwell] So a change in language should be considered a different type of track?

[Bob] I'm not advocating that; I am only stating the implication of the MSE and DASH specs as they exist today. It would seem cleaner to consider your scenario as a new track. Does this cause a problem?


Say I have 2 audio tracks. Track 1 is English and track 2 is French. My preferred language is English so track 1 is selected. I then append a new initialization segment that indicates track 1 has French and track 2 is English along with a few media segments.

Doesn't this violate the MSE 2.4 bullet 1?

[acolwell] When I originally wrote this I was only considering the track type (ie audio, video, or text). In that sense it doesn't violate bullet 1 because there are 2 audio tracks. I think what I'm hearing though is that this should be broadened to "english audio" & "french audio". In that case then I agree that this should be rejected because Track 1 changes from "english audio" to "french audio".
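
The two readings of bullet 1 discussed here can be sketched as a single check with a switch. This is only an illustration under assumed data shapes, not MSE spec text or any browser's implementation: the function `init_segment_is_compatible` and the `compare_language` flag are hypothetical, and each track is reduced to a (kind, language) pair keyed by track ID.

```python
from typing import Dict, Tuple

TrackInfo = Tuple[str, str]  # (kind such as "audio", language such as "en")


def init_segment_is_compatible(
    current: Dict[int, TrackInfo],
    incoming: Dict[int, TrackInfo],
    compare_language: bool = True,
) -> bool:
    """Decide whether a new initialization segment may be appended.

    compare_language=False models the narrow reading (only the number and
    type of tracks must stay constant); compare_language=True models the
    broadened reading where "english audio" and "french audio" differ.
    """
    if set(current) != set(incoming):
        return False  # the set of track IDs changed
    for track_id, (kind, lang) in current.items():
        new_kind, new_lang = incoming[track_id]
        if new_kind != kind:
            return False
        if compare_language and new_lang != lang:
            return False
    return True


# The scenario from question 7: track 1 and track 2 swap languages.
before = {1: ("audio", "en"), 2: ("audio", "fr")}
after = {1: ("audio", "fr"), 2: ("audio", "en")}
narrow = init_segment_is_compatible(before, after, compare_language=False)
broad = init_segment_is_compatible(before, after, compare_language=True)
```

Under the narrow reading the swapped initialization segment is accepted because there are still two audio tracks; under the broadened reading it is rejected because track 1 changes from English audio to French audio.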

Aaron
Received on Monday, 24 September 2012 19:35:00 UTC