Re: [MSE] Questions about setting track language & kind (Bug 17006) from Aaron Colwell on 2012-09-24 (public-html-media@w3.org from September 2012)

From: Aaron Colwell <acolwell@google.com>
Date: Mon, 24 Sep 2012 11:10:51 -0700
To: Mark Watson <watsonm@netflix.com>
Cc: "<public-html-media@w3.org>" <public-html-media@w3.org>
Message-ID: <CAA0c1bBuPQcoja34_9Pya=Y21G-1X=uSF58-nPL4ZwA41oJ2pw@mail.gmail.com>

Hi Mark,

Thanks for the response. I think I'm seeing a common theme in Bob's and your
responses. It seems like you both want track type to be expanded to include
language and possibly role.

This seems like it could be tricky, though, since these fields are not
required in the initialization segments. If the language is present in some
init segments, but not in others, what is the UA supposed to do? If the web
application explicitly specifies the language then the UA can completely
ignore what is in the init segments, but in the implicit case it isn't
clear. The UA could just pick the values specified in the first init
segment and just make sure the following init segments have the same info.
This is basically what happens with the codec type. If the first init
segment doesn't have language info then that track isn't considered to be a
specific language and any language info that appears in later init segments
is ignored and won't cause an error.
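
A rough TypeScript sketch of the rule described above, purely for
illustration. The class and method names (TrackLanguageState, onInitSegment,
effectiveLanguage) are hypothetical and are not part of the MSE draft or of
any UA. It also assumes that a later init segment with no language info
counts as a mismatch when the first one had a language, which the text above
does not spell out.

```typescript
// Hypothetical sketch of the proposed behavior; not spec text or a real API.
type InitResult = "ok" | "error";

class TrackLanguageState {
  private readonly explicitLanguage: string | undefined;
  private seenFirstInit = false;
  // undefined means "track is not considered to be a specific language".
  private language: string | undefined = undefined;

  // explicitLanguage models the web application setting the language itself.
  constructor(explicitLanguage?: string) {
    this.explicitLanguage = explicitLanguage;
  }

  // Called once per appended initialization segment for this track.
  onInitSegment(inBandLanguage?: string): InitResult {
    // Explicit case: the UA ignores whatever the init segments say.
    if (this.explicitLanguage !== undefined) {
      this.seenFirstInit = true;
      this.language = this.explicitLanguage;
      return "ok";
    }
    // Implicit case, first init segment: adopt its value (possibly none).
    if (!this.seenFirstInit) {
      this.seenFirstInit = true;
      this.language = inBandLanguage;
      return "ok";
    }
    // First init segment had no language: later language info is ignored.
    if (this.language === undefined) {
      return "ok";
    }
    // First init segment had a language: later segments must match it,
    // in the same way the codec type is checked (assumption: missing
    // language info is treated as a mismatch).
    return inBandLanguage === this.language ? "ok" : "error";
  }

  get effectiveLanguage(): string | undefined {
    return this.language;
  }
}
```

With this sketch, a track whose first init segment says "en" would accept
further "en" init segments and reject an "fr" one, while a track whose first
init segment carries no language would accept anything and stay unlabeled.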

Does this sound reasonable?

Aaron


On Mon, Sep 24, 2012 at 10:15 AM, Mark Watson <watsonm@netflix.com> wrote:

>  Oh, I do so love multiplexed representations ;-)
>
>  See answers below.
>
>   On Sep 21, 2012, at 11:09 AM, Aaron Colwell wrote:
>
> Hi,
>
>  On one of the calls several weeks ago I said I'd start a thread about
> several questions I had about Bug 17006<https://www.w3.org/Bugs/Public/show_bug.cgi?id=17006>.
> Here it is. :) The goal of this bug is to provide a way to reflect the role
> & language specified in a DASH manifest in the {Audio | Video | Text}Track
> objects.
>
>  I've spent some time trying to understand the DASH spec and have come up
> with these questions:
>
>  1. Do people still want this feature? I believe it was one of the open
> issues our friends at Microsoft asked to be included in the original
> proposal.
>
>
>  Yes. Whether the information comes from a DASH manifest, or elsewhere,
> it may not be in the media itself and it would be good to expose it on the
> video-element in a source-independent way.
>
>
>  2. Why would it be better to put this information in the manifest
> instead of the initialization segments?
>
>
>  A good question, but ...
>
>  Don't they have role & language information encoded in them?
>
>
>  … no. An audio file is just an audio file and is not necessarily
> annotated in the file with content metadata such as language and purpose
> (commentary etc.). I guess DASH could have required those annotations to be
> in band in the file, but it doesn't.
>
>
>  3. It looks like language & role can be specified at the AdaptationSet &
> ContentComponent level. How should these be treated differently in the
> Media Source context?
>
>
>  For the unmultiplexed case, all the Representations in an AdaptationSet
> have the same languages and kind, because the meaning of an AdaptationSet
> is that they can be automatically switched for bitrate adaptation. So for
> the unmixed case the language and kind on the AdaptationSet apply to all
> the Representations.
>
>  For the multiplexed case, similarly, every Representation must contain
> the same set of multiplexed tracks, because the player could switch between
> Representations. The ContentComponent elements describe the tracks in the
> multiplex, including their language and kind.
>
>  If you have language and kind on the AdaptationSet level and there are
> also ContentComponents, then the AdaptationSet level annotations (if they
> are allowed at all, I am not sure), serve only as defaults for any
> ContentComponent which doesn't explicitly include them.
>
>
>  4. In the context of this bug, are we assuming a 1:1 mapping between
> AdaptationSets and SourceBuffers? (ie Representations from different
> AdaptationSets won't be mixed)
>
>
>  Representations from the same AdaptationSet should certainly be fed into
> the same SourceBuffer.
>
>  I think things are clearer and simpler if different AdaptationSets map
> to different SourceBuffers, so we might want to require or assume that.
>
>  As things stand now, it's possible to do language switching by appending
> the new language audio into the same SourceBuffer as the previous one. I'm
> not sure why you would want to do that instead of creating a new
> SourceBuffer. Supporting language and kind mapping in this model would
> require mapping a single SourceBuffer to multiple HTML <video> tracks and I
> don't see an easy way to do this without in-band indications.
>
>  I think in that model, the application is taking more control of what is
> appended where (it has to know, for example, which parts of the
> SourceBuffer are in which language so that it can properly handle seeking).
> It would be reasonable to not support language/kind mapping in that case.
>
>
>  5. Are contentComponent id's specified in SubRepresentations required
> to have the same language and role across all Representations in an
> AdaptationSet? If not, I believe this could mean the language for tracks
> could change if the application switches representations in an adaptation
> set.
>
>
>  I'm struggling a little to remember exactly how SubRepresentations work
> with multiplexed content, but I think what you say makes sense: the
> SubRepresentations must have consistent content across the Representations
> of an AdaptationSet otherwise automatic switching would not be appropriate.
>
>
>  6. There don't appear to be trackIDs mentioned in the manifest. Is it
> safe to assume that role & language apply to all tracks within the
> representation? If so, how are alternate language tracks represented within
> an AdaptationSet?
>
>
>  If the Representations contain multiple multiplexed tracks, then the
> language and kind of the tracks are given by the ContentComponent elements.
> Every Representation must contain the same tracks. I am not exactly sure
> how you detect which track is which, though ...
>
>
>  7. What is the expected behavior if the language of a track suddenly
> changes? Say I have 2 audio tracks. Track 1 is English and track 2 is
> French. My preferred language is English so track 1 is selected. I then
> append a new initialization segment that indicates track 1 has French and
> track 2 is English along with a few media segments.
>   a. Should the UA switch to track 2 at the appropriate time so that
> English audio is always output?
>   b. Should this kind of language change be rejected?
>
>
>  I don't think you should change language in the middle of a track.
>
>
>
>  Aaron
>
>
>
Received on Monday, 24 September 2012 18:11:20 UTC