Re: Proposal for Audio and Video Track Selection and Synchronisation for Media Elements from Philip Jägenstedt on 2011-03-22 (public-html@w3.org from March 2011)

From: Philip Jägenstedt <philipj@opera.com>
Date: Tue, 22 Mar 2011 10:04:30 +0100
To: public-html@w3.org
Message-ID: <op.vsql5sxwsr6mfa@localhost.localdomain>
On Mon, 21 Mar 2011 23:08:54 +0100, Ian Hickson <ian@hixie.ch> wrote:

> On Mon, 21 Mar 2011, Philip Jägenstedt wrote:
>>
>> On Mon, 21 Mar 2011 12:00:46 +0100, Ian Hickson <ian@hixie.ch> wrote:
>> >
>> > * Allowing authors to use CSS for presentation control, e.g. to
>> >   control where multiple video channels are to be placed relative to
>> >   each other.
>>
>> It's clear how to do this declaratively, but with scripts there seem to
>> be a few steps involved:
>>
>> 1. Create/find a second video element and position it with CSS.
>>
>> 2. Bind the two video tracks together with the same MediaController,
>> either by using .mediaGroup or by creating a new controller.
>>
>> 3. Set the src of the second video element to the same as the first.
>>
>> 4. Wait until loadedmetadata and then enable the alternative video track.
>
> Well if you know ahead of time which track you want you can just use
> fragment identifiers in the 'src' attribute, but yes, worst case it would
> look something like this:
>
>   var controller = new MediaController();
>   var video1 = document.createElement('video');
>   video1.controller = controller;
>   video1.src = 'video.foo';
>   video1.onloadedmetadata = function () {
>     video1.videoTracks.select(1);
>   };
>   var video2 = document.createElement('video');
>   video2.controller = controller;
>   video2.src = 'video.foo';
>   video2.onloadedmetadata = function () {
>     video2.muted = true;
>     video2.videoTracks.select(2);
>   };
>
>
>> This seems like it would work, but what if steps 2 and 3 happen in the
>> other order?
>
> So long as they happen in the same task, it doesn't matter; the loading of
> the media resource happens asynchronously after the task ends.

It's inevitable that someone will set video2.controller in  
video2.onloadedmetadata.

>> Is this a bug, or do we expect browsers to become clever enough to figure
>> out that the same decoding pipeline can be reused without any kind of hint?
>
> It doesn't seem that difficult to compare URLs... am I missing something?

That makes it easy to reuse the network connection and cache; Opera
already does this. What I'm talking about here is using a single decoding
pipeline when showing multiple video streams of the same resource.

The model for playing multitrack resources in most media players is to  
have a single demuxer that exposes the available streams. If the user  
changes audio tracks or enables additional video tracks, the same demuxer  
is used, but new decoders are plugged into the streams that were previously
not in use. This makes sync happen more or less automatically.
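
To make that model concrete, here is a rough sketch. It assumes a toy
pipeline in which one demuxer owns a single clock and every decoder simply
follows it. The names (Demuxer, enable, advance, decodedUpTo, and the stream
names) are invented for illustration and do not come from the proposal or
from any real media framework.

```javascript
// Hypothetical model of a media player's pipeline; illustrative names only.
class Demuxer {
  constructor(streamNames) {
    // One entry per stream in the container; null means no decoder attached.
    this.decoders = new Map(streamNames.map((name) => [name, null]));
    this.clock = 0; // single shared playback position, in seconds
  }

  // Plug a decoder into a stream that was previously not in use.
  enable(name) {
    if (!this.decoders.has(name)) {
      throw new Error('no such stream: ' + name);
    }
    if (this.decoders.get(name) === null) {
      this.decoders.set(name, { stream: name, decodedUpTo: this.clock });
    }
    return this.decoders.get(name);
  }

  // Advance the shared clock; every attached decoder follows it.
  advance(seconds) {
    this.clock += seconds;
    for (const decoder of this.decoders.values()) {
      if (decoder !== null) {
        decoder.decodedUpTo = this.clock;
      }
    }
  }
}

const demuxer = new Demuxer(['video-main', 'video-sign', 'audio-en']);
const main = demuxer.enable('video-main');
demuxer.advance(2);
const sign = demuxer.enable('video-sign'); // plugged in during playback
demuxer.advance(1);
console.log(main.decodedUpTo === sign.decodedUpTo);
```

Because there is only one clock, the late-enabled stream cannot drift from
the first one. Two independent pipelines would need extra work to get the
same guarantee.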

In your proposal, the decoding of two video streams from the same resource
is not as tightly coupled, so the browser can only make educated guesses
about when a pipeline can be shared. If it initially does share a single
demuxer, it might become necessary to split that into two if the controller
or playbackRate of one element changes.

I'm not saying that this is necessarily a bad thing, I just want to make  
sure that everyone is fully aware of the complexity. I think there are a  
few sane ways of going about this:

* Only allow setting the controller of an audio/video element while  
readyState == HAVE_NOTHING, so that the browser knows when starting to  
fetch a resource whether or not it's a good idea to share the decoding  
pipeline between two elements.

* Make the connection more explicit by letting each video track expose a  
Stream object which could be set to the .src of another video element. The  
other video element would not be able to change the playbackRate, so it  
would always be safe to reuse a single demuxer.

* Be explicit about what is expected of implementations and add to the  
best practices for implementors of media elements that they are expected  
to be able to merge and split decoding pipelines of separate media  
elements while they are playing in order to implement multitrack in a  
non-wasteful manner.
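
As an illustration of the first option only, the restriction could behave
roughly like the sketch below. SketchMediaElement is a made-up stand-in, not
a browser API, and throwing an error is just one assumed way of rejecting a
late assignment. The numeric values match the HAVE_NOTHING and HAVE_METADATA
constants on media elements.

```javascript
// Hypothetical stand-in for a media element; illustrative only.
const HAVE_NOTHING = 0;
const HAVE_METADATA = 1;

class SketchMediaElement {
  constructor() {
    this.readyState = HAVE_NOTHING;
    this._controller = null;
  }

  get controller() {
    return this._controller;
  }

  // Assumed behaviour: refuse to change controller once fetching has begun.
  set controller(value) {
    if (this.readyState !== HAVE_NOTHING) {
      throw new Error('controller can only be set while readyState is HAVE_NOTHING');
    }
    this._controller = value;
  }
}

const video = new SketchMediaElement();
video.controller = { name: 'shared' }; // allowed: nothing fetched yet
video.readyState = HAVE_METADATA;      // pretend metadata has arrived
try {
  video.controller = null;             // too late under this option
} catch (e) {
  console.log(e.message);
}
```

With a rule like this the browser knows, at the moment it starts fetching,
which elements share a controller and can decide up front whether to share
a decoding pipeline.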

>> >    <video src="movie.vid#track=Video&amp;track=English" autoplay controls
>> >           mediagroup=movie></video>
>> >    <video src="movie.vid#track=sign" autoplay mediagroup=movie></video>
>>
>> This is using Media Fragment URI:
>> http://www.w3.org/2008/WebVideo/Fragments/WD-media-fragments-spec/#naming-track
>>
>> It appears that the proposal assumes that the track dimension can only be
>> specified once in a valid Media Fragment, but this is unfortunately not the
>> case. The MF spec states that "Multiple track specification is allowed, but
>> requires the specification of multiple track parameters."
>> Perhaps this is not a problem: #track=Alternative&track=Commentary would
>> just result in a resource with two tracks, and the first (Alternative)
>> would be used. However, how should this be reflected in audioTracks and
>> videoTracks?
>
> This is handled in the HTML spec in the "Once enough of the media data has
> been fetched to determine the duration of the media resource, its
> dimensions, and other metadata, and once the text tracks are ready" step
> of the resource fetch algorithm (substep 9, currently).

Ah, thanks.
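
For reference, the multiple-track case quoted above can be pulled out of a
URL along these lines. This is a minimal sketch that assumes plain
name=value pairs separated by "&"; it does not implement the full Media
Fragments parsing rules, and trackNamesFromFragment is a hypothetical helper
name.

```javascript
// Illustrative helper, not part of any spec or browser API.
// Returns every value of the "track" dimension, in document order.
function trackNamesFromFragment(url) {
  const hashIndex = url.indexOf('#');
  if (hashIndex === -1) {
    return []; // no fragment, so no track selection
  }
  return url
    .slice(hashIndex + 1)
    .split('&')
    .map((pair) => pair.split('='))
    .filter(([key, value]) => key === 'track' && value !== undefined)
    .map(([, value]) => decodeURIComponent(value));
}

console.log(trackNamesFromFragment('movie.vid#track=Alternative&track=Commentary'));
// → [ 'Alternative', 'Commentary' ]
```

Returning a list rather than a single name leaves it to the caller to decide
what to do when more than one track of the same kind is named.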

>> > RISKS
>> > * It's possible that it is still too early for us to be adding any
>> >   kind of multi-track feature given the current implementation
>> >   priorities of user agents.
>>
>> Indeed, the complexity of implementing this is significant. It requires
>> a very capable media framework to do things like gapless looping of one
>> track synchronized with another and to determine when decoding pipelines
>> can be shared and not.
>
> If the chairs determine that there is consensus that we should only
> address a subset of the use cases today, I'd be happy to subset the
> proposal appropriately. The proposal is intended to show primarily where I
> think we should be headed; it's often the case that we get to where we're
> headed in small incremental steps. (There's a lot of stuff in the spec
> that's commented out but already mostly specced out on these grounds. For
> example, the drag-and-drop feature has a lot more features specced but
> commented out than are currently obvious from reading the spec. Even the
> proposal we're talking about here has a section commented out talking
> about automatic ducking for audio description tracks.)

OK, for the record I question the necessity of:

* Synchronized playback of tracks of the same resource at different  
playbackRates and offsets.

* Synchronized playback of looping resources.

-- 
Philip Jägenstedt
Core Developer
Opera Software
Received on Tuesday, 22 March 2011 09:05:09 UTC