Re: Tech Discussions on the Multitrack Media (issue-152)

On Fri, 18 Feb 2011 18:44:14 +0100, Mark Watson <> wrote:

> On Feb 18, 2011, at 8:31 AM, Philip Jägenstedt wrote:
>> On Fri, 18 Feb 2011 17:08:28 +0100, Mark Watson <>
>> wrote:
>>> On Feb 18, 2011, at 2:08 AM, Philip Jägenstedt wrote:
>>>> On Thu, 17 Feb 2011 18:43:49 +0100, Mark Watson <>
>>>> wrote:
>>>>> On Feb 17, 2011, at 7:17 AM, Philip Jägenstedt wrote:
>>>>>> On Wed, 16 Feb 2011 18:47:22 +0100, Mark Watson  
>>>>>> <>
>>>>>> wrote:
>>>>>>> On Feb 16, 2011, at 12:02 AM, Philip Jägenstedt wrote:
>>>>>>>> On Wed, 16 Feb 2011 03:31:47 +0100, Silvia Pfeiffer
>>>>>>>> <> wrote:
>>>>>>>>> On Wed, Feb 16, 2011 at 12:08 PM, Jonas Sicking  
>>>>>>>>> <>
>>>>>>>>> wrote:
>>>>>>>>>> On Tue, Feb 15, 2011 at 4:19 PM, Silvia Pfeiffer
>>>>>>>>>> <> wrote:
>>>>>>>>>>> On Wed, Feb 16, 2011 at 5:36 AM, Mark Watson
>>>>>>>>>>> <>
>>>>>>>>>>> wrote:
>>>>>>>>>>>> Hi Philip,
>>>>>>>>>>>> Just a quick note that the "alternative" vs "additional"
>>>>>>>>>>>> distinction is not always completely clear. Video with
>>>>>>>>>>>> different camera angles (gimmicky or not) could be
>>>>>>>>>>>> considered as an alternative, or could be rendered as
>>>>>>>>>>>> picture-in-picture, or multiple thumbnail videos could
>>>>>>>>>>>> show beside the main video (some sports sites already do
>>>>>>>>>>>> this kind of thing).
>>>>>>>> Sure, but all of those modes should be achieved by the author
>>>>>>>> making it happen with CSS. At the risk of making a strawman
>>>>>>>> argument, I honestly can't see browsers allowing the user to
>>>>>>>> change the rendering of the page to achieve PiP or something like
>>>>>>>> that when the author hasn't provided for it; messing with the
>>>>>>>> layout like that seems both weird and unlikely to be useful. Of
>>>>>>>> course we can have User JavaScript and User CSS to do that kind
>>>>>>>> of thing, though.
>>>>>>> I was assuming that the "author" of the content - who labels the
>>>>>>> tracks - might not be the same as the "author" of the webpage
>>>>>>> that is rendering the content. So the first author should not
>>>>>>> assume that (say) multiple views are alternatives, because some
>>>>>>> webpages might be able to view them both as PIP.
>>>>>> Since the tracks are labeled using the attribute of the <track>
>>>>>> element, it will be the page author that has to do the work to
>>>>>> support some specific video display, be that PiP, overlay or
>>>>>> something else.
>>>>> That would be the case for track objects created as a result of
>>>>> <track> elements, but what about in-band tracks? The page author
>>>>> does the work for PIP etc., of course, but the media author should
>>>>> not assume that such capabilities are or are not available on the
>>>>> pages where their media might be used: they should just label the
>>>>> tracks and let the page do whatever it is capable of.
>>>> I don't think we should spend much time making extra in-band video
>>>> tracks work more than barely, if at all, since the extra bandwidth
>>>> needed to have multiple in-band video tracks makes it quite unlikely
>>>> the feature would be used to any greater extent.
>>> A track declared within an adaptive streaming manifest (e.g. a DASH
>>> manifest or take-your-pick of various proprietary adaptive streaming
>>> solutions) would be an in-band track but would only be fetched when
>>> actually needed.
>> Good point.
>>>> If they should work at all, my position is that the only thing you
>>>> should be able to do with in-band video tracks is switch between
>>>> them, in other words what I've called alternative tracks. Either
>>>> having some kind of layout information in the file itself or having
>>>> HTML markup to target individual tracks of the same resource seems
>>>> like unjustified complexity and spec/implementation effort not very
>>>> well spent.
>>> I think people do imagine that adaptive streaming manifests would
>>> declare all the tracks needed for a presentation - including sign
>>> language tracks that are additional rather than alternative. Such
>>> manifests have to be useful in environments other than HTML and so need
>>> to include everything. I don't think we should ask people to re-author
>>> them in HTML for use in HTML environments.
>> I quite disagree, designing something to work both in browsers and
>> non-browsers means that we can't make good use of whatever existing
>> capabilities browsers already have. In this case, I think we should rely
>> on CSS and only CSS to achieve the desired rendering of multitrack  
>> video.
>> Any default rendering we could provide is unlikely to fit well enough in
>> with the overall design of the page that people will want to use it.
> I don't disagree with using CSS to achieve the desired rendering in a  
> HTML environment, but I don't understand why that conflicts with what I  
> wrote above ?

In my recent reply to Silvia's mail  
<> I've  
outlined why I think rendering of multitrack video should be controlled by CSS  
and only CSS. If I've misunderstood how you want rendering of multitrack  
video to work (we haven't really been clear about what we want so far),  
please do outline how you would like things to work.

>> Some samples Silvia collected in
>> <>
>> demonstrate quite clearly IMO the variety of styles we can expect to  
>> see.
>>> I guess what I am saying is that Option (1) in the wiki write-up should
>>> be supported in order to provide support for adaptive streaming. The
>>> questions are:
>>> (1) whether this should be the only way to declare such
>>> additional/alternative tracks or whether an HTML markup way is also
>>> required (and I think that it is)
>> I don't see how this approach could give us the flexibility in styling
>> that is necessary. How do you envisage getting a visual end result
>> similar to <> (note the fancy borders around the overlaid video) using
>> a manifest approach?
> Again, not sure I understand the problem.

See the same mail referenced above. In short, I doubt that any default  
rendering of video tracks provided by the UA without the cooperation of  
the page author is going to be good enough.

To be clear, I do think that there's plenty of room for browser extensions  
(User JavaScript/CSS) to tweak the rendering of <video>, multitrack or not.  
I just don't think that it's something that needs to be built into  
browsers by default.

> It seems important that with adaptive streaming all the tracks are  
> declared together in the manifest and so the transport of those can be  
> handled by the player in an HTML-independent way. The same adaptive  
> streaming player software components can be reused across different  
> environments. The transport of the different tracks interacts, so they  
> should not be handled independently.
> If I understand correctly, you are saying that in an HTML environment  
> you would like the full power of HTML+CSS to be available to control the  
> presentation of the tracks - I completely agree (otherwise why are you  
> using an HTML environment at all?). So some means of applying CSS  
> styling to in-band tracks is required. That seems to be the case both  
> for in-band tracks in a single file and those described in a manifest.
> That means a single manifest could be used in both HTML and non-HTML  
> environments and in the HTML environment you would have the flexibility  
> to style the tracks with CSS if you chose to.
> I'd like to avoid having to re-invent all the transport-related aspects  
> of adaptive streaming manifests at the HTML level.

I admit to not being very familiar with the various streaming solutions  
out there. Could you perhaps elaborate a bit on this: If two video tracks  
are served via two different URLs, what kind of penalty would this incur  
at the transport level except potentially fetching two manifests rather  
than one? With HTTP transport I can't see it being the case, are you  
talking about RTSP or some other fancier protocols that allow intelligent  
interleaving of multiple tracks? In such a case, why wouldn't the browser  
have enough information to share a single channel based on the fact that  
the two tracks are set to be in sync?

How do existing adaptive streaming systems work? Do they just present a  
bunch of available video tracks and let the UA render them however they  
want? It seems to me like the UIs of streaming solutions like TV stations  
are specifically tailored to the content they serve, not really generic  
UIs that would be able to handle any combination of additional video tracks.

>>> (2) what should that markup be
>>> (3) how to define the API for discovering and manipulating these tracks
>>> in a way that is common for in-band (from a file or from an adaptive
>>> streaming manifest) and explicitly marked up tracks.
>> From a markup and API perspective a multitrack manifest file would be
>> treated the same as a multitrack WebM file, right?
> From an API perspective, certainly.
> I don't know what has been discussed for markup of multi-track WebM  
> files: do you expect the multiple tracks to be exposed in HTML with  
> explicit <track> elements, or do they just appear in the API when the  
> file is loaded ? If the latter, then yes, I would see a multitrack  
> manifest file as no different from a multitrack media file (except  
> perhaps that the manifest could possibly be loaded earlier).

All discussion that I'm aware of is in this thread and the recent  
"Displaying Multitrack Video (issue-152)" thread.  
Apart from the solutions listed in  
<>, I can see  
these options for in-band tracks, be that multitrack WebM or multitrack streaming:

1. Only allow switching between the tracks, like for multiple angles.

2. Have multiple <video> elements pointing to the same resource, but  
reference specific tracks using Media Fragment URI syntax, like <video  
src="video.webm#track=video2"> or similar.

In either case there would be a DOM API exposing the track information to  
allow scripts to do the same things as one can do declaratively or via  
native browser controls.
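
As a rough illustration of option 2, here is a minimal TypeScript sketch of how a script might pull the requested track names out of such a src value. The function name trackNamesFromSrc is hypothetical, and the only thing assumed from Media Fragment URI syntax is the track=<name> form shown above; this is not an existing browser API, just a sketch of the idea.

```typescript
// Hypothetical helper (not an existing API): returns the track names
// requested by "track=<name>" pairs in the fragment part of a media URL.
function trackNamesFromSrc(src: string): string[] {
  const hashIndex = src.indexOf("#");
  if (hashIndex === -1) {
    return []; // no fragment, so no specific track was requested
  }
  const fragment = src.slice(hashIndex + 1);
  const names: string[] = [];
  // Name=value pairs in the fragment are separated by "&".
  for (const pair of fragment.split("&")) {
    const eq = pair.indexOf("=");
    if (eq === -1) {
      continue; // not a name=value pair, ignore it
    }
    const key = pair.slice(0, eq);
    const value = pair.slice(eq + 1);
    if (key === "track" && value !== "") {
      names.push(decodeURIComponent(value));
    }
  }
  return names;
}

// The example URL from option 2 above.
const requested = trackNamesFromSrc("video.webm#track=video2");
console.log(requested); // ["video2"]
```

A page using two <video> elements on the same resource would get a different name from each src, which is the information a UA would need to decide which in-band track each element renders.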

Philip Jägenstedt
Core Developer
Opera Software

Received on Saturday, 19 February 2011 12:30:13 UTC