
Re: Tech Discussions on the Multitrack Media (issue-152)

From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Date: Tue, 22 Feb 2011 01:13:02 +1100
Message-ID: <AANLkTi=G4MxdE0BW6DCUiWgt1iepK=UEjXi+nXT1ORbw@mail.gmail.com>
To: Philip Jägenstedt <philipj@opera.com>
Cc: public-html@w3.org
On Mon, Feb 21, 2011 at 9:42 PM, Philip Jägenstedt <philipj@opera.com> wrote:
> On Sun, 20 Feb 2011 00:42:51 +0100, Silvia Pfeiffer
> <silviapfeiffer1@gmail.com> wrote:
>> On Sat, Feb 19, 2011 at 9:09 PM, Philip Jägenstedt <philipj@opera.com>
>> wrote:
>>> On Fri, 18 Feb 2011 22:41:53 +0100, Silvia Pfeiffer
>>> <silviapfeiffer1@gmail.com> wrote:
>>>> On 19/02/2011, at 3:31 AM, Philip Jägenstedt <philipj@opera.com> wrote:
>>>>> On Fri, 18 Feb 2011 17:08:28 +0100, Mark Watson <watsonm@netflix.com>
>>>>> wrote:
>>>>>> On Feb 18, 2011, at 2:08 AM, Philip Jägenstedt wrote:
>>>>>>> On Thu, 17 Feb 2011 18:43:49 +0100, Mark Watson <watsonm@netflix.com>
>>>>>>> wrote:
>>>>>>>> On Feb 17, 2011, at 7:17 AM, Philip Jägenstedt wrote:
>>>>>>>>> On Wed, 16 Feb 2011 18:47:22 +0100, Mark Watson
>>>>>>>>> <watsonm@netflix.com>
>>>>>>>>> wrote:
>>>>>>>>>> On Feb 16, 2011, at 12:02 AM, Philip Jägenstedt wrote:
>>>>>>>>>>> On Wed, 16 Feb 2011 03:31:47 +0100, Silvia Pfeiffer
>>>>>>>>>>> <silviapfeiffer1@gmail.com> wrote:
>>>>>>>>>>>> On Wed, Feb 16, 2011 at 12:08 PM, Jonas Sicking
>>>>>>>>>>>> <jonas@sicking.cc>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> On Tue, Feb 15, 2011 at 4:19 PM, Silvia Pfeiffer
>>>>>>>>>>>>> <silviapfeiffer1@gmail.com> wrote:
>>>>>>>>>>>>>> On Wed, Feb 16, 2011 at 5:36 AM, Mark Watson
>>>>>>>>>>>>>> <watsonm@netflix.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> Hi Philip,
>>>>>>>>>>>>>>> Just a quick note that the "alternative" vs "additional" distinction
>>>>>>>>>>>>>>> is not always completely clear. Video with different camera angles
>>>>>>>>>>>>>>> (gimmicky or not) could be considered as an alternative, or could be
>>>>>>>>>>>>>>> rendered as picture-in-picture, or multiple thumbnail videos could
>>>>>>>>>>>>>>> show beside the main video (some sports sites already do this kind of
>>>>>>>>>>>>>>> thing).
>>>>>>>>>>> Sure, but all of those modes should be achieved by the author making it
>>>>>>>>>>> happen with CSS. At the risk of making a strawman argument, I honestly
>>>>>>>>>>> can't see browsers allowing the user to change the rendering of the page
>>>>>>>>>>> to achieve PiP or something like that when the author hasn't provided for
>>>>>>>>>>> it; messing with the layout like that seems both weird and unlikely to be
>>>>>>>>>>> useful. Of course we can have User JavaScript and User CSS to do that kind
>>>>>>>>>>> of thing, though.
>>>>>>>>>> I was assuming that the "author" of the content - who labels the tracks
>>>>>>>>>> - might not be the same as the "author" of the webpage that is rendering
>>>>>>>>>> the content. So the first author should not assume that (say) multiple
>>>>>>>>>> views are alternatives, because some webpages might be able to view them
>>>>>>>>>> both as PIP.
>>>>>>>>> Since the tracks are labeled using the attributes of the <track> element,
>>>>>>>>> it will be the page author that has to do the work to support some
>>>>>>>>> specific video display, be that PiP, overlay or something else.
>>>>>>>> That would be the case for track objects created as a result of <track>
>>>>>>>> elements, but what about in-band tracks? The page author does the work
>>>>>>>> for PIP etc., of course, but the media author should not assume that
>>>>>>>> such capabilities are or are not available on the pages where their
>>>>>>>> media might be used: they should just label the tracks and let the page
>>>>>>>> do whatever it is capable of.
>>>>>>> I don't think we should spend much time making extra in-band video tracks
>>>>>>> work more than barely, if at all, since the extra bandwidth needed to have
>>>>>>> multiple in-band video tracks makes it quite unlikely the feature would be
>>>>>>> used to any greater extent.
>>>>>> A track declared within an adaptive streaming manifest (e.g. a DASH
>>>>>> manifest or take-your-pick of various proprietary adaptive streaming
>>>>>> solutions) would be an in-band track but would only be fetched when
>>>>>> actually
>>>>>> needed.
>>>>> Good point.
>>>>>>> If they should work at all, my position is that the only thing you
>>>>>>> should
>>>>>>> be able to do with in-band video tracks is switch between them, in
>>>>>>> other
>>>>>>> words what I've called alternative tracks. Either having some kind of
>>>>>>> layout information in the file itself or having HTML markup to target
>>>>>>> individual tracks of the same resource seems like unjustified
>>>>>>> complexity
>>>>>>> and spec/implementation effort not very well spent.
>>>>>> I think people do imagine that adaptive streaming manifests would
>>>>>> declare all the tracks needed for a presentation - including sign language
>>>>>> tracks that are additional rather than alternative. Such manifests have to
>>>>>> be useful in environments other than HTML and so need to include
>>>>>> everything. I don't think we should ask people to re-author them in HTML for
>>>>>> use in HTML environments.
>>>>> I quite disagree, designing something to work both in browsers and
>>>>> non-browsers means that we can't make good use of whatever existing
>>>>> capabilities browsers already have. In this case, I think we should
>>>>> rely on
>>>>> CSS and only CSS to achieve the desired rendering of multitrack video.
>>>>> Any
>>>>> default rendering we could provide is unlikely to fit well enough in
>>>>> with
>>>>> the overall design of the page that people will want to use it.
>>>>> Some samples Silvia collected in
>>>>> <http://www.w3.org/WAI/PF/HTML/wiki/Media_Multitrack_Media_Rendering>
>>>>> demonstrate quite clearly IMO the variety of styles we can expect to
>>>>> see.
>>>>>> I guess what I am saying is that Option (1) in the wiki write-up
>>>>>> should
>>>>>> be supported in order to provide support for adaptive streaming. The
>>>>>> questions are:
>>>>>> (1) whether this should be the only way to declare such
>>>>>> additional/alternative tracks or whether an HTML markup way is also
>>>>>> required
>>>>>> (and I think that it is)
>>>>> I don't see how this approach could give us the flexibility in styling
>>>>> that is necessary. How do you envisage getting a visual end result
>>>>> similar
>>>>> to <http://www.w3.org/WAI/PF/HTML/wiki/File:Transhud.jpg> (note the
>>>>> fancy
>>>>> borders around the overlayed video) using a manifest approach?
>>>> The exercise isn't really finished yet. What I am trying to figure out
>>>> is
>>>> whether there are common patterns. Of course we need to also allow for
>>>> the
>>>> author to change and adapt the styling for their site. But I believe we
>>>> also
>>>> need a default display mechanism just like the controls. That should not
>>>> be
>>>> fancy.
>>>> I think it is early to draw conclusions, but we seem to see two
>>>> fundamentally different layouts: pip (picture-in-picture) and
>>>> side-by-side.
>>>> The default should be without any fancy borders, just thrown either on
>>>> top
>>>> of the main video or tiled. Either should have a single control. That's
>>>> all
>>>> I can tell from preliminary looks.
>>> Of course we should continue looking into this, but at this point I can't
>>> see good solutions for default rendering.
>>> Picture-in-picture: If the page hasn't provided for it, then default
>>> rendering would be limited to a box within the main video's content box,
>>> without borders or other styling. At best it could be draggable within the
>>> video content box, but that would interfere with drag-and-drop and such, so
>>> I'm not sure.
>>> Side-by-side: If the page hasn't provided for it, the best we could do is
>>> to
>>> tile multiple videos inside the same <video> element, growing the size of
>>> it
>>> unless the size is restricted by CSS. If the videos are of different
>>> sizes
>>> things become messy. No borders or padding to separate the videos.
>>> What it boils down to is how we imagine styling should work. I think that
>>> the kind of default rendering I have outlined above is not good enough to
>>> spend time on. Conversely, improving on it would require CSS extensions
>>> to
>>> achieve things that are already trivially possible if the author uses
>>> multiple <video> elements, styling them with plain CSS.
>> There is no question that a DOM API needs to be available for the
>> styling. I am not sure which markup approach is most appropriate for
>> this, so I'm not going to discuss this here.
>> However, I am concerned if we say that CSS and JavaScript are the only
>> way to get access to and display the additional tracks. I am mostly
>> concerned because it creates a barrier towards broad uptake of
>> multitrack video. A barrier that may inhibit many people from making
>> use of it. A barrier that may stop people in particular from
>> publishing audio descriptions and sign language tracks even if they
>> have them available, simply because it's too difficult to publish them
>> if you have to write your own JavaScript and CSS code to display them.
>> I think we absolutely should provide a default display, even if it is
>> crude and not pretty (most default player controls of browsers aren't
>> pretty and yet lots of content is being published without custom
>> controls).
> Not requiring JavaScript is one thing, and I think it would be nice to avoid
> it given that it's not entirely trivial to wait for the right event, go
> through list of available tracks and do something with them. In other words,
> some kind of declarative mechanism to direct the output of multitrack video
> to another <video> element might make sense.
> If people don't want something fancy, they could just use completely
> unstyled <video> elements, so I'm not sure how CSS would be *required*, it's
> just what you have to do if you want anything except side-by-side video.
>> Another reason for having a default display is when you go
>> full-screen. It would be hard if not impossible to create the layouts
>> for fullscreen display for multitrack video without the browser taking
>> care of it.
> I think something like <https://wiki.mozilla.org/Gecko:FullScreenAPI>
> handles this quite nicely by allowing any element to go fullscreen, so just
> make the common parent of all your <video> elements fullscreen.

That is possible, but seems counter-intuitive, when already there are
right-click interfaces to video that allow you to go fullscreen.

>> To make it simple, I think we don't need fancy borders around videos
>> that are shown picture-in-picture. The sheer matter that they are
>> moving pictures makes the eye recognize that there is a different
>> video happening inside. Maybe it could just have a single pixel black
>> and a single pixel white border - that would probably satisfy all
>> contrast needs. But definitely nothing fancy.
> The samples you collected and random searches on Google images seem to
> indicate that more often than not, something more fancy than 1px solid
> black/white is used.

If I had collected user interfaces for controls, I would have ended up
with fancy ones, too. I wasn't actually looking very hard and took the
first ones that showed the principle. I don't believe in the necessity
for excessive styling here, but was only looking out for a common
approach to display multitrack video. We definitely need to find more
examples though.

>> The idea of making the pip videos movable is a good one. I'm sure
>> conflicts with other drag and drop / mouse operations can be resolved.
>> It probably just needs to sit at a certain z-index level and then people
>> can style around that.
> Just use CSS as usual :)
>> I have been thinking lots about side-by-side video and how to display
>> them.
>> If they are separate boxes on the Web page, that makes it difficult to
>> have a single @controls to navigate across them. I think the
>> complexity of multiple @controls may be too hard to deal with. Thus,
>> right now I tend towards the idea that they should all be displayed in
>> a single viewport. If a side-by-side display is requested and we have
>> two video tracks, then they would sit next to each other with
>> letterboxing on top and bottom. That would also provide space for
>> captions to be moved into.
>> When we have side-by-side and 3 video tracks, I would think there's
>> two next to each other, one centered below them and pillarboxing
>> around it.
>> When we have side-by-side and 4 tracks, they would get tiled, etc.
>> If the sizes don't match, appropriate letter- and pillarboxing would be
>> used.
>> Scaling would need to be within the viewport given for the main video
>> - either through CSS or through the width/height of the main video
>> track. This is the base viewport size and within that the tiling would
>> happen by subdividing the viewport size into equal size rows and
>> columns and adequately letter- / pillarboxing. That should also allow
>> reuse of most of the scaling code that is already in use for the
>> single-track videos.
> That would be possible, but my main question is "is it worth it"? Also, what
> would happen to .videoWidth and .videoHeight?

The biggest problem I have with separate video elements displaying the
multitrack (virtual) video resource is that of the controls and that
they are not obviously linked to each other, visually, other than if
the user decides to make it that way. So, we may end up with multiple
video elements each with @controls, but only one of them is able to
introduce and manage the @controls activities. I find such a situation
very disturbing.

If instead we can have the connection between the elements, but a
display that is within one video viewport, then I would favor such a

Instead, when they are all inside the video viewport, that makes it
nice and easy. The .videoWidth and .videoHeight would continue to
display the intrinsic width and height of the element, so in the case
of 4 video tracks displayed in parallel, it might end up being a
quarter of the video viewport size. Does that create further problems?
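
To make the tiling rule quoted above a bit more concrete, here is a
minimal sketch in Python of how a single viewport might be subdivided
into equal rows and columns, with each track letter- or pillarboxed
inside its cell. The names Rect, fit_inside and tile_tracks are
hypothetical, and the choice of ceil(sqrt(n)) columns is only an
assumption that happens to reproduce the 2-, 3- and 4-track layouts
described; it is not taken from any spec or implementation.

```python
import math
from typing import NamedTuple


class Rect(NamedTuple):
    x: float
    y: float
    width: float
    height: float


def fit_inside(cell: Rect, video_w: float, video_h: float) -> Rect:
    """Scale a video to fit a cell, keeping its aspect ratio and centering it.

    The unused space left in the cell is the letterbox or pillarbox area.
    """
    scale = min(cell.width / video_w, cell.height / video_h)
    w, h = video_w * scale, video_h * scale
    return Rect(cell.x + (cell.width - w) / 2,
                cell.y + (cell.height - h) / 2, w, h)


def tile_tracks(viewport_w: float, viewport_h: float,
                sizes: list[tuple[float, float]]) -> list[Rect]:
    """Place n video tracks in one viewport using equal-size rows and columns.

    Assumption: the grid uses ceil(sqrt(n)) columns, and an incomplete
    last row is centered horizontally.
    """
    n = len(sizes)
    if n == 0:
        return []
    cols = math.ceil(math.sqrt(n))
    rows = math.ceil(n / cols)
    cell_w, cell_h = viewport_w / cols, viewport_h / rows
    rects = []
    for i, (video_w, video_h) in enumerate(sizes):
        row, col = divmod(i, cols)
        in_row = min(cols, n - row * cols)       # tracks in this row
        x_offset = (viewport_w - in_row * cell_w) / 2
        cell = Rect(x_offset + col * cell_w, row * cell_h, cell_w, cell_h)
        rects.append(fit_inside(cell, video_w, video_h))
    return rects
```

With a 640x360 viewport and two 640x360 tracks, this yields two 320x180
boxes next to each other with space above and below; with three tracks
the third is centered under the first two; with four they tile into
quarters.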


> About the controls, in my preferred solution of multiple <video> elements,
> if people don't want conflicting controls, they should leave out the
> controls attribute on all <video>s but one.
>> If people want more fancy displays than that, they can implement them
>> with JavaScript and CSS. Then there could also be spaces defined
>> outside the viewport into which they can be rendered but fall back to
>> the viewport-only display when you go fullscreen.
> I would much prefer to start with a solution that is flexible, reuses
> existing technology (CSS) and will cover a large percentage of the use
> cases. If it turns out that people just can't deal with it and want
> something simpler (and CSS/JavaScript libraries don't fill that need) then
> we can try simplifying things further, i.e. complicating things for
> implementors. (That would be me, see how poorly I disguise my laziness?)
> --
> Philip Jägenstedt
> Core Developer
> Opera Software
Received on Monday, 21 February 2011 14:14:00 GMT
