[whatwg] Video feedback from Mark Watson on 2011-06-20 (public-whatwg-archive@w3.org from June 2011)

From: Mark Watson <watsonm@netflix.com>
Date: Mon, 20 Jun 2011 08:43:00 -0700
Message-ID: <F9BF92B2-D121-4C37-8576-ECF840E594F0@netflix.com>
On Jun 20, 2011, at 5:28 PM, Silvia Pfeiffer wrote:

> On Tue, Jun 21, 2011 at 12:07 AM, Mark Watson <watsonm at netflix.com> wrote:
>> 
>> On Jun 20, 2011, at 11:52 AM, Silvia Pfeiffer wrote:
>> 
>>> On Mon, Jun 20, 2011 at 7:31 PM, Mark Watson <watsonm at netflix.com> wrote:
>>>>> 
>>>>>> The TrackList object has an onchanged event, which I assumed would fire when
>>>>>> any of the information in the TrackList changes (e.g. tracks added or
>>>>>> removed). But actually the spec doesn't state when this event fires (as far
>>>>>> as I could tell - unless it is implied by some general definition of events
>>>>>> called onchanged).
>>>>>> 
>>>>>> Should there be some clarification here ?
>>>>> 
>>>>> I understood that to relate to a change of cues only, since it is on
>>>>> the tracklist. I.e. it's an aggregate event from the oncuechange event
>>>>> of a cue inside the track. I didn't think it would relate to a change
>>>>> of existence of that track.
>>>>> 
>>>>> Note that the event is attached to the TrackList, not the TrackList[],
>>>>> so it cannot be raised when a track is added or removed, only when
>>>>> something inside the TrackList changes.
>>>> 
>>>> Are we talking about the same thing ? There is no TrackList array and
>>>> TrackList is only used for audio/video, not text, so I don't understand the
>>>> comment about cues.
>>>> I'm talking
>>>> about http://www.whatwg.org/specs/web-apps/current-work/multipage/the-iframe-element.html#tracklist which
>>>> is the base class for MultipleTrackList and ExclusiveTrackList used to
>>>> represent all the audio and video tracks (respectively). One instance of the
>>>> object represents all the tracks, so I would assume that a change in the
>>>> number of tracks is a change to this object.
>>> 
>>> Ah yes, you're right: I got confused.
>>> 
>>> It says "Whenever the selected track is changed, the user agent must
>>> queue a task to fire a simple event named change at the
>>> MultipleTrackList object." This means it fires when the selectedIndex
>>> is changed, i.e. the user chooses a different track for rendering. I
>>> still don't think it relates to changes in the composition of tracks
>>> of a resource. That should be something different and should probably
>>> be on the MediaElement and not on the track list to also cover changes
>>> in text tracks.
>> 
>> Fair enough.
>> 
>>> 
>>> 
>>>>>> Also, as Eric (C) pointed out, one of the things which can change is which
>>>>>> of several available versions of the content is being rendered (for adaptive
>>>>>> bitrate cases). This doesn't necessarily change any of the metadata
>>>>>> currently exposed on the video element, but nevertheless it's information
>>>>>> that the application may need. It would be nice to expose some kind of
>>>>>> identifier for the currently rendered stream and have an event when this
>>>>>> changes. I think that a stream-format-supplied identifier would be
>>>>>> sufficient.
>>>>> 
>>>>> I don't know about the adaptive streaming situation. I think that is
>>>>> more about statistics/metrics rather than about change of resource.
>>>>> All the alternatives in an adaptive streaming "resource" should
>>>>> provide the same number of tracks and the same video dimensions, just
>>>>> at different bitrate/quality, no?
>>>> 
>>>> I think of the different adaptive versions on a per-track basis (i.e. the
>>>> alternatives are *within* each track), not a bunch of alternatives each of
>>>> which contains several tracks. Both are possible, of course.
>>>> 
>>>> It's certainly possible (indeed common) for different bitrate video
>>>> encodings to have different resolutions - there are video encoding reasons
>>>> to do this. Of course the aspect ratio should not change and nor should the
>>>> dimensions on the screen (both would be a little peculiar for the user).
>>>> 
>>>> Now, the videoWidth and videoHeight attributes of HTMLVideoElement are not
>>>> the same as the resolution (for a start, they are in CSS pixels, which are
>>>> square), but I think it quite likely that if the resolution of the video
>>>> changes then the videoWidth and videoHeight might change. I'd be interested
>>>> to hear how existing implementations relate resolution to videoWidth and
>>>> videoHeight.
>>> 
>>> Well, if videoWidth and videoHeight change and no dimensions on the
>>> video are provided through CSS, then surely the video will change size
>>> and the display will shrink. That would be a terrible user experience.
>>> For that reason I would suggest that such a change not be made in
>>> alternative adaptive streams.
>> 
>> That seems backwards to me! I would say "For that reason I would suggest that dimensions are provided through CSS or through the width and height attributes."
>> 
>> Alternatively, we change the specification of the video element to accommodate this aspect of adaptive streaming (for example, the videoWidth and videoHeight could be defined to be based on the highest resolution bitrate being considered.)
>> 
>> There are good video encoding reasons for different bitrates to be encoded at different resolutions which are far more important than any reasons not to do either of the above.
>> 
>>> 
>>> 
>>>>> Different video dimensions should be
>>>>> provided through the <source> element and @media attribute, but within
>>>>> an adaptive stream, the alternatives should be consistent because the
>>>>> target device won't change. I guess this is a discussion for another
>>>>> thread... :-)
>>>> 
>>>> Possibly ;-) The device knows much better than the page author what
>>>> capabilities it has and so what resolutions are suitable for the device. So
>>>> it is better to provide all the alternatives as a single resource and have
>>>> the device work out which subset it can support. Or at least, the list
>>>> should be provided all at the same level - there is no rationale for a
>>>> hierarchy of alternatives.
>>> 
>>> The way in which HTML deals with different devices and their different
>>> capabilities is through media queries. As an author you provide your
>>> content with different versions of media-dependent style sheets and
>>> content, so that when you view the page with a different device, the
>>> capabilities of the device select the right style sheet and content
>>> for display on that device. Opera has an example on how to use this
>>> here: http://dev.opera.com/articles/view/everything-you-need-to-know-about-html5-video-and-audio/
>>> (search for "Media Query").
>>> 
>>> I believe that this mechanism should also work for adaptive streaming,
>>> such that you provide multiple alternative media resources through the
>>> <source> element, each of which has a @media attribute that says what
>>> device capabilities that particular resource is adequate for. Except
>>> that the "media resource" provides alternative bitrate files for that
>>> case. I do not see a need to move this functionality into the adaptive
>>> streaming file.
>>> 
>>> Nice to get started on this discussion about adaptive streaming. ;-)
>> 
>> Indeed.
>> 
>> So, what I said above is that there is no rationale for a hierarchy. What I mean is that if I have ten encodings of a video, I should just list those ten in a flat list somewhere, annotated with their properties. The device knows best what it can support, what's appropriate etc. The key point is that I get this list without having to download the actual media.
>> 
>> It's not a good idea to split up that list of ten into sub-lists "intended" for different devices, because then I am making assumptions about what kinds of devices there are and what they need. But it is the devices that know best. In DASH people often proposed splitting the list into "Handheld", "SD" and "HD", but then there are devices that happily cope with resolutions that span those categories. Consider a few such devices and you find you need finer granularity. Since we're talking about a tiny amount of descriptive metadata, it's much simpler just to list them all in one flat list.
>> 
>> So then there is the question of where this "flat" list should be: in the HTML or in an adaptive streaming manifest ?
>> 
>> Here we have a genuine functional overlap. HTML provides information which drives a resource selection function which considers many things, such as container types, codecs and everything which can be expressed in Media Queries. Adaptive streaming manifests also provide the same information for the same "selection" purpose plus additional information supporting adaptive streaming as well. Much of the information which drives selection is also needed for adaptation. Also there is no strict split between "adaptation" and "selection": the capabilities of clients may differ in terms of what they can seamlessly switch and what they can't.
>> 
>> So, in integrating HTML and adaptive streaming we have to define the interactions between these overlapping selection functions - we cannot get away from this functional overlap.
>> 
>> I think it would be a bad idea to try and re-invent adaptive streaming in HTML itself. A lot of work has been done on this over the past few years and anything HTML starts from scratch will be way behind. For my part I would like to see adaptive streaming defined in a way which is independent of the presentation layer technology, so adaptive streams can be constructed which play both in HTML and in other places.
>> 
>> The consequence is that we should not assume that an "adaptive stream" (for want of a better term) will be split up into multiple sources when used in HTML. Of course people can do this: if you want to provide 4 separate adaptive streams and use media queries to have the client select which one to play, that's fine, but we must also consider the case where everything is in one manifest.
> 
> Note that this is not what I suggested. I just believe we can use both
> approaches: Media Queries and adaptive streams. With media queries we
> can more easily pre-select the first stream that is picked to be more
> appropriate for the device that is being used, and we can make more
> appropriate alternative streams. For example, if we use markup as
> follows:
> 
> <video controls>
>  <source src="manifest1_ogv" media="(min-device-height:720px)" type="video/ogg">
>  <source src="manifest2_ogv" media="(max-device-height:720px)" type="video/ogg">
>  <source src="manifest1_mp4" media="(min-device-height:720px)" type="video/mp4">
>  <source src="manifest2_mp4" media="(max-device-height:720px)" type="video/mp4">
> </video>
> 
> then we can have Ogg Theora or MP4 videos of different bandwidth and
> screen size in manifest1 and manifest2, and the device will itself
> decide which of the two fits its screen height and then stick with
> the streams in that manifest.

Right, and this is exactly the problem. What about a device where it's appropriate to range from 420px to 1080px ?

> If a group of files are from the start
> excluded from being useful for a particular device because they are
> unfit, it will make the switching much faster, so this is a good
> thing.

I'm not sure how it will make switching faster.

Certainly, the first thing a client should do with a manifest is exclude the options which are not useful for that device, based on its capabilities. Whether that is done in HTML based on Media Queries, or within the adaptive streaming client based on information in the manifest makes no difference in terms of speed, except that your manifests above are smaller than a manifest containing all the streams. But manifests are anyway pretty small.
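
As a rough sketch of that first step, assuming a flat manifest that lists every encoding with its properties: the Encoding fields, the manifest contents and the usable_encodings helper below are all invented for illustration and are not taken from any manifest format or from the spec. The device in the example is the one mentioned above, for which anything from 420px to 1080px is appropriate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Encoding:
    """One entry in a hypothetical flat manifest (all fields are illustrative)."""
    height_px: int      # encoded frame height
    bitrate_kbps: int   # average bitrate
    container: str      # e.g. "mp4" or "ogg"


def usable_encodings(manifest, min_height, max_height, containers):
    """Keep only the encodings this device can play, sorted by bitrate."""
    kept = [
        e for e in manifest
        if min_height <= e.height_px <= max_height and e.container in containers
    ]
    return sorted(kept, key=lambda e: e.bitrate_kbps)


# Hypothetical flat list: every encoding at the same level, no sub-lists.
manifest = [
    Encoding(240, 300, "mp4"),
    Encoding(480, 1000, "mp4"),
    Encoding(720, 2500, "mp4"),
    Encoding(1080, 5000, "mp4"),
    Encoding(720, 2500, "ogg"),
]

# A device for which 420px to 1080px is appropriate, and which plays only MP4.
candidates = usable_encodings(manifest, 420, 1080, {"mp4"})
print([e.height_px for e in candidates])  # prints [480, 720, 1080]
```

The surviving candidates span both sides of a 720px split, which is the range the device can then adapt across.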

Having separate sources for different container formats makes more sense, because it's unlikely anyone would ever support adaptive switching between container formats (though it's certainly logically meaningful and technically possible). But still if I want to create an adaptive stream that is useful also in non-HTML contexts I would put all the container formats into one manifest.

...Mark

> 
> Cheers,
> Silvia.
>
Received on Monday, 20 June 2011 08:43:00 UTC