Re: Buffered bytes for media elements from Ian Hickson on 2008-11-15 (public-html@w3.org from November 2008)

From: Ian Hickson <ian@hixie.ch>
Date: Sat, 15 Nov 2008 19:34:33 +0000 (UTC)
To: John Harding <jharding@google.com>, Eric Carlson <eric.carlson@apple.com>, Maciej Stachowiak <mjs@apple.com>, Jim Jewett <jimjjewett@gmail.com>, Dave Singer <singer@apple.com>, Justin James <j_james@mindspring.com>, Philip Jägenstedt <philipj@opera.com>
Cc: HTML WG <public-html@w3.org>
Message-ID: <Pine.LNX.4.62.0811151919581.1237@hixie.dreamhostps.com>
Based on the comments below I have removed the bufferedBytes feature. I 
have also made it abundantly clear that browsers are required to determine 
the actual time available for bufferedTime, even when the stream makes it 
very difficult to determine (for example, multiple dramatically different 
variable-rate Ogg streams concatenated back to back).


On Wed, 15 Oct 2008, John Harding wrote:
>
> It is common for web sites displaying video (such as YouTube) to control 
> the buffering and playback of video in order to optimize the user 
> experience. One example is deciding when sufficient data has been 
> buffered to begin playback - while the spec currently includes the 
> "canPlayThrough" event, this isn't really flexible enough for all use 
> cases.  Users often do not watch entire videos, so delaying playback 
> until complete playthrough is possible is too much of a delay.  It's 
> often appropriate to play *some* video as soon as possible, and modify 
> buffering behavior during playback as the user's intentions become more 
> clear.
>
> Another example is that web sites may have multiple versions of videos, 
> and want to be able to make their own determination of when to switch 
> from one to another, vs. leaving that up to the user agent.
> 
> While the buffered time ranges provide an approximation to this, they 
> can be very far off the mark for some types of content.  Similarly, 
> there are scenarios where buffered byte counts are also incorrect, but 
> the overwhelming majority of the time, video will be a static file 
> served by a standard server, such that it's trivial for the agent to 
> provide that data.

On Thu, 16 Oct 2008, Eric Carlson wrote:
>
> I don't follow your logic. How can knowing the number of *bytes* 
> available allow you to make a good decision about when sufficient data 
> is available to begin playback when you don't know the bit-rate of the 
> media file? In other words, how can you know when it is safe to begin 
> playback when you have N bytes available but you don't know if playing 
> the media file requires 10 K per second or 10 Megs per second?
> 
> I would think that just monitoring the number of seconds of media 
> available as time passes would give you a *more* accurate picture about 
> when it makes sense to ask the server for a lower bit-rate version - "oh 
> oh - this user is only buffering 1 second of media every 5 seconds, time 
> to switch to the smaller version".
> 
> I may be missing something obvious, but I can't see how having 
> information about the number of bytes of a time-based resource is useful. 
> Can you spell out the use case(s) for these properties in more detail?
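
The time-based monitoring described above could be sketched roughly as 
follows. This is only an illustration: the BufferSample shape, the 
function names, and the "less than 1 second of media per second" 
threshold are assumptions made for the example, not anything defined by 
the spec.

```typescript
// Hypothetical sample shape (an assumption for this sketch, not a media API).
interface BufferSample {
  wallClockSec: number; // wall-clock time at which the sample was taken
  bufferedSec: number;  // seconds of media buffered at that moment
}

// Seconds of media gained per second of wall-clock time, or null when
// there are too few samples or no time has elapsed between them.
function bufferFillRate(samples: BufferSample[]): number | null {
  if (samples.length < 2) return null;
  const first = samples[0];
  const last = samples[samples.length - 1];
  const elapsed = last.wallClockSec - first.wallClockSec;
  if (elapsed <= 0) return null;
  return (last.bufferedSec - first.bufferedSec) / elapsed;
}

// Assumed policy: a fill rate below 1 means the download is slower than
// real-time playback, so a smaller version is worth requesting.
function shouldSwitchToSmallerVersion(samples: BufferSample[]): boolean {
  const rate = bufferFillRate(samples);
  return rate !== null && rate < 1;
}
```

With the figures quoted above (1 second of media gained over 5 seconds), 
the fill rate is 0.2 and the sketch would suggest switching.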

On Fri, 17 Oct 2008, Jim Jewett wrote:
> 
> I'm speculating here, but
> 
> (a) Many videos are really a sort of slideshow, without motion.  Even 
> those with motion often begin with a credits screen that stays static 
> for several seconds.
> 
> If you want to have x seconds buffered, then you want to buffer x 
> seconds of changing video, not x seconds of the highly-compressible 
> intro.
> 
> On a per-video basis, you could just say "well, we need 3x bytes on this 
> one because of the way it starts."  But if you want to use more generic 
> scripts, you need some way of telling whether or not those x seconds 
> were data-heavy.  Specifying bytes rather than seconds is a fairly good 
> proxy.
> 
> (b) If you're thinking of switching to a higher or lower resolution, the 
> bandwidth limits are normally expressed in (something equivalent to) 
> bytes/second.  You could do the equivalent with seconds, plus a timer, 
> plus knowledge of the specific video, but ... it may be simpler to track 
> bytes.

On Fri, 17 Oct 2008, Eric Carlson wrote:
>
> This is exactly the situation the "readyState" property is supposed to 
> handle: the "HAVE_ENOUGH_DATA" state means that playback can begin without 
> having to pause to re-buffer. The media engine is really the only thing 
> in a position to make an accurate decision about this, because the 
> answer depends on the overall data rate of the media file (including 
> local highs and lows) and the amount of bandwidth available between the 
> client machine and the server with the media file.
>
> Even if you want to make these calculations yourself, just knowing the 
> number of bytes buffered doesn't give you enough information because you 
> also need to know the encoding rate of the media file.
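
The arithmetic behind this point can be sketched as below. The function 
name is hypothetical, and a constant encoding rate is assumed purely to 
keep the example simple; real media files have local highs and lows.

```typescript
// Playable seconds represented by a byte count, assuming a constant
// encoding rate in bytes per second (a simplifying assumption).
function secondsFromBytes(bufferedBytes: number, bytesPerSecond: number): number {
  if (bytesPerSecond <= 0) {
    throw new RangeError("bytesPerSecond must be positive");
  }
  return bufferedBytes / bytesPerSecond;
}

// The same 1,000,000 buffered bytes at two very different encoding rates.
const atLowRate = secondsFromBytes(1_000_000, 10_000);      // 100 seconds
const atHighRate = secondsFromBytes(1_000_000, 10_000_000); // 0.1 seconds
```

The byte count alone cannot distinguish the two cases; only the encoding 
rate turns it into a duration.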

On Mon, 20 Oct 2008, Dave Singer wrote:
>
> Knowing bytes really doesn't help you unless you know how relevant those 
> bytes are, also.  Are they bytes 'immediately in front of the playhead'?  
> You just don't know.  Also, are they 'dense'?  Maybe we have 200 kb 
> buffered -- but it's all bytes of the video and none of the audio.  Or 
> it's the first 3 seconds, then there is a 10-second gap, and then 4 
> seconds more.  Or, or, or...
> 
> We really need to define questions that have a clear semantic as to what 
> you are trying to do, I think, that can be correctly and helpfully 
> answered by most or all media systems and for most or all delivery 
> technologies.
> 
> Consider a system which is playing directly from a DVB stream (e.g. you 
> have a digital radio receiver in your hand-held device).  You don't need 
> more than a frame or two buffered, as delivery is exactly real-time and 
> jitter-free.
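
One question with a clear semantic might be "how many contiguous seconds 
are buffered ahead of the playhead?". A minimal sketch, assuming the 
buffered data is available as a list of time ranges (the TimeRange shape 
and the function name are hypothetical):

```typescript
// Hypothetical range shape for this sketch: [start, end) in seconds.
interface TimeRange {
  start: number;
  end: number;
}

// Seconds of media available from the playhead up to the first gap.
function contiguousSecondsAhead(ranges: TimeRange[], playhead: number): number {
  // Sort a copy so the caller's array is left untouched.
  const sorted = ranges.slice().sort((a, b) => a.start - b.start);
  let reach = playhead;
  for (const r of sorted) {
    if (r.start > reach) break;       // gap before this range: stop here
    if (r.end > reach) reach = r.end; // range covers or extends the reach
  }
  return reach - playhead;
}

// The example above: 3 seconds, a 10-second gap, then 4 seconds more.
const exampleRanges: TimeRange[] = [
  { start: 0, end: 3 },
  { start: 13, end: 17 },
];
```

For these ranges, 7 seconds are buffered in total, but a playhead at 0 
has only 3 contiguous seconds ahead of it.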

On Sun, 19 Oct 2008, Justin James wrote:
>
> Why can't we just have both?

On Mon, 20 Oct 2008, Dave Singer wrote:
> 
> Because we don't want parts of the specification that have so many 
> holes?
> 
> Here's another one:  if I have loaded 20K of a file, but 10K of that is 
> not actual media data (maybe it's metadata, maybe not used), do I report 
> 10K or 20K as the buffered bytes?  If I want to know how much memory is 
> being used for buffering, 20K is the right answer.  If I want to know 
> how much data is relevant to my playback, 10K is the right answer...

On Mon, 20 Oct 2008, Justin James wrote:
> 
> I agree that all of the questions we've seen on this list so far make it 
> look like using bytes buffered is not a good design decision. I 
> certainly see no reason to use it! At the same time, I don't see how 
> exposing the information at the level that we are concerned about would 
> significantly harm the spec, so long as we include the much more 
> useful "time buffered" number as well. It's like offering the 
> ability to change the direction of text, without it being tied to the 
> language being used... sure, there should never be a good reason for me 
> to have the Roman alphabet going right-to-left, but that doesn't mean 
> that 1 site in a zillion wouldn't have a good usage for it, and including 
> the functionality is fairly easy (much easier for bytes buffered than 
> RTL text...). :)

On Mon, 20 Oct 2008, Philip Jägenstedt wrote:
> 
> I would like to see either a clear use case for bufferedBytes or for it 
> to be removed. At the very least, it's a non-zero amount of work to 
> implement and it would be nice to know what it is intended for. The 
> bufferingRate should tell you the current download rate and the total 
> amount downloaded would also be given in progress events (although as a 
> sum, not ranges). Perhaps if the actual problem that this is supposed to 
> address is made more clear, a better solution can be found.

On Tue, 21 Oct 2008, John Harding wrote:
>
> The primary purpose, from my point of view, was to enable the page 
> author to make its own determinations about download progress and when 
> playback can start, independent of the user agent.  Yes, this requires 
> knowledge about the actual contents of the video, but in a scenario such 
> as YouTube, where the page author also controls production of the video, 
> this is quite reasonable.  As I mentioned earlier, the user agent's 
> determination of "can play through" does not always correspond to the 
> ideal time for playback to begin.
> 
> However, it's unlikely that this would be done on the basis of specific, 
> non-contiguous byte ranges - if there's a reasonable mechanism for the 
> page author to track the total download progress (in addition to the 
> bufferingRate), that would be sufficient.  The current spec I'm looking 
> at doesn't provide any detail about progress events, either frequency or 
> what data is available with them.

On Fri, 24 Oct 2008, Philip Jägenstedt wrote:
>
> I assume that progress events will eventually refer to 
> http://www.w3.org/TR/progress-events/
> 
> These events have lengthComputable/loaded/total fields by which you can 
> track the progress of the download.
> 
> Such events will be fired "every 350ms (±200ms) or for every byte 
> received, whichever is least frequent" according to the current HTML5 
> draft.
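
Tracking the download with those three fields might look roughly like 
this. The ProgressLike interface only mirrors the field names mentioned 
above so that the sketch runs without a browser, and downloadFraction is 
a hypothetical helper.

```typescript
// Mirrors the lengthComputable/loaded/total fields named above.
interface ProgressLike {
  lengthComputable: boolean;
  loaded: number;
  total: number;
}

// Fraction of the resource downloaded so far, or null when the total
// size is unknown and no fraction can be computed.
function downloadFraction(e: ProgressLike): number | null {
  if (!e.lengthComputable || e.total <= 0) return null;
  return Math.min(e.loaded / e.total, 1);
}
```

Note that this is an aggregate figure: it says how much has arrived, not 
which byte ranges.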

On Tue, 28 Oct 2008, John Harding wrote:
>
> If that's the case, then bufferedBytes does seem somewhat redundant - it 
> would be difficult for a page author to extract much meaning from the 
> additional precision in the byte ranges vs. the aggregate loaded bytes 
> from the progress event.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Saturday, 15 November 2008 19:35:10 UTC