Re: Byte ranges and time ranges from Silvia Pfeiffer on 2008-11-13 (public-media-fragment@w3.org from November 2008)

From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Date: Thu, 13 Nov 2008 17:17:09 +1100
To: "Yves Lafon" <ylafon@w3.org>
Cc: "Media Fragment" <public-media-fragment@w3.org>, "Chris Double" <chris.double@gmail.com>
Message-ID: <2c0e02830811122217u47bf1e90gc21e579b7660fc84@mail.gmail.com>

Hi Yves,

I thought that would be of interest to you. :-)

I have therefore cc-ed Chris Double, who implemented this for Mozilla,
so he can correct me where I explain it wrongly.

On Thu, Nov 13, 2008 at 12:00 PM, Yves Lafon <ylafon@w3.org> wrote:
> On Tue, 11 Nov 2008, Silvia Pfeiffer wrote:
>
>> document.getElementsByTagName("video")[0].currentTime=30
>>
>> Assuming that part of the file had not been downloaded, what happens
>> is as follows:
>>
>> * the request goes from the browser to liboggplay
>> * liboggplay makes an educated guess at the byte offset that relates
>> to this time offset based on the file size and the average bitrate of
>> the file (which were received as first information about the video)
>
> So the first handshake is there to get the beginning of the audio resource,
> to figure out the average bitrate and the size. Is it done using a range
> request, or does it start the download of the whole resource then subsequent
> requests will get content directly from the browser cache?


It's actually a bit more complicated than that. When Chris's code
comes across a video element, the first thing it does is determine
the duration of the resource (probably both in bytes and in time,
which together give the average bitrate). For Theora, this may in
itself require several roundtrips until the end of the file has been
reached. This information is then stored for the video and used in
subsequent seek requests. Chris is indeed asking for the video
element to have a "size" attribute so as to avoid this first seeking
procedure.
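
To make the "educated guess" concrete, here is a minimal sketch of how
a byte offset could be estimated from the stored size and duration.
The function name and its parameters are hypothetical and are not
taken from liboggplay or Firefox; the sketch assumes a roughly
constant bitrate, which is exactly why the first guess may miss.

```javascript
// Hypothetical helper, for illustration only (not liboggplay's API).
// Estimates the byte offset for a time offset from the average bitrate.
function estimateByteOffset(targetSeconds, durationSeconds, sizeBytes) {
  if (durationSeconds <= 0 || sizeBytes <= 0) {
    throw new RangeError("duration and size must be positive");
  }
  // Average bitrate expressed in bytes per second.
  const avgBytesPerSecond = sizeBytes / durationSeconds;
  const guess = Math.floor(targetSeconds * avgBytesPerSecond);
  // Clamp the guess so it always points inside the file.
  return Math.min(Math.max(guess, 0), sizeBytes - 1);
}
```

For a 120-second file of 12,000,000 bytes, a seek to 30 seconds would
start probing at byte 3,000,000.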

All seeks are done using byte range requests - Chris never downloads
the whole resource in one request (I think). After the initial
request, no video data is in the browser cache yet. If the user hits
play at the beginning of the file, Firefox asks for a byte range at
the beginning of the file, buffers it, and continues buffering as
long as the user is playing (I think). If, however, the user clicks
on the transport bar at some time offset, the browser changes the
currentTime attribute of the video element, which in turn makes the
media subsystem fetch a byte range that is estimated to start at that
time offset. This is where the seeking algorithm described above
starts.
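
The guess-and-refine loop can be sketched roughly as follows. This is
an illustrative bisection under stated assumptions, not the code in
liboggplay or nsMediaStream.cpp: readTimeAt is a hypothetical callback
standing in for "send an HTTP byte range request at this offset and
return the time of the first granuleposition found", and the tolerance
and probe limit are made-up defaults.

```javascript
// Illustrative sketch only; all names here are hypothetical.
// readTimeAt(offset) stands in for one HTTP byte range roundtrip that
// returns the time (in seconds) found at that byte offset.
function seekByBisection(targetSeconds, sizeBytes, durationSeconds,
                         readTimeAt, toleranceSeconds = 0.5,
                         maxProbes = 32) {
  let lo = 0;
  let hi = sizeBytes - 1;
  // First guess: assume constant bitrate across the file.
  let guess = Math.floor((targetSeconds / durationSeconds) * sizeBytes);
  guess = Math.min(Math.max(guess, lo), hi);
  let probes = 0;
  while (probes < maxProbes) {
    probes += 1;
    const t = readTimeAt(guess);
    if (Math.abs(t - targetSeconds) <= toleranceSeconds) {
      return { offset: guess, probes };
    }
    // Narrow the byte window to the side that contains the target.
    if (t < targetSeconds) {
      lo = guess + 1;
    } else {
      hi = guess - 1;
    }
    if (lo > hi) {
      return { offset: guess, probes }; // window exhausted
    }
    guess = lo + Math.floor((hi - lo) / 2);
  }
  return { offset: guess, probes };
}
```

Each iteration corresponds to one roundtrip, so the probe count is a
direct measure of the seek latency added by this approach.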


>> * liboggplay hands back the byte ranges to the browser
>> * the browser makes a read request on these byte ranges via http
>> (using these functions
>>
>> http://hg.mozilla.org/mozilla-central/file/5dfdad637696/content/media/video/src/nsMediaStream.cpp)
>> * the server returns those byte ranges and the browser hands them back
>> to liboggplay, which determines from the received granulepositions
>> what time they relate to
>> * if the requested time is not hit, liboggplay makes a better estimate
>> on byte ranges and another http byte range request is sent, etc. until
>> the right byte ranges are returned
>>
>> Amazingly, this is the exact same seeking algorithm that is used by
>> media players that seek to time offsets in Ogg files. The only
>> difference is that the seeking algorithm is now implemented over HTTP
>> rather than local file I/O. If the guesses are good, fewer than 10
>> round trips are necessary, Chris says. He also says that the delay
>> introduced by these roundtrips is barely noticeable in the browser.
>> He has tested with Wikimedia and other content and it works reliably.
>
> No, that's not exactly the same algorithm. One thing you take for granted is
> that when you open a file, you get a file descriptor and even if you overwrite
> the file, the fd still points to the old content (unless the disk space is
> overwritten, or the inode table or whatever might interfere).
> In HTTP you don't have that. So basically you need to send a conditional
> request for every subsequent interaction with the server to make it work
> reliably. Is it the case there?

I'm not sure whether Chris handles this case. Generally I would expect
the implementation to assume that the media resource continues to
exist unchanged on the server. Usually video is quite a persistent
resource. I would however think that if there is an error in any of
the HTTP responses, the browser will indicate that to the user. Maybe
Chris can clarify.
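
For reference, a conditional range request of the kind Yves describes
could look roughly like the sketch below. Range and If-Range are
standard HTTP headers; the function names and return values are
hypothetical, and nothing here is confirmed to be what Firefox does.

```javascript
// Illustrative sketch only; function names are hypothetical.
// Builds headers for a byte range request. If a validator (an ETag or
// a Last-Modified date from an earlier response) is given, If-Range
// asks the server to send the range only if the resource is unchanged.
function buildRangeHeaders(start, end, validator) {
  const headers = { Range: `bytes=${start}-${end}` };
  if (validator) {
    headers["If-Range"] = validator;
  }
  return headers;
}

// Interprets the status code of the response to such a request.
function interpretRangeResponse(status) {
  if (status === 206) return "use-range"; // unchanged, range delivered
  // 200 means the full entity was sent: the validator did not match
  // (the resource changed) or the server ignores range requests.
  if (status === 200) return "restart";
  return "error";
}
```

With this, a changed file shows up as a 200 response instead of
silently mixing bytes from two versions of the resource.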


>> Chris also says that if he can get a cooperative server which can do
>> the time-byte-mapping that we are discussing, he'd rather use the
>> seeking support on the server. However, I find it amazing that it is
>> working so well even without such server support!
>
> Yes, finding a solution _now_ doesn't mean that the solution is optimal :)

No, but it works. :)


>> I think we can draw some conclusions from this:
>>
>> * when loading up a video, a couple of roundtrips don't make much of a
>> difference; thinking about this further, I actually think this is the
>> case because we are used to Web pages that take a long time to load
>> because they have to download many resources and cannot get them all
>> in parallel; we are also used to videos sitting in the browser
>> buffering because the bandwidth is not big enough; in comparison two
>> roundtrips for video are really nothing
>
> Well, there is work to deliver HTTP over SCTP, allowing for far more
> parallelism in resource fetching,

Has HTTP over SCTP really left the lab yet? That would be good news.


> also what seemed "good enough latency" ten
> years ago is now unacceptable.

Yes, but we are not talking ten years ago - we are talking about an
implementation that was done this year and is working acceptably under
current expectations of users.


> Note also that it depends on a lot of things,
> including the network latency and speed.

... which for people in Australia and New Zealand (that's where
Chris is from) is generally much worse than for people in Europe or
the US. So, if Chris says it's acceptable, I will believe it.


>> * asking for byte ranges alone can work.
>
> As well as defining sub-URIs for each second and retrieving an index of all
> seconds->links relations :)

In fact, it has been proposed that the server share a binary index
representing the keyframe structure of the media file with the client,
which will then enable the client to map time to bytes itself and
enable proxies to continue working as they are. It's not such a stupid
idea.
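
A minimal sketch of how a client could use such an index, assuming it
has already been decoded into an array of entries sorted by time. The
entry shape ({ time, offset }) and the function name are hypothetical;
no actual index format has been specified.

```javascript
// Illustrative sketch only; the index format is an assumption.
// index: array of { time, offset } entries sorted by ascending time,
// one per keyframe. Returns the last keyframe at or before the
// target time, or null if the target precedes the first keyframe.
function lookupKeyframe(index, targetSeconds) {
  let lo = 0;
  let hi = index.length - 1;
  let best = null;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid].time <= targetSeconds) {
      best = index[mid]; // candidate; look for a later one
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return best;
}
```

The returned offset can go straight into a single byte range request,
so no guessing roundtrips are needed and proxies still only see
ordinary byte ranges.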

Cheers,
Silvia.
Received on Thursday, 13 November 2008 06:17:46 UTC