[whatwg] media elements: Relative seeking

On Tue, Nov 25, 2008 at 6:58 PM, Maik Merten <maikmerten at googlemail.com> wrote:
> Dave Singer schrieb:
>> IF we are to do this, I would have thought it would be by adding units to
>> the "where to seek to" argument:
>> * go to this time in NPT (normal play time, which runs from 0 to media
>> duration)
>> * go to this SMPTE time code
>> * go by this relative distance in NPT times
>> * go to this proportional time
>> * go to this proportional byte distance
>> * go by this relative byte distance
> Hmmm... I'm in favor of making implementations simpler (and thus more
> robust in this case), so I think inflating the number of ways one can
> seek may be going in the wrong direction.

I don't really see a need for anything other than minutes/seconds,
bytes, and percentages. Even in temporal URIs, we saw that people
prefer using minutes/seconds over SMPTE, even though the latter is
more accurate.

>> Note that proportional distances are not well defined for some streams
>> (e.g. indefinite ones).
> Okay, this use case basically rules out just seeking by values between zero
> and one. Even with indefinite streams the user may want to e.g. jump to
> second 20 of the stream, which won't work with the proportional seeking I
> asked for.

Live streams are somewhat awkward to deal with anyway, because a
timeline is ill-defined for them. All you can really do is show the
past and put a "continues" pointer at the end. Most live streams (e.g.
the recent YouTube live concert) simply don't show a timeline and
don't let people jump around in the presentation.

>> We'd have to define what bytes are counted and what aren't, especially if
>> a URL offers a set of sub-streams only some of which a client would normally
>> choose to have sent to it for playing.

No, I think that's a wrong assumption. When referring to bytes in a
media resource, one is always referring back to the resource on the
server. So byte offsets are well-defined with respect to a media
resource, and they cross any substream boundaries: byte 500 may be in
the video track, while byte 501 may be part of the audio track.

A much larger problem is that there are file formats with no linearly
increasing mapping between byte position and time offset. That is:
byte 500 could belong to a video track at 3 seconds, while byte 501
could belong to an audio track at 1 second. For some formats these
don't even need to be in different tracks, because the codec "jumps
around". So "bytes" may indeed be a poor indicator of timeline
progress.

Percentage-based seeking, however, should not be a problem, I would think.
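To make the distinction concrete, here is a minimal sketch (the
function name and signature are hypothetical, not from any proposed
API): a proportional seek only needs a known or estimated duration,
sidestepping the byte-to-time mapping problem entirely as long as the
player can seek by time internally.

```python
def seek_target_from_fraction(duration_seconds, fraction):
    """Map a proportional position (0.0-1.0) onto the media timeline.

    Works even when byte offsets don't map linearly to time, because
    the result is a time offset, not a byte offset.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return duration_seconds * fraction

# Seeking to 25% of a 600-second video yields a 150-second target:
print(seek_target_from_fraction(600, 0.25))
```

Note that this still breaks down for indefinite streams, where no
duration exists, which is exactly the objection raised above.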

> I'm currently slamming a subset of the HTML5 media API onto a Java applet
> (to offer a fallback for browsers without audio/video).
> http://people.xiph.org/~maikmerten/demos/bigbuckbunny-applet-javascript.html
> Estimating the duration *does* work - poorly in this case, but still.
> Currently this applet uses the byte-position of the last byte fetched from
> the server, which isn't the correct *playback* byte-position due to not
> accounting for the bytes in the buffer (which, in this case, is 4 megabytes
> big - so this is severe). I assume that once this is corrected a
> reasonable-enough (for a status bar position slider) estimate may actually
> be possible, meaning things could just stay the way they are.

Are you using the byte position to estimate duration, or are you using
the granulepos in Ogg to do this? The granulepos on the last data page
may be more accurate than simply using byte positions. However, it may
also be more complicated and error-prone.

In any case - if you (and also Chris Double) are satisfied with the
estimates you're getting for file duration/length - I'll stop arguing
for it. It would be nice to hear some experimental evidence about how
well it's doing, e.g. for typical movie trailers, so we can lay that
argument to bed knowing we've done our homework.


Received on Tuesday, 25 November 2008 03:28:15 UTC