Re: <video> readyState oddities from Philip Jägenstedt on 2011-02-26 (public-html@w3.org from February 2011)

From: Philip Jägenstedt <philipj@opera.com>
Date: Sat, 26 Feb 2011 14:46:50 +0100
To: "Ian Hickson" <ian@hixie.ch>
Cc: public-html@w3.org
Message-ID: <op.vrii8czgsr6mfa@nog>

On Sat, 26 Feb 2011 00:40:02 +0100, Ian Hickson <ian@hixie.ch> wrote:

> On Fri, 25 Feb 2011, Philip Jägenstedt wrote:
>>
>> Because of the definition of "potentially playing", currentTime cannot
>> increase once readyState has fallen back to HAVE_CURRENT_DATA, so I
>> assume you meant HAVE_FUTURE_DATA in the above?
>
> Oops. The definition of "potentially playing" is wrong. Fixed.
>
> The spec was indeed self-contradictory before in numerous ways because of
> this mistake. My bad. (For example, 'waiting' would never fire per the
> previous definition.)

Including HAVE_CURRENT_DATA in the definition of "potentially playing"
seems like a very strange solution to the inconsistencies; IMO it just
adds to them. HAVE_CURRENT_DATA is defined as a state where "not enough
data is available that the user agent could successfully advance the
current playback position in the direction of playback at all without
immediately reverting to the HAVE_METADATA state", in other words a state
where it's not possible to play, unless one counts increasing currentTime
by an imaginary epsilon and then reverting to HAVE_METADATA as playing.
Calling that state "potentially playing" is just weird.

>> If buffered is [0,10] then one would reasonably expect currentTime to
>> increase monotonically until it is exactly 10. Where else would it stop,
>> 10.0000001?
>
> It would stop at the start time of the next frame of video. Assuming for
> the sake of argument a video with ten frames per second, the 100th frame
> would start at 9.9s and end at 10.0s minus epsilon. The 101st frame would
> be from 10.0s to 10.1s, and so it would stop at 10.0s.
>
> Actually I guess I'm arguing that it's exclusive, not inclusive, because
> t-epsilon < t but will round to t in the API. So it's inclusive in
> principle but in practice it's exclusive...

It looks like you're assuming that the start time of a video frame is  
inclusive but the end time is exclusive. If that's the case, then surely  
at currentTime==duration there should be no video frame showing, as it's  
epsilon larger than the end time of the last frame. You could claim that  
the last frame just keeps showing because there's nothing else to show,  
but that would also mean that if one seeks to duration from some other  
point in the video, the frame from that other point would keep showing  
indefinitely, right?

I'd much prefer if we kept the theoretical model and definitions simple by  
not assuming that there's such a thing as an imaginary epsilon  
differentiating two times that are indistinguishable both to the browser  
internals and via the DOM API.
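
As a rough sketch of the frame-boundary argument above, assume a
constant-frame-rate video in which frame i starts at i / fps. Both
function names are hypothetical and come from neither the spec nor any
browser. The first uses half-open frame intervals, so a time equal to the
duration maps to no frame at all; the second avoids the epsilon by letting
the last frame also cover its own end time.

```typescript
// Hypothetical model: frame i covers the half-open interval
// [i / fps, (i + 1) / fps). A time equal to the duration has no frame.
function frameIndexHalfOpen(t: number, fps: number, frameCount: number): number | null {
  const i = Math.floor(t * fps);
  return i >= 0 && i < frameCount ? i : null;
}

// Hypothetical alternative without an epsilon: the last frame's interval is
// closed at the end, so t === duration still maps to the last frame.
function frameIndexClamped(t: number, fps: number, frameCount: number): number | null {
  const duration = frameCount / fps;
  // The negated comparison also rejects NaN.
  if (frameCount === 0 || !(t >= 0) || t > duration) return null;
  return Math.min(Math.floor(t * fps), frameCount - 1);
}
```

With 100 frames at 10 frames per second, the half-open version has nothing
to show at t = 10, while the clamped version still returns the last frame
(index 99).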

>> > It seems extremely different in one key respect: when you have ended,
>> > the frame you _want_ to be rendering is the last one, but when you
>> > have run out of data, the frame you _want_ to be rendering is the next
>> > one.
>>
>> Surely the difference between HAVE_METADATA and HAVE_CURRENT_DATA should
>> correspond to whether or not currentTime is in the buffered ranges, not
>> be special-cased for currentTime==duration?
>
> I don't understand the difference.

I'm not absolutely sure what your position is or what the spec is trying  
to say, but it seems like you think that when playback stops due to  
waiting for the network, readyState should revert to HAVE_METADATA, while  
at the end of the resource it should be HAVE_CURRENT_DATA. This is
inconsistent, as in both cases currentTime would be in the buffered ranges
(exactly equal to one of the range ends) and from a decoding perspective
there's really no difference between the two states.

[snip]

I'll try to clarify a bit on the problems with readyState and the related  
events by comparing what I *think* you are trying to say in the spec with  
what I think actually makes sense and want to implement. Please correct me  
where I've misunderstood the spec or your arguments in this thread.

1. initial loading of a resource

Hixie/spec: Initially, readyState is HAVE_NOTHING. When the duration and
the intrinsic size of the video are determined, readyState changes to
HAVE_METADATA. When the first frame has been decoded, readyState changes
to HAVE_CURRENT_DATA. When the second frame has been decoded, readyState
changes to HAVE_FUTURE_DATA. When the buffered data and buffering speed are
such that one could probably play through to the end without pausing,
readyState changes to HAVE_ENOUGH_DATA.

Me: Here, the difference between HAVE_METADATA, HAVE_CURRENT_DATA and  
HAVE_FUTURE_DATA is clear, but extremely narrow. Since the events are  
fired asynchronously, by the time the loadedmetadata event handler is run  
it's very likely that at least one frame has been decoded and readyState >  
HAVE_METADATA. Since scripts that try to paint the first frame of the  
video to <canvas> in the loadedmetadata event handler will work most of  
the time and only fail if the network is very slow, implementations would
do best to pin readyState to HAVE_NOTHING until the first frame is
decoded, then skip directly to HAVE_CURRENT_DATA, firing the
loadedmetadata and loadeddata events.
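
A minimal sketch of which events a readyState increase would queue,
following the description in this message. The function name
eventsForIncrease is hypothetical, and the model is deliberately
simplified: it leaves out HAVE_ENOUGH_DATA and any "first time only"
conditions. The numeric values are those of the HTMLMediaElement
constants.

```typescript
// Numeric values of the HTMLMediaElement readyState constants.
const HAVE_NOTHING = 0;
const HAVE_METADATA = 1;
const HAVE_CURRENT_DATA = 2;
const HAVE_FUTURE_DATA = 3;

// Hypothetical helper: list the events queued when readyState rises from
// `from` to `to`, possibly skipping intermediate states.
function eventsForIncrease(from: number, to: number): string[] {
  const events: string[] = [];
  if (from < HAVE_METADATA && to >= HAVE_METADATA) events.push("loadedmetadata");
  if (from < HAVE_CURRENT_DATA && to >= HAVE_CURRENT_DATA) events.push("loadeddata");
  if (from < HAVE_FUTURE_DATA && to >= HAVE_FUTURE_DATA) events.push("canplay");
  return events;
}
```

Under these assumptions, jumping straight from HAVE_NOTHING to
HAVE_CURRENT_DATA yields loadedmetadata followed by loadeddata, so a
loadedmetadata handler would always find a decoded first frame.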

Another issue here is that the canplay event fired when readyState reaches  
HAVE_FUTURE_DATA is completely useless. While one *can* play one frame at
that point, doing so and then pausing is not a good user experience. I've
filed <http://www.w3.org/Bugs/Public/show_bug.cgi?id=12195> for this.

2. playback stops due to reaching the end of resource

Hixie/spec: readyState is HAVE_ENOUGH_DATA when approaching the end. As  
the last frame is reached readyState drops to HAVE_CURRENT_DATA.

Me: I agree with this, if I've interpreted the spec correctly.

3. playback stops due to waiting for the network

Hixie/spec: Before the buffered data runs out, readyState is
HAVE_FUTURE_DATA (or possibly HAVE_ENOUGH_DATA if the original guess for
when to transition to HAVE_ENOUGH_DATA was wrong). When the last available
frame is displayed, readyState drops to HAVE_CURRENT_DATA. Then,
currentTime increases by epsilon beyond the last frame so that readyState
drops to HAVE_METADATA, but the difference cannot be seen in currentTime.
When data again becomes available over the network, one will again
transition through HAVE_CURRENT_DATA and HAVE_FUTURE_DATA like for the
initial loading of a resource, with only one frame setting them apart.

Me: Again, the difference between HAVE_CURRENT_DATA and HAVE_METADATA is  
extremely narrow and hardly useful. I'd like readyState to stay at  
HAVE_FUTURE_DATA until the end of the last available frame is reached, at  
which point it drops to HAVE_CURRENT_DATA and stays there. When it should  
change back to HAVE_FUTURE_DATA depends on  
<http://www.w3.org/Bugs/Public/show_bug.cgi?id=12195>.
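
The model preferred here, where readyState follows from whether currentTime
lies inside the buffered ranges, could be sketched roughly as below. The
names TimeRange, MediaReadyState and readyStateFor are hypothetical.
HAVE_ENOUGH_DATA is left out because it depends on download-rate
estimates, and the buffered ranges are assumed to be normalized
(non-overlapping and non-touching).

```typescript
// Subset of the HTMLMediaElement readyState values used in this sketch.
const MediaReadyState = {
  HAVE_METADATA: 1,
  HAVE_CURRENT_DATA: 2,
  HAVE_FUTURE_DATA: 3,
};

// Hypothetical stand-in for one entry of a TimeRanges object.
type TimeRange = { start: number; end: number };

// Derive readyState purely from currentTime and the buffered ranges,
// treating both range ends as inclusive (no imaginary epsilon).
function readyStateFor(currentTime: number, buffered: TimeRange[]): number {
  for (const r of buffered) {
    if (currentTime >= r.start && currentTime <= r.end) {
      // Inside a range: data beyond the current position exists unless
      // currentTime sits exactly on the range end.
      return currentTime < r.end
        ? MediaReadyState.HAVE_FUTURE_DATA
        : MediaReadyState.HAVE_CURRENT_DATA;
    }
  }
  // Not in any buffered range, for example after seeking to unbuffered data.
  return MediaReadyState.HAVE_METADATA;
}
```

With buffered equal to [0,10], this gives HAVE_FUTURE_DATA at currentTime 5
and HAVE_CURRENT_DATA at exactly 10, whether 10 is the end of the resource
or just the end of the downloaded data. HAVE_METADATA only appears for a
position outside the range, such as 12.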

4. seeking

Hixie/spec: When seeking to an unbuffered position, readyState drops to  
HAVE_METADATA. Then, one transitions through HAVE_CURRENT_DATA and  
HAVE_FUTURE_DATA as data becomes available, much like for an initial load.

Me: This seems mostly sensible. It's useful that scripts can wait for the  
loadeddata event (fired in the HAVE_METADATA => HAVE_CURRENT_DATA  
transition) to know that the current frame is available. The canplay event  
is quite useless though, see the bug I filed.

It'd be very interesting to hear from other implementors if they've  
actually bothered to try implementing what the spec says, or if they've  
found it strange and inconsistent and ignored it. From what I understand,
at least Firefox does something different from what the spec requires, but
I don't know if it was deliberate, of course.

-- 
Philip Jägenstedt
Core Developer
Opera Software
Received on Saturday, 26 February 2011 13:47:34 UTC