State transitions for media elements

<http://www.w3.org/html/wg/html5/#media>

We've been discussing these states with other 
implementors (including Philip Jägenstedt at 
Opera, for example), and we think these really 
could do with some refining.

In particular, we think that:
a) the state called 'network state' is actually a 
mix of the state of the network and the state of 
the media;
b) the media state needs to be described in terms 
of what can be done with the media, not what it 
has in hand, as what is in hand differs between 
various protocols;
c) the difference between play/pause being 
*requested* and actually happening needs to be clearer:
    -- in a download protocol, play can stall if data is unavailable
    -- in a streaming protocol, there is an 
initial buffering period in which play is 
requested but time is not advancing


So, here is a brief revision, obviously derived 
from the existing document but, we hope, avoiding 
these issues and supporting more (we hope, all) 
protocols:


Network state:  documents whether the network is 
being used (e.g. for a network activity indicator)

Empty:  initial state, or state when there is a 
failure that'll need some action to escape from

Idle:  the URI is known, but the UA has no need 
to use the network right now (e.g. download 
resource for which 'enough' is cached, streaming 
resource which is not active...)

Loading: the network is being used right now (you 
can show an activity indicator)

Loaded:  for a loadable resource, we've both 
loaded it all and don't intend to unload it (you 
could disconnect and walk away)

There is an event, Stalled, which is fired once 
during Loading if data doesn't seem to be 
arriving after a reasonable timeout (as now).
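
As a rough illustration of the network state and the 
one-shot Stalled event, here is a minimal TypeScript 
sketch.  All of the names (NetworkState, StallDetector, 
showActivityIndicator) and the timeout handling are 
assumptions made for illustration, not proposed API; 
the sketch also assumes one detector is created per 
Loading period.

```typescript
// Hypothetical names throughout: the proposal defines states, not an API.
type NetworkState = "empty" | "idle" | "loading" | "loaded";

// A UI would show an activity indicator only while the network is in use.
function showActivityIndicator(state: NetworkState): boolean {
  return state === "loading";
}

// Decides when the Stalled event should fire. Assumes one instance is
// created per Loading period, so "once" means once per instance.
class StallDetector {
  private lastDataAt: number;
  private fired = false;
  private readonly timeoutMs: number;

  constructor(timeoutMs: number, now: number) {
    this.timeoutMs = timeoutMs;
    this.lastDataAt = now;
  }

  // Call whenever data arrives from the network.
  dataArrived(now: number): void {
    this.lastDataAt = now;
  }

  // Returns true at most once: the first time this is called while Loading
  // and no data has arrived for at least timeoutMs.
  shouldFireStalled(state: NetworkState, now: number): boolean {
    if (state !== "loading" || this.fired) return false;
    if (now - this.lastDataAt < this.timeoutMs) return false;
    this.fired = true;
    return true;
  }
}
```

What counts as a 'reasonable timeout' is left to the UA; 
the sketch simply takes it as a constructor argument.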


Media state:  documents what you can do with the 
media.  (Each state is a superset of the one 
preceding).

Empty: initial state

Metadata_loaded:  enough data has been loaded 
that a well-defined set of questions can now be 
answered as well as they ever could be (e.g. 
duration, width/height, codecs used, and so on).

Can_display (or Can_display_at_current_time; 
currently called can_display_current_frame):  the 
UA has done all it can or intends to do for the 
media resource to be displayable at the 
current_time.  For a downloadable resource, this 
means that the current video frame (if 
applicable) can be painted, at least one sample 
of audio (if applicable) played, and so on.  For 
a streaming resource, it may mean very little 
more than this: if you were waiting for something 
before displaying the media element, stop 
waiting, because it won't get any more displayable.

Can_play:  if playback were requested, the UA 
expects it would be able to actually start within 
a reasonable period and play a reasonable amount 
(before a stall, for example).  For a 
downloadable protocol, that means that at least 
some data ahead of current_time is available; 
for a streaming protocol, it means that if playback 
were requested, playback would start 'soon'.

Can_play_through:  if playback were requested, the 
UA is reasonably confident that it could play to 
the end without a playback stall.  (This state 
might never get entered if the network bandwidth 
is insufficient and the resource cannot be 
cached, either because of cache limitations or 
because it's a streaming service)
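
Because each media state is a superset of the one 
preceding, a script can test for 'at least this 
capable' with a simple ordering.  A minimal TypeScript 
sketch follows; the lowercase identifiers and the 
atLeast helper are hypothetical, chosen only to mirror 
the state names above.

```typescript
// Ordered from least to most capable; each state implies all earlier ones.
const MEDIA_STATES = [
  "empty",
  "metadata_loaded",
  "can_display",
  "can_play",
  "can_play_through",
] as const;

type MediaState = (typeof MEDIA_STATES)[number];

// True if `current` is the same as, or a superset of, `required`.
function atLeast(current: MediaState, required: MediaState): boolean {
  return MEDIA_STATES.indexOf(current) >= MEDIA_STATES.indexOf(required);
}
```

For instance, a script that only needs the duration 
would check atLeast(state, "metadata_loaded") and would 
not care whether the state had since advanced to 
Can_play.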



Play_request state:  documents what has been 
asked of the media.  We need state+events for 
this because UAs can display a play/pause 
controller that the scripts cannot 'see'.

Empty:  initial state

Pause_requested:  the UA has been asked to pause playback

Play_requested:  the UA has been asked to play

(This could probably be a single boolean if we 
don't need the empty initial state).
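
To show why both state and events are needed here, a 
small TypeScript sketch: the same request() entry point 
is assumed to be used by scripts and by the UA's own 
controller, so a script that listens for changes learns 
about requests it did not make itself.  The PlayRequest 
class, its onChange method and the identifier spellings 
are all hypothetical.

```typescript
type PlayRequestState = "empty" | "pause_requested" | "play_requested";
type Listener = (state: PlayRequestState) => void;

class PlayRequest {
  private current: PlayRequestState = "empty";
  private listeners: Listener[] = [];

  get state(): PlayRequestState {
    return this.current;
  }

  onChange(listener: Listener): void {
    this.listeners.push(listener);
  }

  // Assumed to be called both by script and by the UA's built-in controller.
  // Listeners are notified only when the requested state actually changes.
  request(next: "pause_requested" | "play_requested"): void {
    if (next === this.current) return;
    this.current = next;
    for (const listener of this.listeners) listener(next);
  }
}
```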



Actual playing is reflected by the is_playing 
property and the Rate_changed event.

Rate_changed gets dispatched if either of
a) is_playing changes value (between true and false)
or
b) is_playing is true and the current playback rate changes

Specifically:
a) for a streaming protocol, after a 
play_request, the network connection is opened, 
data is requested, some amount of de-jitter 
buffer accumulated, and then is_playing changes 
to true and a Rate_changed event happens
b) for a download or streaming protocol, if the 
buffer runs dry while playing, is_playing changes 
to false and the Rate_changed event is dispatched.
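
The dispatch rule for Rate_changed can be written as a 
comparison of two snapshots.  This is a TypeScript 
sketch under assumed names (PlaybackSnapshot, 
shouldDispatchRateChanged); only the two conditions 
come from the text above.

```typescript
interface PlaybackSnapshot {
  isPlaying: boolean;
  rate: number; // current playback rate
}

function shouldDispatchRateChanged(
  before: PlaybackSnapshot,
  after: PlaybackSnapshot,
): boolean {
  // (a) is_playing changed value, in either direction
  if (before.isPlaying !== after.isPlaying) return true;
  // (b) is_playing is true and the playback rate changed
  return after.isPlaying && before.rate !== after.rate;
}
```

Note that under this rule a rate change made while 
is_playing is false dispatches nothing.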




So, some examples.

Download, initially:
Network state changes from empty to loading
If auto_play was requested, the play_request state enters Play_requested
Some time later, Media state changes to 
Metadata_loaded, and then Can_display, and then 
Can_play
As it enters Can_play or Can_play_through (UA 
discretion), if auto_play was requested, 
is_playing changes to true and a Rate_changed 
event happens

RTSP/RTP Streaming, initially:
If auto_play was requested, the play_request state enters Play_requested
Network state goes briefly to loading as the 
server is contacted and the media is set up
Media state then goes straight to Can_play or even Can_play_through
If auto_play was requested, the UA accumulates a 
de-jitter buffer and then is_playing changes to 
true and a Rate_changed event is dispatched

On a download stall, where there is no data ahead 
of current_time, is_playing spontaneously drops 
to false and rate_changed is dispatched, and the 
media_state drops from Can_play to Can_display. 
If data arrives and the UA thinks it a good idea, 
then while the play_request state remains 
Play_requested, it can re-start playback, setting 
is_playing to true and dispatching rate_changed.
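
The download stall and recovery described above could 
be sketched as follows in TypeScript.  The Player shape 
and the onBufferUpdate function are hypothetical, and 
the sketch assumes the UA always decides to re-start 
when data arrives, which the proposal leaves to UA 
discretion.

```typescript
interface Player {
  playRequested: boolean;              // play_request state is Play_requested
  isPlaying: boolean;                  // is_playing
  mediaState: "can_display" | "can_play";
  events: string[];                    // log of dispatched event names
}

// Called whenever the amount of data ahead of current_time changes.
function onBufferUpdate(p: Player, hasDataAhead: boolean): void {
  if (!hasDataAhead) {
    // Stall: nothing to play beyond the current position.
    p.mediaState = "can_display";
    if (p.isPlaying) {
      p.isPlaying = false;
      p.events.push("rate_changed");
    }
    return;
  }
  p.mediaState = "can_play";
  // Assumption: the UA always re-starts if play is still requested.
  if (p.playRequested && !p.isPlaying) {
    p.isPlaying = true;
    p.events.push("rate_changed");
  }
}
```

The play request itself never changes during the 
stall; only is_playing and the media state move.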

And so on.

-- 
David Singer
Apple/QuickTime

Received on Friday, 19 September 2008 00:41:11 UTC