Re: State transitions for media elements from Philip Jägenstedt on 2008-09-22 (public-html@w3.org from September 2008)

From: Philip Jägenstedt <philipj@opera.com>
Date: Mon, 22 Sep 2008 17:52:19 +0200
To: Dave Singer <singer@apple.com>
Cc: public-html@w3.org, Ian Hickson <ian@hixie.ch>
Message-Id: <1222098739.24316.20.camel@localhost>
I agree with most of what Dave has written but have some additional
comments below.

On Thu, 2008-09-18 at 17:39 -0700, Dave Singer wrote:
> <http://www.w3.org/html/wg/html5/#media>
> 
> We've been discussing these states with other 
> implementors (including Philip Jägenstedt at 
> Opera, for example), and we think these really 
> could do with some refining.
> 
> In particular, we think that
> a) the state called 'network state' is actually a 
> mix of the state of the network and the state of 
> the media

Indeed. In particular, I've noticed that if the file is available locally
(in cache or via file://), one knows that it is fully loaded before
actually reading any data. But since the network state must pass through
LOADED_METADATA and LOADED_FIRST_FRAME (which require actually reading
data), reporting the fully loaded state has to wait until later. It
wouldn't be a big issue implementation-wise, but it is a symptom of
mixing separate issues into a single state.
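
To illustrate the point, here is a minimal sketch of what keeping the two
concerns in separate fields could look like. All names (NetworkState,
MediaState, MediaStatus, initialStatus, onMetadataParsed) are hypothetical
and are not taken from the draft; the state lists follow Dave's proposal
quoted below.

```typescript
// Hypothetical names throughout; this is not an API from the HTML5 draft.
enum NetworkState { Empty, Idle, Loading, Loaded }
enum MediaState { Empty, MetadataLoaded, CanDisplay, CanPlay, CanPlayThrough }

interface MediaStatus {
  network: NetworkState; // is the network in use?
  media: MediaState;     // what can be done with the media?
}

// A resource that is already fully available locally (cache or file://)
// can report Loaded at once, regardless of how much has been read so far.
function initialStatus(fullyAvailableLocally: boolean): MediaStatus {
  return {
    network: fullyAvailableLocally ? NetworkState.Loaded : NetworkState.Loading,
    media: MediaState.Empty,
  };
}

// The media state advances only as data is actually read and parsed;
// the network state is left untouched.
function onMetadataParsed(status: MediaStatus): MediaStatus {
  return { ...status, media: MediaState.MetadataLoaded };
}
```

With two fields, a cached file can be Loaded on the network side while
the media side is still Empty, which a single combined state cannot express.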

> b) the media state needs to be described in terms 
> of what can be done with the media, not what it 
> has in hand, as what is in hand differs between 
> various protocols;
> c) the difference between whether play/pause is 
> *requested* and happening needs to be clearer:
>     -- in a download protocol, play can stall if data is unavailable
>     -- in a streaming protocol, there is an 
> initial buffering period in which play is 
> requested but time is not advancing
> 
> 
> So, here is a brief revision, obviously derived 
> from the existing document but, we hope, avoiding 
> these issues and supporting more (we hope, all) 
> protocols:
> 
> 
> Network state:  documents whether the network is 
> being used (e.g. for a network activity indicator)
> 
> Empty:  initial state, or state when there is a 
> failure that'll need some action to escape from
> 
> Idle:  the URI is known, but the UA has no need 
> to use the network right now (e.g. download 
> resource for which 'enough' is cached, streaming 
> resource which is not active...)
> 
> Loading: the network is being used right now (you 
> can show an activity indicator)
> 
> Loaded:  for a loadable resource, we've both 
> loaded it all and don't intend to unload it (you 
> could disconnect and walk away)
> 
> There is an event, Stalled, which is fired once 
> during Loading if data doesn't seem to be 
> arriving after a reasonable timeout (as now).

I'm thinking that the EMPTY state isn't needed at all; the lowest state
could be IDLE. Currently the EMPTY state is used in several algorithms
to detect a "fresh" media element, but as far as I can see an empty flag
would suffice, and it would not need to be exposed via the API at all.
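
A rough sketch of that idea, assuming a hypothetical MediaElementSketch
class whose load() stands in for any algorithm that currently tests for
EMPTY. Neither the class nor its members come from the draft.

```typescript
// Hypothetical sketch; the lowest exposed network state is IDLE.
type NetState = "IDLE" | "LOADING" | "LOADED";

class MediaElementSketch {
  // Internal bookkeeping only: scripts never see this flag.
  private isEmpty = true;
  private state: NetState = "IDLE";

  // The only thing exposed through the (hypothetical) API.
  get networkState(): NetState {
    return this.state;
  }

  // Stand-in for an algorithm that used to check for the EMPTY state.
  // Returns true if the element was still "fresh" when called.
  load(): boolean {
    const wasFresh = this.isEmpty;
    this.isEmpty = false;
    this.state = "LOADING";
    return wasFresh;
  }
}
```

The "fresh" check still works, but the exposed state never takes an
EMPTY value.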

> Media state:  documents what you can do with the 
> media.  (Each state is a superset of the one 
> preceding).
> 
> Empty: initial state
> 
> Metadata_loaded:  enough data has been loaded 
> that a well-defined set of questions can now be 
> answered as well as they ever could be (e.g. 
> duration, width/height, codecs used, and so on).
> 
> Can_display (or Can_display_at_current_time; 
> currently called can_display_current_frame):  the 
> UA has done all it can or intends to do for the 
> media resource to be displayable at the 
> current_time.  For a downloadable resource, this 
> means that the current video frame (if 
> applicable) can be painted, at least one sample 
> of audio (if applicable) played, and so on.  For 
> a streaming resource, it may mean very little 
> more than that if you are waiting for something 
> before you displayed the media element, stop 
> waiting:  it won't get any more displayable.
> 
> Can_play:  if playback were requested, the UA 
> expects it would be able to actually start within 
> a reasonable period and play a reasonable amount 
> (before a stall, for example).  For a 
> downloadable protocol, that means that at least 
> some data ahead of current_time is available; 
> for a streaming protocol, that if playback was 
> requested, playback would start 'soon'.
> 
> Can_play_through:  if playback was requested, the 
> UA is reasonably confident that it could play to 
> the end without a playback stall.  (This state 
> might never get entered if the network bandwidth 
> is insufficient and the resource cannot be 
> cached, either because of cache limitations or 
> because it's a streaming service)
> 
> 
> 
> Play_request state:  documents what has been 
> asked of the media.  We need state+events for 
> this because UAs can display a play/pause 
> controller that the scripts cannot 'see'.
> 
> Empty:  initial state
> 
> Pause_requested:  the UA has been asked to pause playback
> 
> Play_requested:  the UA has been asked to play
> 
> (This could probably be a single boolean if we 
> don't need the empty initial state).

I agree; the only EMPTY state that I think makes sense is that of the
media state.

> Actual playing is reflected by the is_playing 
> property and the Rate_changed event.
> 
> Rate_changed gets dispatched if either of
> a) is_playing changes value (between true and false)
> or
> b) is_playing is true and the current playback rate changes
> 
> Specifically:
> a) for a streaming protocol, after a 
> play_request, the network connection is opened, 
> data is requested, some amount of de-jitter 
> buffer accumulated, and then is_playing changes 
> to true and a Rate_changed event happens
> b) for a download or streaming protocol, if the 
> buffer runs dry while playing, is_playing changes 
> to false and the Rate_changed event is dispatched.
> 
> 
> 
> 
> So, some examples.
> 
> Download, initially:
> Network state changes from empty to loading
> If auto_play was requested, the play_request state enters Play_requested
> Some time later, Media state changes to 
> metadata_loaded, and then Can_display, and then 
> Can_play
> As it enters Can_play or Can_play_through (UA 
> discretion), if Auto_play was requested, 
> is_playing changes to true and a Rate_changed 
> event happens
> 
> RTSP/RTP Streaming, initially:
> If auto_play was requested, the play_request state enters Play_requested
> Network state goes briefly to loading as the 
> server is contacted and the media setup
> Media state then goes straight to Can_play or even Can_play_through
> If auto_play was requested, the UA accumulates a 
> de-jitter buffer and then is_playing changes to 
> true and a Rate_changed event dispatched
> 
> On a download stall, where there is no data ahead 
> of current_time, is_playing spontaneously drops 
> to false and rate_changed is dispatched, and the 
> media_state drops from Can_play to Can_display. 
> If data arrives and the UA thinks it a good idea, 
> then while play_requested remains true, it can 
> re-start playback, setting is_playing to true and 
> dispatching rate_changed.
> 
> And so on.
> 
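
The Rate_changed rules and the download-stall example quoted above can
be sketched as follows. This is an illustration under assumptions, not
an implementation of the proposal: PlaybackSketch, dataAvailable and
bufferRanDry are hypothetical names, and the play request is modelled
as the single boolean suggested above.

```typescript
type RateChangedListener = (isPlaying: boolean, rate: number) => void;

// Hypothetical model of play_request vs. actual playback.
class PlaybackSketch {
  playRequested = false;      // what has been asked of the media
  private playing = false;    // whether time is actually advancing
  private rate = 1.0;
  private listeners: RateChangedListener[] = [];

  get isPlaying(): boolean {
    return this.playing;
  }

  onRateChanged(listener: RateChangedListener): void {
    this.listeners.push(listener);
  }

  requestPlay(): void {
    this.playRequested = true; // playback itself starts once data is there
  }

  requestPause(): void {
    this.playRequested = false;
    this.setPlaying(false);
  }

  // Called by a (hypothetical) buffering layer when enough data is ahead.
  dataAvailable(): void {
    if (this.playRequested) this.setPlaying(true);
  }

  // Called when the buffer runs dry; playRequested is left unchanged.
  bufferRanDry(): void {
    this.setPlaying(false);
  }

  setRate(rate: number): void {
    if (rate === this.rate) return;
    this.rate = rate;
    if (this.playing) this.fire(); // rule (b): rate changed while playing
  }

  private setPlaying(value: boolean): void {
    if (value === this.playing) return;
    this.playing = value;
    this.fire();                   // rule (a): is_playing changed value
  }

  private fire(): void {
    for (const listener of this.listeners) listener(this.playing, this.rate);
  }
}
```

After a stall, playRequested is still true, so a later dataAvailable()
restarts playback and dispatches the event again.
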
-- 
Philip Jägenstedt
Opera Software
Received on Monday, 22 September 2008 15:53:08 UTC