
Re: Mozilla/Cisco API Proposal

From: Ian Hickson <ian@hixie.ch>
Date: Mon, 11 Jul 2011 22:09:21 +0000 (UTC)
To: Anant Narayanan <anant@mozilla.com>
cc: public-webrtc@w3.org
Message-ID: <Pine.LNX.4.64.1107112138140.2079@ps20323.dreamhostps.com>
On Mon, 11 Jul 2011, Anant Narayanan wrote:
> 
> 1. getUserMedia():
> 	- We renamed it to getMediaStream, simply to make it more clear that a
> successful invocation will return a MediaStream object.

getUserMedia() was named to emphasise that it got the user's media. After 
all, you can get a MediaStream in a variety of other ways, e.g. from a 
remote peer.


> 	- The first argument was made into a JS object rather than a 
> string, to allow for the application to specify more details as to what 
> kind of inputs it would like. We've been using this model in our 
> prototype (Rainbow) as well as other Mozilla projects that expose APIs
> via navigator [1] and so far developers seem to like it.

I considered doing that, but it seems like the common case just gets quite 
a bit more complicated:

   navigator.getUserMedia('audio,video');

...vs:

   navigator.getUserMedia({"audio":{},"video":{}});

(I think... I couldn't quite work out the shortest possible way of 
getting an audio and a video stream with the object form in your 
proposal.)


> Keeping it an object also allows us to extend functionality in the future
> without breaking the function signature.

The string is extensible as well. :-)


One of the differences is that your proposal allows the author to set 
things like the quality of the audio. It's not clear to me what the use 
case is for that. Can you elaborate on that? It seems like you'd just want 
the MediaStream to represent the best possible quality and just let the UA 
downsample as needed. (When recording it might make sense to specify a 
format; I haven't done anything on that front because I've no idea what 
formats are going to be common to all implementations.)


> 	- We made the success and error callbacks into pure functions for 
> simplicity, as opposed to events.

They are callbacks, not events, in the WHATWG spec as well. The WebIDL 
magic has to be done the way the WHATWG spec does it rather than using 
Function, so that you can define the arguments to the callback.
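
For illustration, a minimal sketch of that callback form (`nav` stands in 
for the browser's `navigator` so the shape can be exercised without a UA, 
`report` is a hypothetical logging hook, and the error object's `code` 
property is an assumption):

```javascript
// Sketch of the WHATWG-style callback signature: plain callbacks with
// typed arguments, not events. `nav` stands in for `navigator`; the
// error object's `code` property is an assumption.
function startLocalMedia(nav, report) {
  nav.getUserMedia('audio,video',
    function (stream) { report('got stream: ' + stream.label); },  // success callback
    function (error)  { report('no stream, code ' + error.code); }); // error callback
}
```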


> 2. MediaStream:
> 	- We don't think it is necessary to differentiate between local and
> remote media streams, so we do not define a LocalMediaStream object.
> The stop() method for the local case is emulated by setting the stream's 
> readyState to ENDED (if this is done on a remote stream, it means that 
> peer is no longer interested in that data).

Interesting. Basically you're saying you would like a way for a peer to 
start an SDP offer/answer exchange where one of the streams offered by the 
other peer has been zeroed out? (Currently there's no way for a peer to 
tell the other peer to stop sending something.)

How should this be notified on the other side?

Should it be possible for the other side to just restart sending the 
stream?


> 	- Inputs (sources) and Outputs (sinks) are implied and thus not
> exposed. Assigning a stream to another object (or variable) implies adding a
> sink.

Not sure what you mean here.


> 	- We added a BLOCKED state in addition to LIVE and ENDED, to allow a
> peer to say "I do not want data from this stream right now, but I may later" -
> eg. call hold.

How is this distinguished from the stop() scenario at the SDP level?


The reason I didn't expose a way for a peer to tell another peer to stop 
sending media (temporarily or permanently) is that I figured authors would 
just implement that using their own signalling channel. Instead of A 
sending media to B, then B using ICE/SDP to stop the media, and then B 
using ICE/SDP to resume it, you would have A send media to B, then B tell 
A via the author's signalling channel to stop sending the media, then A 
would stop sending, and later A would resume in a similar manner.
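
A sketch of that author-level pattern (the message format, the `channel` 
object, and the `tracks`/`enabled` fields are all assumptions standing in 
for the author's own signalling channel and the stream objects):

```javascript
// On B's side: ask A to pause over the author's own signalling channel
// (a WebSocket, say) instead of renegotiating via ICE/SDP.
function requestPause(channel) {
  channel.send(JSON.stringify({ type: 'pause-media' }));
}

// On A's side: honour B's request by muting the outgoing tracks.
// Toggling 'enabled' is one plausible way to stop sending; removing the
// stream from the PeerConnection would be another.
function handleSignal(message, outgoingStream) {
  var msg = JSON.parse(message);
  if (msg.type === 'pause-media' || msg.type === 'resume-media') {
    var on = (msg.type === 'resume-media');
    outgoingStream.tracks.forEach(function (t) { t.enabled = on; });
  }
}
```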


> 3. StreamTrack:
> 	- Renamed to MediaStreamTrack to signify that it is hierarchically a
> part of MediaStream.

I mostly didn't do that because "MediaStreamTrackList" is a long interface 
name. It also allows us to reuse StreamTrack for Stream later, if roc gets 
his way. :-)


> 	- Added a MediaStreamTrackHints object to allow JS content to specify
> details about the media it wants to transport, this is to allow the platform
> (user-agent) to select an appropriate codec.
> 	- Sometimes, the ideal codec cannot be found until after the RTP
> connection is established, so we added an onTypeChanged event. A new
> MediaStreamTrack will be created with the new codec (if it was changed).

It's not clear to me why the author would ever care about this. What are 
the use cases here?


> 	- StreamTrack.kind was renamed to MediaStreamTrack.type and takes an
> IANA media string to allow for more flexibility and to allow specifying a
> codec.

This makes StreamTrack inconsistent with VideoTrack, AudioTrack, and 
TextTrack, which I think we should avoid.

The reason I didn't expose a media type here is that the media need not be 
in a type at all when represented by a MediaStream. It could be something 
the OS media platform is handling that the browser never sees (e.g. if the 
stream is just plugged straight into something that plays stuff on the 
speakers). We shouldn't require that the UA expose a type, IMHO.

There are cases where it makes sense to expose a type, e.g. recording to 
disk, of course. For those we should expose this. But that's distinct from 
the use case of "kind", which is just about letting the author distinguish 
the audio tracks from the video tracks.
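
A sketch of that "kind" use case (the `tracks` array and the 
'audio'/'video' kind values here follow the WHATWG StreamTrack design):

```javascript
// All that 'kind' needs to support: telling audio tracks from video
// tracks, with no media type involved at all.
function tracksOfKind(stream, kind) {
  return stream.tracks.filter(function (t) { return t.kind === kind; });
}
```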

Also, supportedTypes should probably not be exposed as an array (UAs can't 
always enumerate all the supported types) and should probably not be on 
MediaStreamTrack, since presumably it's consistent across the lifetime of 
the object.


> 	- MediaStreamTracks can be ENABLED or DISABLED. Eg. someone on a video
> call may choose to DISABLE video and only do audio without interruption due to
> bandwidth constraints.

StreamTrack has this, it's the 'enabled' boolean.
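
For example, dropping to audio-only with nothing but the existing boolean 
(again assuming a `tracks` array with `kind`, as in the WHATWG draft):

```javascript
// Audio-only fallback using the existing 'enabled' boolean: disable the
// video tracks, leave the audio tracks running.
function setVideoEnabled(stream, on) {
  stream.tracks.forEach(function (t) {
    if (t.kind === 'video') { t.enabled = on; }
  });
}
```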


MediaStreamTrack also adds volume gain control; shouldn't this be done 
using the audio API (whatever that becomes)?



> 4. PeerConnection:
> 	- Renamed signalingCallback -> sendSignal, signalingMessage ->
> receivedSignal, addStream -> addLocalStream, removeStream ->
> removeLocalStream, onaddstream -> onRemoteStreamAdded, onremovestream ->
> onRemoteStreamRemoved for additional clarity.

I have to say I don't see that the new names are any clearer. :-)

A "signal" is generally a Unix thing, quite different from the signalling 
channel. ICE calls this the "signalling channel", which is why I think we 
should use this term.

Also, the event handler attributes should be all lowercase for consistency 
with the rest of the platform.

I'm not sure the "addLocalStream()" change is better either; after all, 
it's quite possible to add a remote stream, e.g. to send the stream onto 
yet another user. Maybe addSendingStream() or just sendStream() and 
stopSendingStream() would be clearer?

(I should clarify that I have no strong opinion on any of these naming 
issues. If anyone feels strongly about one name or another, I'm happy to 
change the spec to match whatever is preferred.)


> 	- We added a LISTENING state (this means the PeerConnection can call
> accept() to open an incoming connection, the state is entered into by calling
> listen()), and added an open() call to allow a peer to explicitly "dial out".

Generally speaking this is an antipattern, IMHO. We learnt with 
XMLHttpRequest that this kind of design leads to a rather confusing 
situation where you have to support many more state transitions, and it 
leads to subtle bugs. This is why I designed the PeerConnection() 
object to have a constructor and to automatically determine if it was 
sending or receiving (and gracefully handle the situation where both 
happen at once). It makes the authoring experience much easier.
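
A sketch of what that looks like from the author's side (the constructor 
arguments, the "STUN ..." configuration-string format, and 
processSignalingMessage() follow my reading of the WHATWG draft; the 
`connect` wrapper and `channel` object are hypothetical, with the 
constructor passed in so the wiring can be exercised outside a browser):

```javascript
// One constructor, no listen()/accept() states: the UA itself works out
// whether this end is offering or answering. In a page,
// PeerConnectionCtor would be the real PeerConnection; the channel is
// the author's own signalling path (a WebSocket, say).
function connect(PeerConnectionCtor, channel) {
  var pc = new PeerConnectionCtor('STUN stun.example.org', function (msg) {
    channel.send(msg);  // UA-generated signalling messages go out...
  });
  channel.onmessage = function (event) {
    pc.processSignalingMessage(event.data);  // ...and replies come back in.
  };
  return pc;
}
```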


> 	- This part is tricky because we want to support 1:many. There are a
> few concerns around whether ICE even supports it and, if it does, how it
> works. If it turns out to be possible, API-wise, we have also put up an
> alternate proposal that adds a PeerListener object (this API looks very
> much like UNIX sockets):

I'd love to support 1:many; the main reason I didn't is lack of support in 
ICE. If there's an extension to ICE to support this I'd be more than happy 
to add it to PeerConnection.


> 5. MediaBuffer:
> 	- We added this completely new object to try & think about how to
> make it possible to write decoders purely in JS (such as mp3.js).
> 	- This is possibly the most ambiguous part of the spec as we are not
> sure if this is worth doing or even possible without getting into codec
> specifics.
> 	- Tim and Robert have many valid concerns in trying to do this, Cullen
> has some ideas on how we can make it work.

Sounds like this should be part of the audio API discussion.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Monday, 11 July 2011 22:09:44 GMT
