Re: Mozilla/Cisco API Proposal from Anant Narayanan on 2011-07-11 (public-webrtc@w3.org from July 2011)

From: Anant Narayanan <anant@mozilla.com>
Date: Mon, 11 Jul 2011 16:18:25 -0700
To: Ian Hickson <ian@hixie.ch>
CC: public-webrtc@w3.org
Message-ID: <4E1B84C1.3010604@mozilla.com>
On 7/11/11 3:09 PM, Ian Hickson wrote:
> getUserMedia() was named to emphasise that it got the user's media. After
> all, you can get a MediaStream in a variety of other ways, e.g. from a
> remote peer.

That seems fair. getUserMediaStream is perhaps too long. I'm less 
worried about the actual names though.

> I considered doing that, but it seems like the common case just gets quite
> a bit more complicated:
>
>     navigator.getUserMedia('audio,video');
>
> ...vs:
>
>     navigator.getUserMedia({"audio":{},"video":{}});

I was hoping that the call would get audio & video by default. So, with 
callbacks:

navigator.getMediaStream({}, onsuccess, onerror);

> The string is extensible as well. :-)

True, but not in the same way JS objects are. In particular, to carry 
per-track settings in the string we'd have to come up with an entirely 
new string grammar, and I'm hoping to avoid that!
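To make the trade-off concrete, here is a minimal sketch in plain JavaScript (runnable outside a browser) that normalizes both request styles into one object shape. `normalizeMediaRequest` is a hypothetical name that is not part of either proposal, and treating an empty object as "audio and video with defaults" follows the suggestion above, not any spec text.

```javascript
// Hypothetical helper (not from either draft): normalizes both request
// styles discussed in this thread into one object shape.
function normalizeMediaRequest(request) {
  if (typeof request === "string") {
    // String form, e.g. "audio,video": each comma-separated token names a
    // kind. There is nowhere to put per-kind settings without a new grammar.
    const result = {};
    for (const token of request.split(",")) {
      const kind = token.trim();
      if (kind) result[kind] = {};
    }
    return result;
  }
  // Object form: a missing or empty object means "audio and video with
  // default settings", as proposed in the reply.
  if (request == null || Object.keys(request).length === 0) {
    return { audio: {}, video: {} };
  }
  // Otherwise per-kind settings nest naturally under each kind.
  return request;
}

console.log(normalizeMediaRequest("audio,video")); // { audio: {}, video: {} }
console.log(normalizeMediaRequest({}));            // { audio: {}, video: {} }
console.log(normalizeMediaRequest({ video: { quality: 0.5 } }));
```

Both common-case calls produce the same request; only the object form has room for the nested `quality` setting.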

> One of the differences is that your proposal allows the author to set
> things like the quality of the audio. It's not clear to me what the use
> case is for that. Can you elaborate on that? It seems like you'd just want
> the MediaStream to represent the best possible quality and just let the UA
> downsample as needed. (When recording it might make sense to specify a
> format; I haven't done anything on that front because I've no idea what
> formats are going to be common to all implementations.)

We should at least allow those parameters that can be specified across 
multiple implementations. For example, quality is a number between 0.0 
and 1.0; that might mean different things for different codecs, and 
that's okay. Using a JS object also means that UAs can simply ignore 
properties they don't understand. The options object is intended to be a 
request anyway, and not all requests may be fulfilled. The 
MediaStreamTrack that is returned is the authoritative source of what 
the webapp author actually got.

Local recording was definitely on my mind when I wrote the options 
object. Cullen also suggested that sometimes the webapp/user might want 
to choose a particular pixel aspect ratio.

The webapp author may choose to specify nothing, in which case we 
automatically provide the best quality and the best frame rate. 
Resolution is trickier, since we won't know which sinks the MediaStream 
will be attached to, but some sane defaults can be made to work.
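The "request, not command" semantics can be sketched from the UA's side. This is a hypothetical illustration: `grantVideo`, the `DEVICE` limits, the 640x480 default and the set of understood keys are all assumptions. It clamps `quality` to [0.0, 1.0], caps values at what the device can do, silently drops unknown properties, and returns a description of what was granted, the role the proposal gives the returned MediaStreamTrack.

```javascript
// Hypothetical UA-side sketch: the options object is only a request, and the
// returned description plays the role the proposal gives MediaStreamTrack,
// i.e. it says what the page actually got.
// Assumed device limits and defaults; not taken from any spec.
const DEVICE = { maxFrameRate: 30, maxWidth: 1280, maxHeight: 720 };
const KNOWN = new Set(["quality", "frameRate", "width", "height"]);

function grantVideo(requested = {}) {
  // Defaults when nothing is specified: best quality, best frame rate,
  // and a "sane" resolution because the eventual sinks are unknown.
  const granted = { quality: 1.0, frameRate: DEVICE.maxFrameRate, width: 640, height: 480 };
  for (const [key, value] of Object.entries(requested)) {
    if (!KNOWN.has(key)) continue; // silently ignore properties this UA doesn't understand
    const n = Number(value);
    if (!Number.isFinite(n)) continue;
    if (key === "quality") granted.quality = Math.min(1, Math.max(0, n)); // meaning is codec-specific
    if (key === "frameRate") granted.frameRate = Math.min(n, DEVICE.maxFrameRate);
    if (key === "width") granted.width = Math.min(n, DEVICE.maxWidth);
    if (key === "height") granted.height = Math.min(n, DEVICE.maxHeight);
  }
  return granted;
}

// pixelAspectRatio is unknown to this UA and is dropped; 60 fps is capped to 30.
console.log(grantVideo({ quality: 0.4, frameRate: 60, pixelAspectRatio: "4:3" }));
```

A UA that did understand `pixelAspectRatio` would just add it to its known set; older pages and newer UAs keep working either way.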

> They are callbacks, not events, in the WHATWG spec as well. The WebIDL
> magic has to be done the way the WHATWG spec does it rather than using
> Function, so that you can define the arguments to the callback.

Sounds good.

> Interesting. Basically you're saying you would like a way for a peer to
> start an SDP offer/answer exchange where one of the streams offered by the
> other peer has been zeroed out? (Currently there's no way for a peer to
> tell the other peer to stop sending something.)

Yes.

> How should this be notified on the other side?

I believe it is possible in RTP to ask the other side to stop sending 
something. If not, we could always just send our own UDP message.

> Should it be possible for the other side to just restart sending the
> stream?

I don't think so. If a peer explicitly set the readyState of a remote 
stream to BLOCKED, it means they don't want data. The other side could, 
of course, send a completely new stream if it wishes to.

>> 	- Inputs (sources) and Outputs (sinks) are implied and thus not
>> exposed. Assigning a stream to another object (or variable) implies adding a
>> sink.
>
> Not sure what you mean here.

document.getElementById("somevideoelement").stream = myMediaStream;

sets the video element to be an output for myMediaStream. The 
MediaStream does not have an interface to find out all its inputs and 
outputs. I don't think this part differs much from your proposal; I 
mentioned it because it came up earlier :)

>> 	- We added a BLOCKED state in addition to LIVE and ENDED, to allow a
>> peer to say "I do not want data from this stream right now, but I may later" -
>> eg. call hold.
>
> How is this distinguished from the stop() scenario at the SDP level?

At the SDP level, stop() is initiated when a stream becomes ENDED. As 
mentioned before, we'll have to come up with a new mechanism (or use an 
existing RTP mechanism) to implement BLOCKED.
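The three readyState values discussed here form a small state machine. The sketch below is a hypothetical model (the `StreamState` class and `ALLOWED` table are illustrative, not from either proposal): LIVE and BLOCKED can alternate, as on call hold, while ENDED is terminal, which matches the point that a peer wanting to resume after ending must send a new stream.

```javascript
// Hypothetical model of the readyState values discussed in this thread.
// LIVE <-> BLOCKED is reversible (e.g. call hold); ENDED is terminal.
const LIVE = "LIVE";
const BLOCKED = "BLOCKED";
const ENDED = "ENDED";

const ALLOWED = {
  [LIVE]: new Set([BLOCKED, ENDED]),
  [BLOCKED]: new Set([LIVE, ENDED]),
  [ENDED]: new Set(), // an ended stream never comes back; send a new one instead
};

class StreamState {
  constructor() {
    this.readyState = LIVE;
  }

  transition(next) {
    if (!ALLOWED[this.readyState].has(next)) {
      throw new Error(`illegal transition ${this.readyState} -> ${next}`);
    }
    this.readyState = next;
    // Per the thread: entering ENDED maps onto the existing SDP-level stop,
    // while entering or leaving BLOCKED still needs its own (RTP or new) mechanism.
  }
}

const s = new StreamState();
s.transition(BLOCKED); // put the call on hold
s.transition(LIVE);    // resume
s.transition(ENDED);   // hang up
console.log(s.readyState); // ENDED
```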

> The reason I didn't expose a way for a peer to tell another peer to stop
> sending media (temporarily or permanently) is that I figured authors would
> just implement that using their own signalling channel. Instead of A send
> media to B, then B use ICE/SDP to stop the media, then B use ICE/SDP to
> resume the media, you would have A send media to B, then B tell A via the
> author's signalling channel to stop sending the media, then A would stop
> sending the media to B, and later A would resume in a similar manner.

That's certainly another way to do it. If B wants to temporarily stop a 
stream from A, it could tell A out of band and A could set its local 
stream to state BLOCKED. In either case, we'd have to implement the 
BLOCKED state to support this.
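The out-of-band variant can be sketched without any ICE/SDP involvement. Everything here is hypothetical: the `{ type: "hold" | "resume", streamId }` message shape, `makePeer`, and `makeChannel`, which stands in for whatever transport the author already uses. A only ever changes the state of its own local stream, and it ignores messages about streams it doesn't know or that have already ended.

```javascript
// Hypothetical sketch of the out-of-band approach: B asks A over the
// application's own signalling channel to pause a stream, and A marks its
// local stream BLOCKED. Message shapes and function names are invented.
function makePeer() {
  const localStreams = new Map(); // streamId -> "LIVE" | "BLOCKED" | "ENDED"
  return {
    localStreams,
    addLocalStream(id) {
      localStreams.set(id, "LIVE");
    },
    // Invoked for every message arriving on the author's signalling channel.
    onSignal(msg) {
      const state = localStreams.get(msg.streamId);
      if (state === undefined || state === "ENDED") return; // unknown or finished stream
      if (msg.type === "hold") localStreams.set(msg.streamId, "BLOCKED");
      if (msg.type === "resume") localStreams.set(msg.streamId, "LIVE");
    },
  };
}

// Stand-in for the real transport (XHR, a WebSocket, ...). Messages are
// serialized to mimic crossing the wire.
function makeChannel(receiver) {
  return {
    send(msg) {
      receiver.onSignal(JSON.parse(JSON.stringify(msg)));
    },
  };
}

const a = makePeer();
a.addLocalStream("camera");
const toA = makeChannel(a);

toA.send({ type: "hold", streamId: "camera" });   // B puts A on hold
console.log(a.localStreams.get("camera"));         // BLOCKED
toA.send({ type: "resume", streamId: "camera" }); // B takes A off hold
console.log(a.localStreams.get("camera"));         // LIVE
```

Either way a BLOCKED state still has to exist on A's side; the sketch only moves the request for it out of the media negotiation.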

> I mostly didn't do that because "MediaStreamTrackList" is a long interface
> name. It also allows us to reuse StreamTrack for Stream later, if roc gets
> his way. :-)

Why a separate List object and not simply an array of Tracks?

>> 	-  Added a MediaStreamStackHints object to allow JS content to specify
>> details about the media it wants to transport, this is to allow the platform
>> (user-agent) to select an appropriate codec.
>> 	- Sometimes, the ideal codec cannot be found until after the RTP
>> connection is established, so we added an onTypeChanged event. A new
>> MediaStreamTrack will be created with the new codec (if it was changed).
>
> It's not clear to me why the author would ever care about this. What are
> the use cases here?

The author *needn't* care about it (simply don't provide the hints) but 
can if they want to. Sometimes you're transmitting fast moving images, 
other times you're transmitting a slideshow (where you want each slide 
to be of high quality, but very low frame rate). Only the application 
can know this, and it'd be good for the platform to optimize for it.
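As a rough illustration of why only the application can supply this, here is how a UA might map a hint onto encoder priorities. The proposal only names a MediaStreamStackHints object; the `content` field, its `"slides"`/`"motion"` values, `encoderSettingsFor`, and all the numbers are invented for this sketch.

```javascript
// Hypothetical: how a UA might map an author's hint onto encoder priorities.
// The proposal only says a MediaStreamStackHints object exists; the "content"
// field, its values and the numbers below are invented for illustration.
function encoderSettingsFor(hints = {}) {
  switch (hints.content) {
    case "slides":
      // Slideshow: each slide should be sharp, but very few frames are needed.
      return { frameRate: 1, prefer: "detail" };
    case "motion":
      // Fast-moving images: keep motion smooth, accept softer individual frames.
      return { frameRate: 30, prefer: "smoothness" };
    default:
      // No (or unrecognized) hints: the UA falls back to its own defaults.
      return { frameRate: 15, prefer: "balanced" };
  }
}

console.log(encoderSettingsFor({ content: "slides" })); // { frameRate: 1, prefer: 'detail' }
console.log(encoderSettingsFor());                      // { frameRate: 15, prefer: 'balanced' }
```

Omitting the hints simply yields the UA's defaults, which is the "needn't care" case.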

>> 	- StreamTrack.kind was renamed to MediaStreamTrack.type and takes a
>> IANA media string to allow for more flexibility and to allow specifying a
>> codec.
>
> This makes StreamTrack inconsistent with VideoTrack, AudioTrack, and
> TextTrack, which I think we should avoid.

We were proposing that the types be from the list here:
http://www.iana.org/assignments/media-types/index.html

It certainly includes types for text (subtitles), audio as well as 
video. Tracks are just data of a certain type, so we don't have separate 
objects for each kind.
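A media type string carries both pieces of information discussed here. The sketch below is a hypothetical parser (`parseMediaType` and its result shape are illustrative): the top-level type lines up with a track kind such as audio or video, and for RTP payload formats like `audio/PCMU` or `video/H264` the subtype names the codec. Deriving `kind` this way is one possible bridge to the existing `kind` attribute, not something either proposal specifies.

```javascript
// Hypothetical helper: splits an IANA media type string into its top-level
// type, which lines up with a track "kind" such as audio or video, and its
// subtype, which for RTP payload formats can name the codec.
function parseMediaType(str) {
  const [essence, ...params] = str.split(";").map((s) => s.trim());
  const slash = essence.indexOf("/");
  if (slash <= 0 || slash === essence.length - 1) {
    throw new Error(`not a media type: ${str}`);
  }
  const parameters = {};
  for (const p of params) {
    const eq = p.indexOf("=");
    if (eq > 0) parameters[p.slice(0, eq).trim().toLowerCase()] = p.slice(eq + 1).trim();
  }
  return {
    kind: essence.slice(0, slash).toLowerCase(), // "audio", "video", "text", ...
    subtype: essence.slice(slash + 1),           // e.g. "PCMU" or "H264"
    parameters,
  };
}

console.log(parseMediaType("audio/PCMU; rate=8000"));
// { kind: 'audio', subtype: 'PCMU', parameters: { rate: '8000' } }
```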

That being said, if there's already a spec that we should interoperate 
with, that's reasonable. Where can I find more info on VideoTrack, 
AudioTrack and TextTrack? Have these been implemented by any UAs?

> The reason I didn't expose a media type here is that the media need not be
> in a type at all when represented by a MediaStream. It could be something
> the OS media platform is handling that the browser never sees (e.g. if the
> stream is just plugged straight into something that plays stuff on the
> speakers). We shouldn't require that the UA expose a type, IMHO.

Sure, you can plug the stream you get from navigator into a <video> or 
<canvas> and not worry about the types. But they're there if you're 
building a sophisticated webapp that needs to know.

> Also, supportedTypes should probably not be exposed as an array (UAs can't
> always enumerate all the supported types) and should probably not be on
> MediaStreamTrack, since presumably it's consistent across the lifetime of
> the object.

Makes sense, we could move this to a global object or the stream.
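One shape such a query could take, wherever it ends up living, is a per-type question in the style of HTMLMediaElement.canPlayType(), which answers "", "maybe", or "probably" instead of listing everything. The sketch below is hypothetical: `makeTypeQuery`, the built-in type list, and the stubbed platform probe are all invented for illustration.

```javascript
// Hypothetical sketch: answer support questions one type at a time, in the
// style of HTMLMediaElement.canPlayType() ("", "maybe", "probably"), instead
// of exposing an array the UA may not be able to enumerate completely.
function makeTypeQuery(builtInTypes, probePlatform) {
  return function supportsType(type) {
    const essence = type.split(";")[0].trim().toLowerCase();
    if (builtInTypes.has(essence)) return "probably"; // handled by the UA itself
    // Formats handled by the OS media framework can be probed but not listed.
    return probePlatform(essence) ? "maybe" : "";
  };
}

// Example wiring with a made-up built-in list and a stubbed platform probe.
const supportsType = makeTypeQuery(
  new Set(["audio/pcmu", "video/h264"]),
  (t) => t.startsWith("audio/"),
);
console.log(supportsType("video/H264"));    // probably
console.log(supportsType("audio/AMR"));     // maybe
console.log(supportsType("video/MP4V-ES")); // (empty string)
```

Because the answer is computed per query, the result stays consistent for the lifetime of the page without the UA ever having to enumerate what the OS can handle.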

> MediaStreamTrack also adds volume gain control; shouldn't this be done
> using the audio API (whatever that becomes)?

I agree. Robert warned me that this could be a point of confusion :)

>> 4. PeerConnection:
>> 	- Renamed signalingCallback ->  sendSignal, signalingMessage ->
>> receivedSignal, addStream ->  addLocalStream, removeStream ->
>> removeLocalStream, onaddstream ->  onRemoteStreamAdded, onremovestream ->
>> onRemoteStreamRemoved for additional clarity.
>
> I have to say I don't see that the new names are any clearer. :-)
>
> A "signal" is generally a Unix thing, quite different from the signalling
> channel. ICE calls this the "signalling channel", which is why I think we
> should use this term.

Fair enough. How about sendSignalingMessage() and 
receivedSignalingMessage()? Perhaps too long :-)

> Also, the event handler attributes should be all lowercase for consistency
> with the rest of the platform.
>
> I'm not sure the "addLocalStream()" change is better either; after all,
> it's quite possible to add a remote stream, e.g. to send the stream onto
> yet another user. Maybe addSendingStream() or just sendStream() and
> stopSendingStream() would be clearer?

Ah, I had not considered adding a remote stream in order to pass it on 
to another peer. The renaming was mainly done to clarify that when you 
add a local stream with addStream, the onaddstream callback is not 
invoked (since that happens only when remote streams are added).

>> 	- We added a LISTENING state (this means the PeerConnection can call
>> accept() to open an incoming connection, the state is entered into by calling
>> listen()), and added an open() call to allow a peer to explicitly "dial out".
>
> Generally speaking this is an antipattern, IMHO. We learnt with
> XMLHttpRequest that this kind of design leads to a rather confusing
> situation where you have to support many more state transitions, and it
> leads to very confusing bugs. This is why I designed the PeerConnection()
> object to have a constructor and to automatically determine if it was
> sending or receiving (and gracefully handle the situation where both
> happen at once). It makes the authoring experience much easier.

I'm all for simplifying the API as much as possible, as long as there's 
a way for us to fulfill all the use cases we have in mind. Without an 
explicit LISTENING state, how would you handle receiving a call in 
another browser tab while you are already in a call? I would certainly 
like the user to be able to put the current call on hold and switch to 
the other one.

We recognize that there are complex UI issues at play here too. For v1 
it might be okay not to think about some of these fancy use cases. We 
could come up with a nice, general API now that covers all of them, or 
fast-track a simple API that fits only a subset of the use cases, with 
the expectation that the API might change (significantly) in the future 
as we tackle some of the trickier ones. I'm agnostic about which 
approach we take.

>> 	- This part is tricky because we want to support 1:many. There are a
>> few concerns around if ICE even supports it and if it does how it works. If it
>> were possible, API wise, we also put up an alternate proposal that adds a
>> PeerListener object (this API looks very much like UNIX sockets):
>
> I'd love to support 1:many; the main reason I didn't is lack of support in
> ICE. If there's an extension to ICE to support this I'd be more than happy
> to add it to PeerConnection.

Cullen & team, do you have any thoughts on this?

>> 5. MediaBuffer:
...
> Sounds like this should be part of the audio API discussion.

+1.

In general, I think the common theme of your comments is to make things 
as easy for the web developer as possible. I agree with that goal, but 
for an API at this level we should aim for maximum flexibility, giving 
as much power to the web application as we possibly can.

Programming for it may not be a cakewalk, but that's OK (network 
programming *is* hard!). However, consider that today practically 
nobody uses the DOM API directly; everyone is building webapps with 
jQuery or some other fancy JS toolkit. CSS is hard to understand and 
write; that's why we have things like http://lesscss.org/. With sane 
defaults and a cross-browser toolkit that sits on top of whatever APIs 
user agents end up implementing, we can achieve elegance and simplicity 
for the common cases. Let's face it, we're not going to get rid of these 
cross-browser JS libraries, because there are always bound to be (at 
least minor) differences in how any spec is implemented.

But if, in a couple of years, we discover that we can't write this 
totally awesome MMORPG-in-the-browser that lets players talk to each 
other while playing because of API limitations, well, that would not be 
so awesome :-)

Thanks,
-Anant
Received on Monday, 11 July 2011 23:18:55 UTC