
RE: Mozilla/Cisco API Proposal

From: Chou, Wu (Wu) <wuchou@avaya.com>
Date: Tue, 12 Jul 2011 15:23:43 -0400
To: Ian Hickson <ian@hixie.ch>, Anant Narayanan <anant@mozilla.com>
CC: "public-webrtc@w3.org" <public-webrtc@w3.org>
Message-ID: <47AB18AC0F23934383F2BBA7EE3D8D7422420D61AF@DC-US1MBEX4.global.avaya.com>
The discussion on using a string or a JS object is helpful. However, it should take place after we have established a commonly agreed data model for WebRTC, either overall or for a particular function, e.g. media description.

The data model should define elements, relations, semantics, and extensibility in a formal language, e.g. UML or XML, and it should cover all use cases.

A data model specification, e.g. in UML or XML, allows developers to verify their implementations independently of any particular binding. In addition, the data model can be imported from, or adapted from, other established standards and communication protocols.

Once the data model is established, we will be in a position to look into the question of its binding to HTML, e.g. as an object or as a string. We can then evaluate the pros and cons of each approach in a concrete setting, since these become purely binding issues once the data model is in place.
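
To illustrate the point (a hypothetical sketch only; the element names here are invented for illustration, not a proposed schema): once a media description is modeled formally, binding it as a JS object or as a string becomes a serialization detail:

```javascript
// Hypothetical media-description data model, expressed as a JS object.
// Field names are illustrative only, not from any spec.
const mediaDescription = {
  audio: { direction: "sendrecv" },
  video: { direction: "recvonly" }
};

// The string binding of the same model is just a serialization concern:
const asString = JSON.stringify(mediaDescription);
const roundTripped = JSON.parse(asString);

console.log(roundTripped.video.direction); // "recvonly"
```

Either binding carries the same information; the data model, not the binding, is what must be agreed first.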

Without an agreed data model, these issues are entangled, e.g. ICE, SDP, state, extensibility, etc.; many of them belong to the data model, not to the binding and serialization to JS.

Thanks,

- Wu Chou/Li Li

Avaya Labs Research, Avaya Inc.


-----Original Message-----
From: public-webrtc-request@w3.org [mailto:public-webrtc-request@w3.org] On Behalf Of Ian Hickson
Sent: Monday, July 11, 2011 6:09 PM
To: Anant Narayanan
Cc: public-webrtc@w3.org
Subject: Re: Mozilla/Cisco API Proposal

On Mon, 11 Jul 2011, Anant Narayanan wrote:
>
> 1. getUserMedia():
>       - We renamed it to getMediaStream, simply to make it more clear that
> a successful invocation will return a MediaStream object.

getUserMedia() was named to emphasise that it got the user's media. After all, you can get a MediaStream in a variety of other ways, e.g. from a remote peer.


>       - The first argument was made into a JS object rather than a string,
> to allow for the application to specify more details as to what kind
> of inputs it would like. We've been using this model in our prototype
> (Rainbow) as well as other Mozilla projects that expose APIs via
> navigator [1] and so far developers seem to like it.

I considered doing that, but it seems like the common case just gets quite a bit more complicated:

   navigator.getUserMedia('audio,video');

....vs:

   navigator.getUserMedia({"audio":{},"video":{}});

(I think... I couldn't quite work out what the shortest possible way of getting an audio and a video stream was with the object form in your proposal.)
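
For comparison, the string form can be expanded mechanically into the object form; a minimal sketch (the expansion rule here is assumed for illustration, not taken from either proposal):

```javascript
// Expand the comma-separated string form ('audio,video') into the
// object form ({ audio: {}, video: {} }). Purely illustrative.
function expandMediaString(spec) {
  const options = {};
  for (const kind of spec.split(",")) {
    options[kind.trim()] = {};
  }
  return options;
}

console.log(expandMediaString("audio,video"));
// { audio: {}, video: {} }
```

So the two forms carry the same information in the simple case; the question is which one reads better for the common case versus the heavily-parameterised one.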


> Keeping it an object also allows us to extend functionality in the future
> without breaking the function signature.

The string is extensible as well. :-)


One of the differences is that your proposal allows the author to set
things like the quality of the audio. It's not clear to me what the use
case is for that. Can you elaborate on that? It seems like you'd just want
the MediaStream to represent the best possible quality and just let the UA
downsample as needed. (When recording it might make sense to specify a
format; I haven't done anything on that front because I've no idea what
formats are going to be common to all implementations.)


>       - We made the success and error callback into pure functions for
> simplicity, as opposed to events.

They are callbacks, not events, in the WHATWG spec as well. The WebIDL
magic has to be done the way the WHATWG spec does it rather than using
Function, so that you can define the arguments to the callback.


> 2. MediaStream:
>       - We don't think it is necessary to differentiate between local and
> remote media streams, so we do not define a LocalMediaStream object.
> The stop() method for the local is emulated by setting the stream's
> readyState to ENDED (if this is done on a remote stream, it means that
> peer is no longer interested in that data).

Interesting. Basically you're saying you would like a way for a peer to
start an SDP offer/answer exchange where one of the streams offered by the
other peer has been zeroed out? (Currently there's no way for a peer to
tell the other peer to stop sending something.)

How should this be notified on the other side?

Should it be possible for the other side to just restart sending the
stream?


>       - Inputs (sources) and Outputs (sinks) are implied and thus not
> exposed. Assigning a stream to another object (or variable) implies adding a
> sink.

Not sure what you mean here.


>       - We added a BLOCKED state in addition to LIVE and ENDED, to allow a
> peer to say "I do not want data from this stream right now, but I may later" -
> eg. call hold.

How is this distinguished from the stop() scenario at the SDP level?


The reason I didn't expose a way for a peer to tell another peer to stop
sending media (temporarily or permanently) is that I figured authors would
just implement that using their own signalling channel. Instead of A
sending media to B, then B using ICE/SDP to stop the media, then B using
ICE/SDP to resume it, you would have A send media to B, then B tell A via
the author's signalling channel to stop sending the media, then A stop
sending the media to B, and later A resume in a similar manner.
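
A sketch of that author-level flow (the peer structure and message names here are invented for illustration; `handleSignal` stands in for whatever the application's signalling channel delivers):

```javascript
// Illustrative only: peer A pauses and resumes sending in response to
// messages received over the application's own signalling channel,
// with no ICE/SDP renegotiation involved.
function makePeer(name) {
  return {
    name,
    sending: true,
    handleSignal(msg) {
      if (msg === "stop") this.sending = false;  // other side asked us to pause
      if (msg === "resume") this.sending = true; // other side asked us to resume
    }
  };
}

const peerA = makePeer("A");

// B tells A, via the signalling channel, to stop sending media...
peerA.handleSignal("stop");
console.log(peerA.sending); // false

// ...and later to resume.
peerA.handleSignal("resume");
console.log(peerA.sending); // true
```

The design choice is that pause/resume stays entirely in application code, so the platform API never needs peer-to-peer stop semantics.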


> 3. StreamTrack:
>       - Renamed to MediaStreamTrack to signify that it is hierarchically a
> part of MediaStream.

I mostly didn't do that because "MediaStreamTrackList" is a long interface
name. It also allows us to reuse StreamTrack for Stream later, if roc gets
his way. :-)


>       -  Added a MediaStreamStackHints object to allow JS content to specify
> details about the media it wants to transport, this is to allow the platform
> (user-agent) to select an appropriate codec.
>       - Sometimes, the ideal codec cannot be found until after the RTP
> connection is established, so we added an onTypeChanged event. A new
> MediaStreamTrack will be created with the new codec (if it was changed).

It's not clear to me why the author would ever care about this. What are
the use cases here?


>       - StreamTrack.kind was renamed to MediaStreamTrack.type and takes a
> IANA media string to allow for more flexibility and to allow specifying a
> codec.

This makes StreamTrack inconsistent with VideoTrack, AudioTrack, and
TextTrack, which I think we should avoid.

The reason I didn't expose a media type here is that the media need not be
in a type at all when represented by a MediaStream. It could be something
the OS media platform is handling that the browser never sees (e.g. if the
stream is just plugged straight into something that plays stuff on the
speakers). We shouldn't require that the UA expose a type, IMHO.

There are cases where it makes sense to expose a type, e.g. recording to
disk, of course. For those we should expose this. But that's distinct from
the use case of "kind", which is just about letting the author distinguish
the audio tracks from the video tracks.

Also, supportedTypes should probably not be exposed as an array (UAs can't
always enumerate all the supported types) and should probably not be on
MediaStreamTrack, since presumably it's consistent across the lifetime of
the object.


>       - MediaStreamTracks can be ENABLED or DISABLED. Eg. someone on a video
> call may choose to DISABLE video and only do audio without interruption due to
> bandwidth constraints.

StreamTrack has this, it's the 'enabled' boolean.


MediaStreamTrack also adds volume gain control; shouldn't this be done
using the audio API (whatever that becomes)?



> 4. PeerConnection:
>       - Renamed signalingCallback -> sendSignal, signalingMessage ->
> receivedSignal, addStream -> addLocalStream, removeStream ->
> removeLocalStream, onaddstream -> onRemoteStreamAdded, onremovestream ->
> onRemoteStreamRemoved for additional clarity.

I have to say I don't see that the new names are any clearer. :-)

A "signal" is generally a Unix thing, quite different from the signalling
channel. ICE calls this the "signalling channel", which is why I think we
should use this term.

Also, the event handler attributes should be all lowercase for consistency
with the rest of the platform.

I'm not sure the "addLocalStream()" change is better either; after all,
it's quite possible to add a remote stream, e.g. to send the stream onto
yet another user. Maybe addSendingStream() or just sendStream() and
stopSendingStream() would be clearer?

(I should clarify that I have no strong opinion on any of these naming
issues. If anyone feels strongly about one name or another, I'm happy to
change the spec to match whatever is preferred.)


>       - We added a LISTENING state (this means the PeerConnection can call
> accept() to open an incoming connection, the state is entered into by calling
> listen()), and added an open() call to allow a peer to explicitly "dial out".

Generally speaking this is an antipattern, IMHO. We learnt with
XMLHttpRequest that this kind of design leads to a rather confusing
situation where you have to support many more state transitions, and it
leads to very confusing bugs. This is why I designed the PeerConnection()
object to have a constructor and to automatically determine if it was
sending or receiving (and gracefully handle the situation where both
happen at once). It makes the authoring experience much easier.


>       - This part is tricky because we want to support 1:many. There are a
> few concerns around whether ICE even supports it, and if it does, how it
> works. In case it is possible, API-wise, we have also put up an alternate
> proposal that adds a PeerListener object (this API looks very much like
> UNIX sockets):

I'd love to support 1:many; the main reason I didn't is lack of support in
ICE. If there's an extension to ICE to support this I'd be more than happy
to add it to PeerConnection.


> 5. MediaBuffer:
>       - We added this completely new object to try and think about how to
> make it possible to write decoders purely in JS (such as mp3.js).
>       - This is possibly the most ambiguous part of the spec as we are not
> sure if this is worth doing or even possible without getting into codec
> specifics.
>       - Tim and Robert have many valid concerns in trying to do this, Cullen
> has some ideas on how we can make it work.

Sounds like this should be part of the audio API discussion.

--
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Tuesday, 12 July 2011 19:30:06 GMT
