Mozilla/Cisco API Proposal Revisions

Thanks to everyone for their valued feedback on the proposal! I've just 
finished gathering all the responses so far and we've made corresponding 
revisions to our proposal. This email is a summary of the various 
comments that have been made so far, as well as a description of the 
proposal in its current state (and why it differs from the WhatWG draft 
as of this morning).

I may have interpreted your comments incorrectly, and if that is the 
case, please correct me! If I inadvertently missed something (it was a 
bit difficult to sift through all the sub-threads, so it is quite 
possible) do point it out. I apologize for the wall-of-text, but I 
cannot figure out a better way to represent our current thinking. If you 
are less interested in the summary and more in the details of our 
current proposal, please skim through the sections marked *ACTION*. Note 
that the "action"s are merely what we propose at this stage, and we 
would really appreciate your feedback on this next iteration.


1. Obtaining media streams from the user:

	1a) Ralph suggests not exposing the option of picking a particular 
camera to the webapp, and that the task be done by the User (and/or 
the User-Agent) via an appropriate UI. Ian mentions that allowing the 
webapp to specify which camera it needs input from is one of the most 
requested features for WebRTC. Cary agrees that there needs to be a way 
to smoothly switch between different cameras from within an app itself.

	*ACTION*: Represent each camera as a separate MediaStreamTrack. User 
permission is asked only once, and if granted, an app is able to freely 
switch between multiple cameras by simply enabling/disabling the tracks 
as appropriate. The exact way in which the app determines the function 
of each camera (is it front-facing?) via the track object is TBD 
(suggestions are welcome).
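
	To make the track-switching idea concrete, here is a minimal sketch. 
The CameraTrack and CameraStream shapes and the switchCamera() helper 
are hypothetical names invented for this illustration; they are not 
taken from any draft, and how an app learns which camera is which is 
still TBD as noted above.

```typescript
// Hypothetical shapes for illustration only; not from any draft.
interface CameraTrack {
  label: string; // assumed human-readable name
  kind: "video";
  enabled: boolean;
}

interface CameraStream {
  tracks: CameraTrack[];
}

// Enable exactly one camera track and disable all the others.
function switchCamera(stream: CameraStream, index: number): CameraTrack {
  if (index < 0 || index >= stream.tracks.length) {
    throw new RangeError(`no camera track at index ${index}`);
  }
  stream.tracks.forEach((track, i) => {
    track.enabled = i === index;
  });
  return stream.tracks[index];
}

// Stand-in for a stream the user has already granted access to.
const grantedStream: CameraStream = {
  tracks: [
    { label: "camera 0", kind: "video", enabled: true },
    { label: "camera 1", kind: "video", enabled: false },
  ],
};

const activeTrack = switchCamera(grantedStream, 1);
```

	Since permission was granted once for the whole stream, the switch 
itself would involve no further prompt in this model.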

	
	1b) Anant proposed that getUserMedia be renamed to getMediaStream; Ian 
suggests that it does not actually make things clearer. Both of them 
agree that naming is less of an issue than the functionality.
	
	*ACTION*: Keep getUserMedia() and discard getMediaStream(). 
getUserMediaStream() is too long. The function takes three arguments: 
the options, a success callback and an error callback.

	
	1c) The exact data format for options could either be an object 
dictionary or a string. Anant suggested that the options be specified as 
an object. Wu points out that the data model and representation are two 
different things and should be treated separately. Harald agrees and 
recommends against coming up with a new string format (for which a new 
parser would have to be built) and suggests adopting JSON (which can be 
stringified if necessary). Cullen and Anant both agree that a JSON object 
would be better.
	
	*ACTION*: Make the first argument to getUserMedia a JSON object. 
Options that will definitely be present are "audio" and "video" boolean 
properties. All remaining options are up for debate (and tie into the 
hints discussion that follows).
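
	As a rough illustration of the three-argument call shape from 1b and 
the options object from 1c, here is a sketch against a mock. The name 
getUserMediaSketch, the StreamSketch type, and the "no media requested" 
error are assumptions made for this example; a real user agent would 
prompt the user and call back asynchronously, whereas this mock calls 
back immediately.

```typescript
// Only "audio" and "video" are settled; other options are up for debate.
interface MediaOptions {
  audio: boolean;
  video: boolean;
}

// Hypothetical result type standing in for a MediaStream.
interface StreamSketch {
  kinds: string[];
}

type OnStream = (stream: StreamSketch) => void;
type OnFailure = (error: Error) => void;

// Mock of getUserMedia(options, success, error); synchronous for brevity.
function getUserMediaSketch(
  options: MediaOptions,
  onSuccess: OnStream,
  onError: OnFailure,
): void {
  const kinds: string[] = [];
  if (options.audio) kinds.push("audio");
  if (options.video) kinds.push("video");
  if (kinds.length === 0) {
    onError(new Error("no media requested"));
    return;
  }
  onSuccess({ kinds });
}

const receivedKinds: string[] = [];
const failureMessages: string[] = [];

getUserMediaSketch(
  { audio: true, video: false },
  (stream) => { receivedKinds.push(stream.kinds.join("+")); },
  (error) => { failureMessages.push(error.message); },
);

getUserMediaSketch(
  { audio: false, video: false },
  (stream) => { receivedKinds.push(stream.kinds.join("+")); },
  (error) => { failureMessages.push(error.message); },
);
```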


2. Media Streams

	2a) Anant proposed that there be a way for the webapp to provide hints 
as to what kind of video or audio it wants from a particular MediaStream. 
Ian disagrees and thinks it is best for the UA to decide what streams to 
provide, and that hints, if any, must always be declarative, which allows 
us to improve things in the future without breaking existing code. Tim argues 
that hints like "is this audio spoken voice or music?" are high level 
declarations, and Ralph agrees that there should be some extensible 
mechanism by which such information is passed on to the user agent. 
Stefan disagrees and notes that it is perhaps unreasonable to enumerate 
all the possible high-level declarations, and that any effort to do so 
may quickly be outdated by future developments, and thus should be 
omitted entirely. He also adds that the correct way to handle this 
should be to add use cases.

	2b) Harald proposes that there are multiple scenarios in which the 
browser needs to make a choice, and that the webapp might want to 
influence those decisions. He further proposes that a "hints" JSON 
object be passed as an argument to all methods that would change a 
MediaStream in some way, and that the properties of this object remain 
unspecified for the first iteration. Cullen argues that there are 
already several clear use-cases where such hints can be concretely 
defined, but agrees that the mechanism by which these hints are 
described be extensible.

	*ACTION*: Come up with a set of concrete use-cases in which it is 
necessary (and useful) for the webapp to provide hints to the user 
agent, and then decide what these hints should actually be (if the 
use-cases chosen are ones that we have collectively agreed to tackle 
with the first version of the API).


	2c) The initial proposal suggested that each MediaStreamTrack have its 
own 'type' attribute that included codec level information on what the 
track represented. Ian disagreed and wants web applications to be 
unaware of such details, and thus a single 'kind' attribute which could 
be "audio" or "video" suffices. Anant suggests that there might be use 
cases where the webapp may want to know codec-level information because 
the platform may not support it and it wants to attempt content-level 
decoding and/or DRM. If that is the case, then we would also need a way 
to notify the webapp when the type underlying a MediaStream changes.

	*ACTION*: Rename the 'type' attribute to 'kind', but instead of the 
only values being "video"/"audio", include more information such as 
codec type. Rename onTypeChanged to onKindChanged, and create a new 
track every time a change occurs.


	2d) The MediaStream object in particular is one that is likely to be 
shared between multiple working groups, and many people expressed 
concern over the possibility of incompatibility between these APIs. Ian 
suggests that we all work with the WhatWG on the specification :)

	*ACTION*: Find a way to co-ordinate between all interested parties to 
come up with a single MediaStream definition. The implementation phase 
for each working group is likely to tie them all together anyway 
(Firefox, for example, will have only one MediaStream object, which is 
hopefully a union of all the interfaces defined in each WG). Robert 
agrees that it would be bad if the APIs are inconsistent, and he is 
already working on a MediaStream implementation for the Audio WG.


	2e) The original proposal suggested that DTMF be represented as an 
independent MediaStreamTrack. Ian pointed out that the WhatWG has 
existing specifications for VideoTrack, AudioTrack and TextTrack objects 
and that we should inter-operate. Niklas noted that SIP has a variety of 
ways of doing DTMF, and that we should stick to only one.

	*ACTION*: Make MediaStreamTrack interoperable with the corresponding 
track specs from the WHATWG, but add a new DTMFTrack type (subclassed 
from MediaStreamTrack) that would represent the DTMF signal.
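
	A minimal sketch of what subclassing could look like follows. 
TrackSketch stands in for MediaStreamTrack, and everything on 
DtmfTrackSketch (the "dtmf" kind value, the tones buffer, the 
sendTones() method) is an assumption for illustration; the proposal 
does not yet define any of these members.

```typescript
// Hypothetical base class standing in for MediaStreamTrack.
class TrackSketch {
  readonly kind: string;
  readonly label: string;
  enabled = true;

  constructor(kind: string, label: string) {
    this.kind = kind;
    this.label = label;
  }
}

// Hypothetical DTMF track; member names are assumptions.
class DtmfTrackSketch extends TrackSketch {
  readonly tones: string[] = [];

  constructor(label: string) {
    super("dtmf", label);
  }

  // Queue tones, rejecting anything outside the 16 DTMF symbols
  // (digits 0-9, letters A-D, '*' and '#').
  sendTones(digits: string): void {
    if (!/^[0-9A-D*#]+$/.test(digits)) {
      throw new Error(`invalid DTMF sequence: ${digits}`);
    }
    this.tones.push(...digits.split(""));
  }
}

const dtmfTrack = new DtmfTrackSketch("keypad");
dtmfTrack.sendTones("12#");
```

	Because the DTMF track is just another track subclass here, code that 
enables or disables tracks generically would not need to special-case it.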


3. PeerConnection

	3a) Cullen asked for clarification on how the WhatWG model could work 
without an explicit connect/listen method, and Ralph responded with a 
description of JS's run-to-completion model. Ian recommends that we let 
the PeerConnection constructor decide if it is listening or connecting 
and that we rely on the JS event loop to queue callbacks that will be 
invoked after the script ends. Cullen points out that it is also 
inefficient if the client adds multiple streams, since if the connect is 
implicit it is possible that not all the streams' parameters were 
negotiated in the initial exchange.

	*ACTION*: We debated this at Mozilla for a fair bit, and there is no 
consensus yet. Brendan acknowledged that the run-to-completion model is 
not going anywhere, but pointed out that, if we're concerned about 
compatibility with existing web APIs, XHR has an explicit open() method 
too. Given this, I would highly recommend exposing an explicit connect() 
method. Ian, do you have specific examples for why this approach is more 
error-prone?
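
	To illustrate the efficiency argument for an explicit connect(), 
here is a small sketch. PeerConnectionSketch and its members are 
hypothetical, and the "offers" array is only a stand-in for negotiation 
rounds; no real signalling happens. The assumption is that streams 
added before connect() can be negotiated together, while a stream added 
afterwards needs a round of its own.

```typescript
// Hypothetical model; each entry in `offers` represents one
// negotiation round and lists the streams it covers.
class PeerConnectionSketch {
  private pending: string[] = [];
  readonly offers: string[][] = [];
  connected = false;

  addStream(id: string): void {
    if (this.connected) {
      // Assumed behaviour: a late stream triggers its own round.
      this.offers.push([id]);
    } else {
      this.pending.push(id);
    }
  }

  // Explicit connect: negotiate every pending stream in one round.
  connect(): void {
    if (this.connected) {
      throw new Error("already connected");
    }
    this.offers.push(this.pending);
    this.pending = [];
    this.connected = true;
  }
}

const explicitPc = new PeerConnectionSketch();
explicitPc.addStream("camera");
explicitPc.addStream("microphone");
explicitPc.connect();
const roundsBeforeLateAdd = explicitPc.offers.length;

explicitPc.addStream("screen");
const roundsAfterLateAdd = explicitPc.offers.length;
```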


	3b) The original proposal contained two versions of the PeerConnection 
API, one in which it represented a single 1:1 connection; and another in 
which we added a new PeerListener object that would serve as a factory 
to generate new PeerConnection objects as necessary. Ian expressed 
interest in supporting the 1:many broadcast use-case but noted that ICE 
does not have a supported mode to do it. Cullen confirmed that ICE does 
not allow a multicast between several parties, but individual 
PeerConnections are formed by large groups all the time.

	*ACTION*: We want to support the 1:many use-case, and even though the 
webapp could technically do it on its own by creating multiple 
PeerConnection objects; we propose providing an abstraction over it 
(such as the PeerConnection factory) that makes this easy. It has been 
established that ICE inherently cannot support more than 2 endpoints, 
but this is not necessary to enable multi-user media streams.
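
	The factory idea could be sketched roughly as follows. 
PeerLinkSketch, PeerFactorySketch, connectTo() and broadcast() are 
hypothetical names made up for this example; the point is only that 
1:many is modelled as many independent 1:1 connections behind one 
object, so nothing more than two endpoints per connection is required.

```typescript
// Hypothetical 1:1 connection to a single remote peer.
class PeerLinkSketch {
  readonly remote: string;
  readonly streams: string[] = [];

  constructor(remote: string) {
    this.remote = remote;
  }
}

// Hypothetical factory that owns one link per remote peer.
class PeerFactorySketch {
  private readonly links: PeerLinkSketch[] = [];

  connectTo(remote: string): PeerLinkSketch {
    const link = new PeerLinkSketch(remote);
    this.links.push(link);
    return link;
  }

  // Add the same stream to every link; returns the number of peers reached.
  broadcast(streamId: string): number {
    this.links.forEach((link) => {
      link.streams.push(streamId);
    });
    return this.links.length;
  }
}

const peerFactory = new PeerFactorySketch();
const linkToAlice = peerFactory.connectTo("alice");
const linkToBob = peerFactory.connectTo("bob");
const peersReached = peerFactory.broadcast("presenter-video");
```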


	3d) Matthew gave us an overview of what it meant to model 
PeerConnections after either flows or sessions. Notably, he points out 
that there is value in reusing session setup, thus, opening a new 
session should have the same API as that of opening a flow that reuses 
an existing session.

	*ACTION*: We have not yet described in detail what actually happens 
when a PeerConnection object is constructed and what happens when 
connect() is called [if we decide to adopt that API], but when we do, 
Matthew's suggestions are certainly useful. Does all of this get 
specified as part of the W3C-WG or the IETF?


That's all I have for now. Fire away :)

Best Regards,
-Anant

Received on Tuesday, 19 July 2011 21:35:35 UTC