- From: Anant Narayanan <anant@mozilla.com>
- Date: Tue, 19 Jul 2011 14:35:06 -0700
- To: public-webrtc@w3.org
Thanks to everyone for their valued feedback on the proposal! I've just finished gathering all the responses so far, and we've made corresponding revisions to our proposal. This email is a summary of the various comments that have been made so far, as well as a description of the proposal in its current state (and why it differs from the WhatWG draft as of this morning).

I may have interpreted your comments incorrectly, and if that is the case, please correct me! If I inadvertently missed something (it was a bit difficult to sift through all the sub-threads, so it is quite possible), do point it out. I apologize for the wall of text, but I cannot figure out a better way to represent our current thinking. If you are less interested in the summary and more in the details of our current proposal, please skim through the sections marked *ACTION*. Note that the "action"s are merely what we propose at this stage, and we would really appreciate your feedback on this next iteration.

1. Obtaining media streams from the user:

1a) Ralph suggests not exposing the option of picking a particular camera to the webapp, and that this task instead be handled by the user (and/or the user agent) via an appropriate UI. Ian mentions that allowing the webapp to specify which camera it needs input from is one of the most requested features for WebRTC. Cary agrees that there needs to be a way to smoothly switch between different cameras from within an app itself.

*ACTION*: Represent each camera as a separate MediaStreamTrack. User permission is asked only once, and if granted, an app is able to freely switch between multiple cameras by simply enabling/disabling the tracks as appropriate (see the sketch after section 1c). The exact way in which the app determines the function of each camera (is it front-facing?) via the track object is TBD (suggestions are welcome).

1b) Anant proposed that getUserMedia be renamed to getMediaStream; Ian suggests that this does not actually make things clearer. Both of them agree that naming is less of an issue than the functionality.

*ACTION*: Keep getUserMedia() and discard getMediaStream(); getUserMediaStream() is too long. The function takes three arguments: the options, a success callback, and an error callback.

1c) The exact data format for the options could be either an object dictionary or a string. Anant suggested that the options be specified as an object. Wu points out that the data model and the representation are two different things and should be treated separately. Harald agrees, recommends against coming up with a new string format (for which a new parser would have to be built), and suggests adopting JSON (which can be stringified if necessary). Cullen and Anant both agree that a JSON object would be better.

*ACTION*: Make the first argument to getUserMedia a JSON object. Options that will definitely be present are "audio" and "video" boolean properties. All remaining options are up for debate (and tie into the hints discussion that follows).
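To make 1a-1c concrete, here is a minimal sketch of what a webapp might write under the current proposal. The call signature and the audio/video options come from the ACTIONs above; the videoTracks accessor and the per-track "enabled" flag are my assumptions about how track switching could surface, not agreed API:

    // Hypothetical sketch; names other than getUserMedia() and the
    // audio/video options are assumptions, not agreed API.
    navigator.getUserMedia({ audio: true, video: true },
      function success(stream) {
        // Per 1a, each camera is represented as its own MediaStreamTrack,
        // so switching cameras is just toggling tracks on and off.
        var cameras = stream.videoTracks;     // assumed accessor
        for (var i = 0; i < cameras.length; i++) {
          cameras[i].enabled = (i === 0);     // enable only the first camera
        }
      },
      function error(err) {
        // Called if the user denies permission or no device is available.
        console.log("getUserMedia failed: " + err);
      }
    );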
2. Media Streams:

2a) Anant proposed that there be a way for the webapp to provide hints as to what kind of video or audio it wants a particular MediaStream to carry. Ian disagrees and thinks it is best for the UA to decide what streams to provide; hints, if any, must always be declarative, which allows us to improve things in the future without breaking existing code. Tim argues that hints like "is this audio spoken voice or music?" are high-level declarations, and Ralph agrees that there should be some extensible mechanism by which such information is passed on to the user agent. Stefan disagrees and notes that it is perhaps unreasonable to enumerate all the possible high-level declarations, and that any effort to do so may quickly be outdated by future developments; such declarations should thus be omitted entirely. He also adds that the correct way to handle this is to add use cases.

2b) Harald proposes that there are multiple scenarios in which the browser needs to make a choice, and that the webapp might want to influence those decisions. He further proposes that a "hints" JSON object be passed as an argument to all methods that would change a MediaStream in some way, and that the properties of this object remain unspecified for the first iteration. Cullen argues that there are already several clear use-cases where such hints can be concretely defined, but agrees that the mechanism by which these hints are described should be extensible.

*ACTION*: Come up with a set of concrete use-cases in which it is necessary (and useful) for the webapp to provide hints to the user agent, and then decide what these hints should actually be (if the use-cases chosen are ones that we have collectively agreed to tackle with the first version of the API).

2c) The initial proposal suggested that each MediaStreamTrack have its own 'type' attribute that included codec-level information on what the track represented. Ian disagreed and wants web applications to be unaware of such details, so that a single 'kind' attribute, whose value could be "audio" or "video", suffices. Anant suggests that there might be use cases where the webapp wants to know codec-level information, because the platform may not support the codec and the app wants to attempt content-level decoding and/or DRM. If that is the case, then we would also need a way to notify the webapp when the type underlying a MediaStream changes.

*ACTION*: Rename the 'type' attribute to 'kind', but instead of the only values being "video"/"audio", include more information such as the codec type. Rename onTypeChanged to onKindChanged, and create a new track every time a change occurs (see the sketch after section 2e).

2d) The MediaStream object in particular is one that is likely to be shared between multiple working groups, and many people expressed concern over the possibility of incompatibility between these APIs. Ian suggests that we all work with the WhatWG on the specification :)

*ACTION*: Find a way to co-ordinate between all interested parties to come up with a single MediaStream definition. The implementation phase for each working group is likely to tie them all together anyway (Firefox, for example, will have only one MediaStream object, which is hopefully a union of all the interfaces defined in each WG). Robert agrees that it would be bad if the APIs were inconsistent, and he is already working on a MediaStream implementation for the Audio WG.

2e) The original proposal suggested that DTMF be represented as an independent MediaStreamTrack. Ian pointed out that the WhatWG has existing specifications for VideoTrack, AudioTrack and TextTrack objects, and that we should inter-operate. Niklas noted that SIP has a variety of ways of doing DTMF, and that we should stick to only one.

*ACTION*: Make MediaStreamTrack interoperable with the corresponding track specs from the WHATWG, but add a new DTMFTrack type (subclassed from MediaStreamTrack) that would represent the DTMF signal.
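For 2c and 2e, a rough sketch of how a webapp might react to kind changes and to DTMF tracks. Where the handler lives, the shape of its argument, and the MIME-style kind strings are all assumptions made for illustration:

    // Hypothetical sketch; onKindChanged and DTMFTrack are the proposal
    // above, everything else is assumed for illustration.
    stream.onKindChanged = function (track) {
      // A new track object is created whenever the underlying type changes.
      if (track instanceof DTMFTrack) {
        handleDTMF(track);                    // assumed app-level helper
      } else if (track.kind.indexOf("video/") === 0) {
        showVideo(track);                     // assumed app-level helper
      } else {
        // Codec not supported by the platform: per 2c, the app could
        // attempt content-level decoding and/or DRM here.
      }
    };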
3. PeerConnection:

3a) Cullen asked for clarification on how the WhatWG model could work without an explicit connect/listen method, and Ralph responded with a description of JS's run-to-completion model. Ian recommends that we let the PeerConnection constructor decide whether it is listening or connecting, and that we rely on the JS event loop to queue callbacks that will be invoked after the script ends. Cullen points out that an implicit connect is also inefficient if the client adds multiple streams, since it is possible that not all of the streams' parameters were negotiated in the initial exchange.

*ACTION*: We debated this at Mozilla for a fair bit, and there is no consensus yet. Brendan acknowledged that the run-to-completion model is not going anywhere but, if we're concerned about compatibility with existing web APIs, pointed out that XHR has an explicit open() method too. Given this, I would highly recommend exposing an explicit connect() method (see the first sketch below). Ian, do you have specific examples for why this approach is more error-prone?

3b) The original proposal contained two versions of the PeerConnection API: one in which it represented a single 1:1 connection, and another in which we added a new PeerListener object that would serve as a factory to generate new PeerConnection objects as necessary. Ian expressed interest in supporting the 1:many broadcast use-case but noted that ICE does not have a supported mode to do it. Cullen confirmed that ICE does not allow a multicast between several parties, but individual PeerConnections are formed by large groups all the time.

*ACTION*: We want to support the 1:many use-case, and even though the webapp could technically do it on its own by creating multiple PeerConnection objects, we propose providing an abstraction over it (such as the PeerConnection factory) that makes this easy (see the second sketch below). It has been established that ICE inherently cannot support more than 2 endpoints, but that is not necessary to enable multi-user media streams.

3d) Matthew gave us an overview of what it means to model PeerConnections after either flows or sessions. Notably, he points out that there is value in reusing session setup; thus, opening a new session should have the same API as opening a flow that reuses an existing session.

*ACTION*: We have not yet described in detail what actually happens when a PeerConnection object is constructed and what happens when connect() is called [if we decide to adopt that API], but when we do, Matthew's suggestions are certainly useful. Does all of this get specified as part of the W3C WG or the IETF?
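To make the XHR analogy in 3a concrete, here is what an explicit connect() could look like. The (configuration, signalingCallback) constructor and addStream() follow the WhatWG draft's shape as I understand it; connect() and its placement are the proposal, not agreed API:

    // Hypothetical sketch; connect() is the proposed addition, the rest
    // is assumed for illustration.
    var pc = new PeerConnection(configuration, signalingCallback);
    pc.addStream(cameraStream);    // add all streams before connecting so
    pc.addStream(screenStream);    // their parameters can be negotiated in
                                   // the initial exchange (Cullen's point)
    pc.connect();                  // explicit, like xhr.open()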
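And for 3b, a sketch of the pattern a PeerConnection factory would wrap: one 1:1 connection per remote peer, all sharing the same local stream. Everything here is illustrative:

    // Hypothetical sketch: 1:many broadcast built from multiple 1:1
    // PeerConnections, since ICE cannot span more than 2 endpoints.
    var connections = {};
    function addViewer(viewerId, signalingCallback) {
      var pc = new PeerConnection(configuration, signalingCallback);
      pc.addStream(broadcastStream);   // same local stream to every peer
      pc.connect();                    // assuming the explicit API from 3a
      connections[viewerId] = pc;
    }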
That's all I have for now. Fire away :)

Best Regards,
-Anant