Re: Do we need capabilities? from Adam Bergkvist on 2012-01-24 (public-webrtc@w3.org from January 2012)

From: Adam Bergkvist <adam.bergkvist@ericsson.com>
Date: Tue, 24 Jan 2012 14:29:33 +0100
To: Neil Stratford <nstratford@voxeo.com>
CC: "public-webrtc@w3.org" <public-webrtc@w3.org>
Message-ID: <4F1EB23D.9020707@ericsson.com>

On 01/24/2012 10:18 AM, Neil Stratford wrote:
> On 24/01/2012 03:04, Anant Narayanan wrote:
>> (Starting as a separate thread, to document objections to
>> getCapabilities)
>
>> 1. I think we can all agree that exposing capabilities without user
>> consent of any form is not what we really want. If the current
>> getCapabilities() is able to be invoked by any web page without any
>> indication to the user, it is a massive privacy invasion. Ad services
>> will then be able to add more bits of reliable information in order
>> to personally identify visitors (they already know too much!).
>
> I agree, getCapabilities() does require user approval, which could also
> be used to pre-approve access for a later getUserMedia() request.
>
>> 2. Assuming we modify getCapabilities() to be a "trusted" call, i.e.
>> requires user consent before returning, I do not think we will be
>> able to satisfy application requirements. The primary reason for this
>> is that the getCapabilities() call is done separately from
>> getUserMedia(), and in the interim there might be changes to user
>> hardware. Thus the application is not guaranteed to get what it
>> wanted from the list of capabilities, anyway. This implies that UI
>> affordances in applications *cannot* be made solely on the basis of a
>> response from getCapabilities(), the application risks providing
>> functionality that may not be present at the time the user actually
>> initiates action.
>
> Changes to hardware can equally happen after a call to getUserMedia()
> but before the session is actually established. A user could connect or
> disconnect a webcam at any point. However, I don't see this as a problem
> - especially if we can provide some kind of callback notification
> mechanism on any such changes that enables the UI to be updated. If I
> plug in a webcam after loading a web page I expect the UI to be updated
> to reflect that I can now make video calls.

I think it's a privacy issue to allow the web app to detect a new device
(and any information about it) without involving the user. Making new
devices available to the browser, on the other hand, is desirable.
Consider the case where a user visits a conferencing web app, brings up
the browser-provided camera/mic selector UI, and notices that the
desired camera isn't in the list of available devices. In this case, the
camera should appear in the selector UI as soon as the user plugs it in,
so that it can be selected. The difference here is that it's the browser
(which is trusted) that detects new devices, not the web app.
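
To make the distinction concrete, here is a minimal sketch in
TypeScript. It is a toy model, not a browser API: BrowserDevicePicker,
plugIn, listForSelectorUi and userSelects are all names invented for
illustration. The point is only that the device list lives on the
browser side and the page learns nothing until the user picks a device.

```typescript
// Toy model of a browser-owned device selector. All names are
// hypothetical; nothing here is part of any real browser API.
type Device = { id: string; label: string };
type Stream = { deviceLabel: string };

class BrowserDevicePicker {
  // Known only to the (trusted) browser, never exposed to the page.
  private devices: Device[] = [];

  // Called by the browser when hardware is plugged in.
  plugIn(device: Device): void {
    this.devices.push(device);
  }

  // What the browser's own selector UI displays to the user.
  listForSelectorUi(): string[] {
    return this.devices.map((d) => d.label);
  }

  // The page receives a stream only after the user picks a device.
  userSelects(id: string): Stream | null {
    for (const device of this.devices) {
      if (device.id === id) {
        return { deviceLabel: device.label };
      }
    }
    return null;
  }
}
```

In this model a camera plugged in while the selector is open shows up
in listForSelectorUi() right away, but the web app has no way to observe
that until userSelects() hands it a stream.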

>> 4. The primary reason that a 'hints' based approach does not reveal
>> as many bits as a capabilities based approach is that the result from
>> getUserMedia() given a static set of hints is not guaranteed to be
>> the same. It is temporal, and thus more unreliable than
>> getCapabilities() -- which will always return the same value for a
>> given hardware configuration. In a large number of cases, it is
>> possible that getUserMedia will always return the same kind of
>> stream, but I don't believe that justifies the need for
>> getCapabilities().
>
> If I provide a hint of "best possible", won't the resulting SDP payload
> (containing full codec list and associated parameters) leak far more
> information than a list of high-level profiles from getCapabilities()?

The main difference here is that getting a stream involves the user; it
cannot be done under the hood on any page without the user's knowledge.
Once the user starts to give the web app information such as an email
address, a credit card number or, as in this case, a stream, it's a bit
of a different scenario.

> There are things that users expect that would not be possible without
> capabilities - for example rich presence displaying camera icons next to
> contacts who are available for video calls, or hiding the ability to
> call entirely if no microphone is available. I find it difficult to
> imagine a good UI that has no information about what may or may not be
> possible before an actual call is attempted.
>
> My concern is primarily providing a good end user experience that
> doesn't require hacks like attempting dummy call setups to retrieve
> capability information - which is likely what developers will resort to
> if we don't provide such an API.

I agree that having the UI reflect what the app can do is a tricky
matter. But knowing the capabilities of the browser does not guarantee
successful communication. The user (on either side) may not give
permission to use a device, and even when you have a stream, there's a
risk of not finding a working transport.
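
As a rough illustration, the chain of things that can go wrong could be
modelled like this in TypeScript. This is only a sketch under my own
assumptions: SetupInput, SetupResult and trySetup are made-up names,
and the ordering of the checks is simplified.

```typescript
// Toy model of call setup; all names are hypothetical.
type SetupInput = {
  deviceAvailable: boolean;  // roughly what a capabilities query could tell us
  localUserAllows: boolean;  // local user grants access to the device
  remoteUserAllows: boolean; // remote user grants access on their side
  transportFound: boolean;   // a working transport could be established
};

type SetupResult = "ok" | "no-device" | "permission-denied" | "no-transport";

function trySetup(input: SetupInput): SetupResult {
  if (!input.deviceAvailable) return "no-device";
  // Either user may refuse, regardless of what the hardware supports.
  if (!input.localUserAllows || !input.remoteUserAllows) {
    return "permission-denied";
  }
  // Even with streams on both sides, connectivity can still fail.
  if (!input.transportFound) return "no-transport";
  return "ok";
}
```

A UI driven by deviceAvailable alone would still offer a call in every
case that ends in "permission-denied" or "no-transport".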

/Adam
Received on Tuesday, 24 January 2012 13:34:46 UTC