Do we need capabilities?

(Starting as a separate thread, to document objections to getCapabilities)

On Jan 19, 2012, at 8:04 AM, Dan Burnett wrote:
> Why do we need a Capabilities API?
> 1. Web developers want to be able to grey out/hide capabilities that are not available, such as a webcam or microphone.
> 2. Web developers may not know which configurations are allowed or feasible.  For example, for a video device it may be possible to specify high quality or low bandwidth, but not both.  It is quite likely that an app author who wants both would rather specify a preference for one than trust the hint system to make the right selection between the two.

While I fully support both of these requirements, I think that the capabilities API as currently specified doesn't do much to guarantee that the app will get what it wants, and as a side effect also leaks too much information about the user's computer. I think we should strongly consider a 'hints' based approach to see if we can fulfill app developer requirements and only discard it after we deem it unworkable.

1. I think we can all agree that exposing capabilities without user consent of any form is not what we really want. If the current getCapabilities() can be invoked by any web page without any indication to the user, it is a massive privacy invasion. Ad services will then be able to add more bits of reliable information in order to personally identify visitors (they already know too much!).

2. Assuming we modify getCapabilities() to be a "trusted" call, i.e. one that requires user consent before returning, I do not think we will be able to satisfy application requirements. The primary reason is that getCapabilities() is called separately from getUserMedia(), and the user's hardware might change in the interim. Thus the application is not guaranteed to get what it wanted from the list of capabilities anyway. This implies that UI affordances in applications *cannot* be based solely on a response from getCapabilities(): the application risks offering functionality that may not be present by the time the user actually initiates action.

3. Revealing information about a user's devices such as webcams (and their supported resolutions) and microphones provides a few bits of information to the website (which it may forward, knowingly or unknowingly, to ad networks). A cursory check at https://panopticlick.eff.org/ reveals that my browser currently exposes ~20 bits of identifying information via other mechanisms (user agent string, plugin list, screen size, system fonts) -- and this is without all the cookie tracking that adds significantly more. You need ~33 bits of information to uniquely identify any given user. Even adding a few bits moves us closer to that number, and thus we must be cautious.

4. The primary reason that a 'hints' based approach does not reveal as many bits as a capabilities based approach is that the result from getUserMedia() given a static set of hints is not guaranteed to be the same. It can vary over time, and is thus less reliable as an identifier than getCapabilities(), which will always return the same value for a given hardware configuration. In a large number of cases, it is possible that getUserMedia() will always return the same kind of stream, but I don't believe that justifies the need for getCapabilities().

5. I'd like to reiterate that we are designing a WebAPI for use by web pages everywhere, and that common web pages have significantly lower privileges than native applications like Skype, both technically and in a user's mind (visiting a web page has much lower friction than launching an app). The web community (Mozilla & Google in particular) is exploring, in parallel, the notion of "installed webapps" that have elevated privileges, because of the implied user trust that comes with "installing" something. Such a capability API would be a perfect fit in such a scenario. I'd love to discuss this option as well; I'm working closely with the Apps team at Mozilla. Even though it may be outside the scope of this working group, it is something to think about.
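
To make the arithmetic in point 3 concrete, here is a rough back-of-the-envelope sketch. It assumes a world population of about 7 billion (my assumption, not a number taken from the panopticlick site), and the "+3 bits" for device capabilities is purely hypothetical, chosen only to show how quickly the crowd a user can hide in shrinks.

```javascript
// Back-of-the-envelope fingerprinting arithmetic for point 3.
// ASSUMPTION: roughly 7 billion people worldwide.
const WORLD_POPULATION = 7e9;

// Bits needed to single out one person among N equally likely candidates.
function bitsToIdentify(population) {
  return Math.log2(population);
}

// Expected number of people sharing a fingerprint that carries `bits` bits.
function anonymitySetSize(population, bits) {
  return population / Math.pow(2, bits);
}

const needed = bitsToIdentify(WORLD_POPULATION);            // roughly 33 bits
const today = anonymitySetSize(WORLD_POPULATION, 20);       // ~20 bits already exposed
// HYPOTHETICAL: suppose device capabilities added 3 more bits.
const withDevices = anonymitySetSize(WORLD_POPULATION, 23);

console.log(needed.toFixed(1), Math.round(today), Math.round(withDevices));
```

Under these assumptions, each extra bit halves the number of people who share a given fingerprint, which is why even "a few bits" matter.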

In light of this, I think we should take another look at the hints proposals: the one proposed by Tim (in the context of addStream: http://lists.w3.org/Archives/Public/public-webrtc/2011Oct/0004.html ), the one I proposed on public-media-capture: http://lists.w3.org/Archives/Public/public-media-capture/2012Jan/0014.html , and the options proposed by Cullen today: http://lists.w3.org/Archives/Public/public-webrtc/2012Jan/0047.html

I think we can tweak those proposals in order to achieve what we want. Even though a webapp may not be able to *pre-emptively* modify UI based on device capabilities, it will be able to handle errors gracefully and provide a good user experience in the cases it does not get what it wants.

One such strawman API to achieve this (very rough):

getUserMedia({
  audio: true,
  video: true,
  hints: [{
    video: {
      purpose: "whiteboard",
      resolution: "high"
    },
    audio: {
      type: "music",
      bandwidth: "broadband-high",
      handsfree: false
    }
  }]
});

The basic idea is to allow the application to specify an ordered array of hint objects, sorted by preference. The UA will try its best to satisfy the request, in order, and the application will be able to introspect the MediaStream that is returned to see exactly what it got (and can then direct the user appropriately). Alternatively, we can make the call fail if none of the provided hints could be fulfilled.
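
As a rough illustration of the "ordered by preference" idea, the selection on the UA side could look something like the sketch below. Everything here is hypothetical: selectHint, the supports callback, and the "medium" resolution value are names I made up for illustration, not part of any proposal, and a real UA would match against actual hardware rather than a predicate.

```javascript
// Hypothetical UA-side selection over an ordered hint list.
// `supports` stands in for whatever the UA knows about the hardware right now.
function selectHint(hints, supports) {
  for (const hint of hints) {   // hints are sorted by preference
    if (supports(hint)) {
      return hint;              // first satisfiable hint wins
    }
  }
  return null;                  // nothing matched: the UA could fail the call here
}

// Illustrative hardware: a camera that cannot deliver "high" resolution.
const canSatisfy = (hint) => hint.video.resolution !== "high";

const hints = [
  { video: { purpose: "whiteboard", resolution: "high" } },
  { video: { purpose: "whiteboard", resolution: "medium" } } // made-up fallback value
];

const chosen = selectHint(hints, canSatisfy);
const noneMatch = selectHint([hints[0]], canSatisfy);

console.log(chosen.video.resolution, noneMatch);
```

The application only learns which hint was honored for this particular request, rather than the full list of what the hardware can do.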

I'd love to hear concrete use-cases for when this approach will generally not work, and we can work out the details if there is consensus!

Thanks,
-Anant

Received on Tuesday, 24 January 2012 03:05:50 UTC