Re: Hints argument & privacy concerns from Robin Berjon on 2012-01-19 (public-media-capture@w3.org from January 2012)

From: Robin Berjon <robin@berjon.com>
Date: Thu, 19 Jan 2012 14:53:31 +0100
To: Harald Alvestrand <harald@alvestrand.no>
Cc: Anant Narayanan <anant@mozilla.com>, public-media-capture@w3.org
Message-Id: <6DC895FE-A160-4D30-A3B7-2219C68BCAC8@berjon.com>

On Jan 19, 2012, at 14:11 , Harald Alvestrand wrote:
> On 01/19/2012 11:12 AM, Robin Berjon wrote:
>> On Jan 19, 2012, at 08:06 , Anant Narayanan wrote:
>>> However, exposing fine grained control over media hardware to web applications has serious security and privacy implications. Enumeration of available devices, for example, will provide several bits of data that will allow third parties to more easily fingerprint users.
>> Yes, we should definitely not support device characteristics enumeration.
> I'll just raise a dissenting voice .... I think the concern here (and the cost of addressing that concern) is out of proportion with the threat; the mechanisms available for distinguishing users are so legion that the additional bits available from knowing that there are two cameras on the device, one of which supports HD quality video, are microscopic in comparison.

I think that the mistake you're making here is that you're considering these things in isolation. One bit here and two bits there aren't much on their own, but bits are nasty little critters that add up rather quickly. Listing the number of video inputs, the number of audio inputs, alongside their capabilities is a fair bit of bits, too.
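
To make the "bits add up" point concrete, here is a rough sketch of the arithmetic. The attribute names and the number of distinct values per attribute are made-up assumptions for illustration, not measurements, and the sum is only an upper bound that holds if the attributes are independent and uniformly distributed.

```typescript
// Hypothetical attributes a page might learn from device enumeration.
// The distinctValues counts are illustrative assumptions only.
interface Attribute {
  name: string;
  distinctValues: number;
}

// Bits contributed by one attribute if every value were equally likely.
function bitsOf(attr: Attribute): number {
  return Math.log2(attr.distinctValues);
}

// Upper bound on the combined bits, assuming the attributes are independent.
function totalBits(attrs: Attribute[]): number {
  return attrs.reduce((sum, attr) => sum + bitsOf(attr), 0);
}

const mediaAttributes: Attribute[] = [
  { name: "videoInputCount", distinctValues: 4 },
  { name: "audioInputCount", distinctValues: 4 },
  { name: "hdCapable", distinctValues: 2 },
  { name: "stereoCapable", distinctValues: 2 },
];

// 2 + 2 + 1 + 1 bits under the assumptions above.
const combinedBits = totalBits(mediaAttributes);
// Number of groups the user population could be split into with those bits.
const distinguishableGroups = Math.pow(2, combinedBits);
console.log(combinedBits, distinguishableGroups);
```

Each attribute looks harmless alone, but the group count doubles with every extra bit, and these bits stack on top of whatever other fingerprinting surface already exists.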

Fingerprinting isn't a boogeyman, it's a genuine issue. That being said, it always needs to be balanced against the value of the use case. As I explain below, I don't think that the enumeration use case adds that much value.

>>> However, some applications will need a minimum set of requirements in order to be able to function. I propose that we leave it up to the application to detect if the resulting stream has the characteristics it wants, and provide the user with an appropriate message (and perhaps retry with another call to getUserMedia()) if it does not.
>> Just to be clear, I presume you mean intrinsic properties of the produced stream that would apply if the stream came from somewhere else. In other words, an application can complain that it's not getting stereo sound because it needs it, but it shouldn't be able to complain that it was given the back camera when it wanted the front, right?
> I think this example is limiting enough to be misleading. If we break the bond between "camera" and "physically part of the device on which the user is performing the UI interaction", categories like "front" or "back" become fully misleading.
> 
> For instance, conferencing units may have a room camera, a document camera and a lectern camera, none of which is attached to the physical box; which one of these is "front"?
> (this illustrates the need for SOMEONE - either browser chrome or JS API - to enumerate cameras, btw)

There are several ways of addressing this.

Option A. Allow enumeration of all the room's cameras. If this is to be made useful for users, you need to provide human-readable names of some kind so that the application can offer the user a choice. We're not talking about a few bits of fingerprinting here — if I'm in an office space (as I am now) with access not just to my devices' cameras but also others, and they all have human-readable names, that's a lot of information. If, as one can almost surely expect, the names for these video devices include the brand and make, I'm getting close to being uniquely identifiable with that information alone.

Option B. Having "front" and "back" is a sufficient ontology for the 80% case. Given that users can *always* choose which device they want to use (i.e. if the request hint is for "front" because that makes sense to a conferencing scenario, I can still pick "back" because I'm setting up conferencing for someone else, or I can still pick "room" if I want to — the app doesn't need to know) it doesn't break the remaining 20%, or in fact make them substantially less usable.

Option C. We stick to the hint story but people feel we need a more detailed ontology to convey a greater set of input situations. So we create a more detailed ontology including things like "room", "lectern", "spyplane", etc. Using this, option A can be modified so that instead of full names only names pertaining to the ontology are returned, which is less privacy-invasive.

I believe that option A is unacceptable from a privacy standpoint. As for option C, it's a mesh of ratholes so deep you could probably go Balrog-hunting in it. How precise does the ontology need to be? How far out to niche use cases do we go? How is it evolved over time and how do I know which version my user agent understands? What happens if there are naming conflicts? Who gets to configure devices set up in a room so as to conform to the ontology? Will my French IT staff think that "lectern" is the document camera since that's for reading ("lecture")? Is it a lectern camera if it can follow me off-stage as I crowd-surf a roomful of geeks driven to wild ecstasy by my presentation on the value of hardware ontologies in API design?

I like option B best :) It works for the more common cases, and it doesn't break the less common ones.
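
A minimal sketch of how option B could behave on the user agent side, under stated assumptions: the `Device` shape, the `"other"` facing value and the `selectDevice` function are hypothetical names invented here, not part of any proposal. The point it illustrates is that the hint only drives the preselection, the user's explicit pick always wins, and nothing about the device list is returned to the application.

```typescript
// Hypothetical hint vocabulary for option B.
type CameraHint = "front" | "back";

// Hypothetical internal view the user agent has of its cameras.
// None of this is exposed to the web application.
interface Device {
  id: string;
  facing: CameraHint | "other";
}

// Pick a device: an explicit user choice overrides the hint;
// otherwise prefer a device matching the hint, else fall back to the first one.
function selectDevice(
  devices: Device[],
  hint: CameraHint | undefined,
  userChoice?: string
): Device | undefined {
  if (userChoice !== undefined) {
    return devices.find((d) => d.id === userChoice);
  }
  const hinted = devices.find((d) => d.facing === hint);
  return hinted !== undefined ? hinted : devices[0];
}

const cameras: Device[] = [
  { id: "cam-back", facing: "back" },
  { id: "cam-front", facing: "front" },
  { id: "cam-room", facing: "other" },
];

// The app hinted "front", but the user picked the room camera.
const chosen = selectDevice(cameras, "front", "cam-room");
console.log(chosen ? chosen.id : "none");
```

The application only ever receives the resulting stream, so a room or lectern camera keeps working without the app needing a word for it.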

>> This is borderline bikeshedding so I won't insist, but it seems to me that if you pick the right names you can avoid those two levels of nesting:
>> 
>> {
>>     "audio": true
>> ,   "video": true
>> ,   "channels": "2"
>> ,   "quality": "voip"
>> ,   "camera": "back"
>> }
> Minor return bikeshed: you have now assumed that there is a single namespace for both audio hints and video hints. I can easily imagine "quality" being relevant for both audio and video, but with incompatible semantics.

That's why I said "pick the right names". Yes, indeed, if we don't, there will be clashes.
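
As a quick illustration of what "pick the right names" means in a flat dictionary, the sketch below checks two lists of hint names for overlap. The hint names, including the prefixed `audioQuality` and `videoQuality`, are hypothetical and chosen only for the example.

```typescript
// Return the hint names that appear in both lists and would therefore
// clash if audio and video hints shared a single flat namespace.
function clashingNames(audioHints: string[], videoHints: string[]): string[] {
  const videoSet = new Set(videoHints);
  return audioHints.filter((name) => videoSet.has(name));
}

// Badly chosen names: "quality" is claimed by both sides.
console.log(clashingNames(["channels", "quality"], ["camera", "quality"]));

// Hypothetical prefixed names avoid the overlap.
console.log(clashingNames(["channels", "audioQuality"], ["camera", "videoQuality"]));
```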

> I'd prefer the two-dictionary solution. But like Robin, not strongly.

As Dom said we can have it so that audio and video both accept "true" as well as a set of hints, which works fine for me. Note that this would allow { audio: {}, video: {} } instead of "true" (which is fine IMHO). Having provided options on the colour of this bikeshed, I suggest we let Anant pick which one he favours :)
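
A rough sketch of what accepting either "true" or a set of hints could look like, assuming a hypothetical `Hints` dictionary type and a hypothetical `normalize` helper (neither comes from a draft). It treats `true` and `{}` as equivalent, meaning "requested, no particular hints".

```typescript
// Hypothetical shape for a set of hints: simple name/value pairs.
type Hints = Record<string, string | number>;

// Hypothetical request shape: each of audio and video is either a
// boolean or a dictionary of hints.
interface MediaRequest {
  audio?: boolean | Hints;
  video?: boolean | Hints;
}

// Normalise one entry: null means "not requested",
// an empty dictionary means "requested without hints".
function normalize(value: boolean | Hints | undefined): Hints | null {
  if (value === undefined || value === false) return null;
  if (value === true) return {};
  return value;
}

const request: MediaRequest = {
  audio: { channels: 2 },
  video: true,
};

console.log(normalize(request.audio), normalize(request.video));
```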

-- 
Robin Berjon - http://berjon.com/ - @robinberjon
Received on Thursday, 19 January 2012 13:54:06 UTC