Re: Hints argument & privacy concerns from Stefan Hakansson LK on 2012-01-19 (public-media-capture@w3.org from January 2012)

From: Stefan Hakansson LK <stefan.lk.hakansson@ericsson.com>
Date: Thu, 19 Jan 2012 15:17:32 +0100
To: public-media-capture@w3.org
Message-ID: <4F1825FC.5050107@ericsson.com>
On 01/19/2012 02:53 PM, Robin Berjon wrote:
> On Jan 19, 2012, at 14:11 , Harald Alvestrand wrote:
>> On 01/19/2012 11:12 AM, Robin Berjon wrote:
>>> On Jan 19, 2012, at 08:06 , Anant Narayanan wrote:
>>>> However, exposing fine grained control over media hardware to
>>>> web applications has serious security and privacy implications.
>>>> Enumeration of available devices, for example, will provide
>>>> several bits of data that will allow third parties to more
>>>> easily fingerprint users.
>>> Yes, we should definitely not support device characteristics
>>> enumeration.
>> I'll just raise a dissenting voice .... I think the concern here
>> (and the cost of addressing that concern) is out of proportion with
>> the threat; the mechanisms available for distinguishing users are
>> so legion that the additional bits available from knowing that
>> there are two cameras on the device, one of which supports HD
>> quality video, is microscopic in comparison.
>
> I think that the mistake you're making here is that you're
> considering these things in isolation. One bit here and two bits
> there aren't much on their own, but bits are nasty little critters
> that add up rather quickly. Listing the number of video inputs, the
> number of audio inputs, alongside their capabilities is a fair bit of
> bits, too.
>
> Fingerprinting isn't a boogeyman, it's a genuine issue. That being
> said, it always needs to be balanced against the value of the use
> case. As I explain below, I don't think that the enumeration use case
> adds that much value.
>
>>>> However, some applications will need a minimum set of
>>>> requirements in order to be able to function. I propose that we
>>>> leave it up to the application to detect if the resulting
>>>> stream has the characteristics it wants, and provide the user
>>>> with an appropriate message (and perhaps retry with another
>>>> call to getUserMedia()) if it does not.
>>> Just to be clear, I presume you mean intrinsic properties of the
>>> produced stream that would apply if the stream came from
>>> somewhere else. In other words, an application can complain that
>>> it's not getting stereo sound because it needs it, but it
>>> shouldn't be able to complain that it was given the back camera
>>> when it wanted the front, right?
>> I think this example is limiting enough to be misleading. If we
>> break the bond between "camera" and "physically part of the device
>> on which the user is performing the UI interaction", categories
>> like "front" or "back" become fully misleading.
>>
>> For instance, conferencing units may have a room camera, a document
>> camera and a lectern camera, none of which is attached to the
>> physical box; which one of these is "front"? (this illustrates the
>> need for SOMEONE - either browser chrome or JS API - to enumerate
>> cameras, btw)
>
> There are several ways of addressing this.
>
> Option A. Allow enumeration of all the room's cameras. If this is to
> be made useful for users, you need to provide human-readable names of
> some kind so that the application can offer the user a choice. We're
> not talking about a few bits of fingerprinting here — if I'm in an
> office space (as I am now) with access not just to my devices'
> cameras but also others, and they all have human-readable names,
> that's a lot of information. If, as one can almost surely expect, the
> names for these video devices include the brand and make, I'm getting
> close to being uniquely identifiable with that information alone.
>
> Option B. Having "front" and "back" is a sufficient ontology for the
> 80% case. Given that users can *always* choose which device they want
> to use (i.e. if the request hint is for "front" because that makes
> sense to a conferencing scenario, I can still pick "back" because I'm
> setting up conferencing for someone else, or I can still pick "room"
> if I want to — the app doesn't need to know) it doesn't break the
> remaining 20%, or in fact make them substantially less usable.
>
> Option C. We stick to the hint story but people feel we need a more
> detailed ontology to convey a greater set of input situations. So we
> create a more detailed ontology including things like "room",
> "lectern", "spyplane", etc. Using this, option A can be modified so
> that instead of full names only names pertaining to the ontology are
> returned, which is less privacy-invasive.
>
> I believe that option A is unacceptable from a privacy standpoint. As
> for option C, it's a mesh of ratholes so deep you could probably go
> Balrog-hunting in it. How precise does the ontology need to be? How
> far out to niche use cases do we go? How is it evolved over time and
> how do I know which version my user agent understands? What happens
> if there are naming conflicts? Who gets to configure devices set up
> in a room so as to conform to the ontology? Will my French IT staff
> think that "lectern" is the document camera since that's for reading
> ("lecture")? Is it a lectern camera if it can follow me off-stage as
> I crowd-surf a roomful of geeks driven to wild ecstasy by my
> presentation on the value of hardware ontologies in API design?
>
> I like option B best :) It works for the more common cases, and it
> doesn't break the less common ones.

I have basically the same view. But what is needed in the less common
cases is that the browser can show the user a preview of all cameras in
some kind of selector, so that the user can pick the right view (camera).
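
To make option B plus a browser-side selector a bit more concrete, here
is a rough TypeScript sketch. All names in it (CameraHint, Camera,
PageStream, preselect, chooseCamera, toPageStream) are made up for
illustration and do not come from any draft. It assumes the browser keeps
the camera list to itself, uses the hint only to preselect an entry in
the selector, and hands the page a stream that carries no device label.

```typescript
// Hypothetical model of the hint-only approach; not taken from any draft.
type CameraHint = "front" | "back";

interface Camera {
  label: string;        // human-readable name, shown only in browser chrome
  facing?: CameraHint;  // left undefined for e.g. a room or document camera
}

// What the page receives: an opaque handle with no label or device count.
interface PageStream {
  readonly id: number;
}

// Browser side: decide which entry the selector highlights by default.
function preselect(cameras: Camera[], hint?: CameraHint): number {
  if (hint === undefined) return 0;
  const match = cameras.findIndex((c) => c.facing === hint);
  return match >= 0 ? match : 0; // no match: fall back to the first camera
}

// Browser side: an explicit, in-range user pick always wins over the hint.
function chooseCamera(
  cameras: Camera[],
  hint?: CameraHint,
  userPick?: number
): number {
  if (userPick !== undefined && userPick >= 0 && userPick < cameras.length) {
    return userPick;
  }
  return preselect(cameras, hint);
}

let nextStreamId = 1;

// Browser side: strip everything device-specific before returning to the page.
function toPageStream(_camera: Camera): PageStream {
  return { id: nextStreamId++ };
}
```

With this shape the page never learns how many cameras exist or what
they are called; it only learns whether it got a stream.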

>
>>> This is borderline bikeshedding so I won't insist, but it seems
>>> to me that if you pick the right names you can avoid those two
>>> levels of nesting:
>>>
>>> { "audio": false, "video": false, "channels": "2",
>>> "quality": "voip", "camera": "back" }
>> Minor return bikeshed: you have now assumed that there is a single
>> namespace for both audio hints and video hints. I can easily
>> imagine "quality" being relevant for both audio and video, but with
>> incompatible semantics.
>
> That's why I said "pick the right names". Yes indeed if we don't
> there would be clashes.
>
>> I'd prefer the two-dictionary solution. But like Robin, not
>> strongly.
>
> As Dom said we can have it so that audio and video both accept "true"
> as well as a set of hints, which works fine for me. Note that this
> would allow { audio: {}, video: {} } instead of "true" (which is fine
> IMHO). Having provided options on the colour of this bikeshed, I
> suggest we let Anant pick which one he favours :)
>
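
For reference, the "true or a set of hints" shape discussed above could
be normalised roughly as in the TypeScript sketch below. The names Hints,
MediaRequest and normalize are hypothetical, and the hint keys are only
the examples used in this thread. It assumes that "true" and an empty
dictionary mean the same thing, and it keeps audio and video hints in
separate dictionaries so that a key like "quality" cannot clash.

```typescript
// Hypothetical types; the hint keys are just the examples from this thread.
type Hints = { [name: string]: string };

interface MediaRequest {
  audio?: boolean | Hints;
  video?: boolean | Hints;
}

interface NormalizedRequest {
  audio: Hints | null; // null means "not requested"
  video: Hints | null;
}

// Map one member of the request to either null or a hints dictionary.
function normalizeOne(value: boolean | Hints | undefined): Hints | null {
  if (value === undefined || value === false) return null;
  if (value === true) return {};   // "true" is treated as "no particular hints"
  return { ...value };             // copy so the caller's object is untouched
}

function normalize(request: MediaRequest): NormalizedRequest {
  return {
    audio: normalizeOne(request.audio),
    video: normalizeOne(request.video),
  };
}
```

Under these assumptions, normalize({ audio: true, video: true }) and
normalize({ audio: {}, video: {} }) give the same result, and
{ audio: { quality: "voip" }, video: { quality: "hd" } } keeps the two
"quality" hints apart.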
Received on Thursday, 19 January 2012 14:18:11 UTC