- From: Stefan Hakansson LK <stefan.lk.hakansson@ericsson.com>
- Date: Thu, 19 Jan 2012 15:17:32 +0100
- To: public-media-capture@w3.org
On 01/19/2012 02:53 PM, Robin Berjon wrote:
> On Jan 19, 2012, at 14:11 , Harald Alvestrand wrote:
>> On 01/19/2012 11:12 AM, Robin Berjon wrote:
>>> On Jan 19, 2012, at 08:06 , Anant Narayanan wrote:
>>>> However, exposing fine grained control over media hardware to
>>>> web applications has serious security and privacy implications.
>>>> Enumeration of available devices, for example, will provide
>>>> several bits of data that will allow third parties to more
>>>> easily fingerprint users.
>>> Yes, we should definitely not support device characteristics
>>> enumeration.
>> I'll just raise a dissenting voice .... I think the concern here
>> (and the cost of addressing that concern) is out of proportion with
>> the threat; the mechanisms available for distinguishing users are
>> so legion that the additional bits available from knowing that
>> there are two cameras on the device, one of which supports HD
>> quality video, is microscopic in comparison.
>
> I think that the mistake you're making here is that you're
> considering these things in isolation. One bit here and two bits
> there aren't much on their own, but bits are nasty little critters
> that add up rather quickly. Listing the number of video inputs, the
> number of audio inputs, alongside their capabilities is a fair bit of
> bits, too.
>
> Fingerprinting isn't a boogeyman, it's a genuine issue. That being
> said, it always needs to be balanced against the value of the use
> case. As I explain below, I don't think that the enumeration use case
> adds that much value.
>
>>>> However, some applications will need a minimum set of
>>>> requirements in order to be able to function. I propose that we
>>>> leave it up to the application to detect if the resulting
>>>> stream has the characteristics it wants, and provide the user
>>>> with an appropriate message (and perhaps retry with another
>>>> call to getUserMedia()) if it does not.
>>> Just to be clear, I presume you mean intrinsic properties of the
>>> produced stream that would apply if the stream came from
>>> somewhere else. In other words, an application can complain that
>>> it's not getting stereo sound because it needs it, but it
>>> shouldn't be able to complain that it was given the back camera
>>> when it wanted the front, right?
>> I think this example is limiting enough to be misleading. If we
>> break the bond between "camera" and "physically part of the device
>> on which the user is performing the UI interaction", categories
>> like "front" or "back" become fully misleading.
>>
>> For instance, conferencing units may have a room camera, a document
>> camera and a lectern camera, neither of which is attached to the
>> physical box; which one of these is "front"? (this illustrates the
>> need for SOMEONE - either browser chrome or JS API - to enumerate
>> cameras, btw)
>
> There are several ways of addressing this.
>
> Option A. Allow enumeration of all the room's cameras. If this is to
> be made useful for users, you need to provide human-readable names of
> some kind so that the application can offer the user a choice. We're
> not talking about a few bits of fingerprinting here — if I'm in an
> office space (as I am now) with access not just to my devices'
> cameras but also others, and they all have human-readable names,
> that's a lot of information. If, as one can almost surely expect, the
> names for these video devices include the brand and make, I'm getting
> close to being uniquely identifiable with that information alone.
>
> Option B. Having "front" and "back" is a sufficient ontology for the
> 80% case. Given that users can *always* choose which device they want
> to use (i.e. if the request hint is for "front" because that makes
> sense to a conferencing scenario, I can still pick "back" because I'm
> setting up conferencing for someone else, or I can still pick "room"
> if I want to — the app doesn't need to know) it doesn't break the
> remaining 20%, or in fact make them substantially less usable.
>
> Option C. We stick to the hint story but people feel we need a more
> detailed ontology to convey a greater set of input situations. So we
> create a more detailed ontology including things like "room",
> "lectern", "spyplane", etc. Using this, option A can be modified so
> that instead of full names only names pertaining to the ontology are
> returned, which is less privacy-invasive.
>
> I believe that option A is unacceptable from a privacy standpoint. As
> for option C, it's a mesh of ratholes so deep you could probably go
> Balrog-hunting in it. How precise does the ontology need to be? How
> far out to niche use cases do we go? How is it evolved over time and
> how do I know which version my user agent understands? What happens
> if there are naming conflicts? Who gets to configure devices set up
> in a room so as to conform to the ontology? Will my French IT staff
> think that "lectern" is the document camera since that's for reading
> ("lecture")? Is it a lectern camera if it can follow me off-stage as
> I crowd-surf a roomful of geeks driven to wild ecstasy by my
> presentation on the value of hardware ontologies in API design?
>
> I like option B best :) It works for the more common cases, and it
> doesn't break the less common ones.

I have basically the same view. But what is needed in the less common
cases is that the browser can show a preview to the user of all
cameras in some kind of selector so that the user can pick the right
view (camera).

>>> This is borderline bikeshedding so I won't insist, but it seems
>>> to me that if you pick the right names you can avoid those two
>>> levels of nesting:
>>>
>>> { "audio": false , "video": false , "channels": "2" ,
>>> "quality": "voip" , "camera": "back" }
>> Minor return bikeshed: you have now assumed that there is a single
>> namespace for both audio hints and video hints. I can easily
>> imagine "quality" being relevant for both audio and video, but with
>> incompatible semantics.
>
> That's why I said "pick the right names". Yes indeed if we don't
> there would be clashes.
>
>> I'd prefer the two-dictionary solution. But like Robin, not
>> strongly.
>
> As Dom said we can have it so that audio and video both accept "true"
> as well as a set of hints, which works fine for me. Note that this
> would allow { audio: {}, video: {} } instead of "true" (which is fine
> IMHO). Having provided options on the colour of this bikeshed, I
> suggest we let Anant pick which one he favours :)
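
For reference, the last variant quoted above (audio and video each accepting "true" or a set of hints) could be modelled roughly as follows. This is only a sketch: the type names and the normalizeKind / normalizeRequest helpers are hypothetical and come from no draft or implementation, and hint keys such as "quality" and "camera" are simply the ones used in the quoted example.

```typescript
// Hypothetical shapes for illustration; not taken from any spec or draft.
type Hints = Record<string, string>;

// Each media kind is either a boolean or its own dictionary of hints.
type MediaRequest = {
  audio?: boolean | Hints;
  video?: boolean | Hints;
};

// After normalization: null means "not requested",
// an empty object means "requested, no hints".
type NormalizedRequest = {
  audio: Hints | null;
  video: Hints | null;
};

function normalizeKind(value: boolean | Hints | undefined): Hints | null {
  if (value === undefined || value === false) {
    return null; // kind not requested
  }
  if (value === true) {
    return {}; // "true" is treated the same as an empty hint set
  }
  return { ...value }; // copy so the caller's object is not shared
}

function normalizeRequest(request: MediaRequest): NormalizedRequest {
  return {
    audio: normalizeKind(request.audio),
    video: normalizeKind(request.video),
  };
}

// "quality" lives under audio only, so it cannot clash with a video hint.
const example = normalizeRequest({
  audio: { quality: "voip", channels: "2" },
  video: { camera: "back" },
});
console.log(example);
```

Under these assumptions { audio: true, video: true } and { audio: {}, video: {} } normalize to the same value, and because each kind has its own dictionary a "quality" hint for audio cannot collide with one for video.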
Received on Thursday, 19 January 2012 14:18:11 UTC