- From: Jan-Ivar Bruaroey <jib@mozilla.com>
- Date: Wed, 30 Oct 2019 19:13:15 -0400
- To: Pete Snyder <psnyder@brave.com>
- Cc: youenn fablet <yfablet@apple.com>, Mike O'Neill <michael.oneill@baycloud.com>, Youenn Fablet <youenn@apple.com>, public-privacy <public-privacy@w3.org>, public-webrtc@w3.org
- Message-ID: <2306c582-99da-74ea-d711-3a97eb86ffb4@mozilla.com>
On 10/26/19 7:46 PM, Pete Snyder wrote:
>> On 10/24/19 7:02 PM, Pete Snyder wrote:
>>> Thank you for the summary. Could you also describe the status of discussions about revealing the local network environment? At TPAC it seemed like there was unanimous support for removing the “VPN” response, and near (though not complete) support for removing the endpoint altogether. The latter is the PING suggested approach.
>> This matches my understanding, as well as the WebRTC TPAC summary in https://lists.w3.org/Archives/Public/public-webrtc/2019Oct/0046.html:
>>
>> "Broad interest in removing or gating networkType. Remove “vpn” value."
> Terrific. Was just checking whether that had changed since TPAC. Is there an issue I can subscribe to, to keep up on the progress?

Yes, https://github.com/w3c/webrtc-stats/issues/374

>>>> There is consensus to double key device IDs if other storage like IDB is double keyed.
>>>> There is interest within the WG in having browsers support full double key storage.
>>>> There is little interest within the WG in mandating browsers to double key device IDs if other storage like IDB is not double keyed.
>>>> This is in terms of implementation.
>>>>
>>>> In terms of specification, my personal opinion is that it is fine to describe where browsers want to go, not where browsers are right now.
>>>> I am thus fine mandating double keying once we have proper support in the HTML spec for getting the storage partition key.
>>>> The HTML spec bits are being worked on.
>>> The WG has described in several places how the spec can protect users of browsers with double key’ed storage. That's not the point of disagreement / uncertainty though. My questions are all “how will the spec protect people using browsers w/o double key’ed storage” (i.e. the majority of web users)? Unless the spec is not going to be implemented until everyone double keys storage (clearly not the case) then the spec needs to protect these users too.
>>
>> Protect them from what exactly? This was challenged earlier in the thread, but looks unanswered. From https://lists.w3.org/Archives/Public/public-webrtc/2019Oct/0030.html:
> The harm being mitigated against is people using long lived device IDs for cookie resync / identification, on platforms that:
>
> 1) currently don’t double key storage

Glad we agree there's no deviceId issue on browsers with partitioned storage and that that's "not the point of disagreement". I take this to mean your proposal of simpler int-counter ids exists solely to mitigate correlation on systems that don't yet partition storage.

> 2) have a variety of storage / privacy preserving mitigations in place that affect storage values, and designed to prevent the need for a single, global, clear storage event (e.g. limiting lifetime of JS cookies, per cookie)

This seems unrealistic as a correlation mitigation to me, because of how simply it's defeated: store unique info A across all storage classes, then on every revisit retrieve A successfully from *any* storage class and re-save A across all storage classes. It would also seem to have data integrity issues.

Firefox is doing early experiments with partitioning of all third-party storage and communication APIs (cookies, cache, IDB, localStorage, DOM cache, localStorage storage events, broadcast channel, etc etc). We're still not sure what will be web compatible, but we hope to partition as much storage in the same bin as we can, otherwise it seems trivially defeated.

I don't speak for Safari obviously, but I thought I heard they're similarly hoping to partition most third-party storage, if they haven't already (at least both localStorage and IDB)? This is the only thing that seems effective and realistic to me to mitigate correlation across top-level sites. deviceId seems like a red herring here.

> More broadly, but distinctly, this is an effort to limit the privacy risk in all cases, not just the margins, to reduce technical privacy debt.
> Saying “there is no attack that _requires_ this functionality for reidentification” (my understanding of the pushback so far) is not the same thing as saying the functionality is not privacy harmful (which adding unnecessary globally unique identifiers into the platform plainly is).

If the remaining effort is to "reduce technical privacy debt" then we seem to agree there's no empirical privacy harm.

Local storage is "privacy harmful". deviceIds exist solely to support local storage of them. Ideally we'd rotate any id that's not stored. Sadly, that's hard to implement, but remains my lifespan shorthand. If there's a way to phrase that in the spec, I'm all for it. If we want to tie it to a specific storage class even, I'm open to that too.

Since there are already plans to partition storage, AND any effort short of that is trivially defeated, I see zero net privacy benefit from further effort (or if you will: zero net harm that warrants mitigation).

> I’ve made both of these points several times now, both at TPAC and on these threads. I am happy to link back to them if it's helpful.

Thanks, but I was rather hoping you'd engage directly with the email comment I quoted from your co-PING-member which contradicts your position. There seems to be disagreement within the PING working group here. I don't know who speaks for the PING working group as a whole.

>>>> There is no consensus there, partly due to the fact that details are fuzzy (what recycling means in particular), also since double keying solves the same issue.
>>> 1. The device ids -> ints suggestion is a way to protect users of browsers that don’t double key
>>> 2. I’m happy to explain any ambiguity, but in short, if I plug in the same web cam twice, it gets the same int handle twice. If a new device is added, it gets the next int. These mappings of device ids -> ints are stored per domain. Again, happy to explain further.
>>
>> I spoke to this earlier as well, but didn't get a response.
>> From https://lists.w3.org/Archives/Public/public-webrtc/2019Oct/0030.html:
>>
>> "Nothing would break semantically, but I think Youenn is right any counter becomes an id over time as devices are added/removed.
> Sorry but I do not understand the point about counters becoming IDs, in a world where most machines will be attached to a relatively small number of devices total. So if the counters are always less than 10 or even 100 (under the idea that most machines will never touch more than 10 or 100 media devices), I don’t follow the claim on how this would become a tracking vector. At the very least it would move deviceId based tracking from being a common case vulnerability to an extremely-uncommon one.
>
>> Plus, if the "per-origin" part is poorly implemented, the counter might correlate across origins (users change tabs often, devices rarely). Users with many devices would stand out. That sounds worse than what we have (in the spec) now."
>
> 1. Users with many devices may stand out, but that's surely way better than every user being in an anonymity set of 1 because of globally unique device IDs being reused across 3p contexts.
>
> 2. If this is the meat of the concern, device ids can be sampled from [0, 63 or 127]. The id then doesn’t reveal number of devices, and doesn’t have enough info to be meaningfully identifying.
>
> The privacy harm here is that device ids are globally unique, when they only need to be unique on a per-browser-per-domain basis.
> Int counters are one option, but basically anything would be better than the present proposal.
>
> Is there _any_ user benefit to having global device ids (I have not heard one, I’ve only heard explanations of why it isn’t as bad as I think it is)? Can we all agree that per-user-ids add privacy risk, even if we disagree on the magnitude of that risk (since it might be mirrored by other risks)?
> I’m just sincerely confused that there is so much defense of the present option…

I think your int counters (if implemented carefully) would qualify as an implementation of the current spec https://w3c.github.io/mediacapture-main/getusermedia.html#def-mediadeviceinfo-deviceId and have much lower entropy than most implementations today, which from a purity point of view seems nice if we're going to put something in a browser.

I think the disagreement (even within PING) is over whether there is >0 net privacy gain today of mandating further effort when users are trivially correlated by localStorage without partitioning.

>>>> The spec does not mandate to expose all devices after giving access to a single device.
>>>> The spec does not mandate to expose only the exposed devices.
>>>> My understanding is that the current state there is ok with the WG and that changing implementations is difficult in terms of web compatibility.
>>> I completely understand that the spec is ambiguous here, regarding privacy protections. That's the point of this, to make the spec strong and unambiguous re privacy :)
>>>
>>> Re webcompat: if it's difficult to change now, it's only going to get more difficult going forward. But there is no way we can lock in privacy harming behavior because it's specified in a _draft_. That's incompatible with the standards process, and harmful to web privacy (as the WG noted during TPAC, most accesses to this information appear to be for fingerprinting purposes). What is the possible user-serving benefit for “I give a site access to device A, so the browser now also tells the site information about device B”?
>> The user-serving benefit is in-content selection of which camera and/or microphone to use.
> Say I want to google-hangouts with someone. I have a boring video camera installed (maybe built into my laptop), and some other device “super secret or embarrassing X media device” (secret company prototype, sexual-related thing, etc. etc. etc.).
> The browser pops up the dialog saying “which media devices can the site access,” I very carefully think “I don’t want anyone to know about the embarrassing one, I will carefully choose the boring video camera”. But then… the page also learns about the embarrassing device.

Note all browsers (except Firefox desktop) prompt users without saying which camera or microphone they're being asked to share. Best to unplug it first.

> The group doesn’t view the above scenario as a problem?

I don't know what the group thinks about this. Embarrassing labels were not discussed at TPAC, nor has an issue been raised AFAIK. I was merely attempting to answer your question. ;)

> It seems plain that this is a privacy violation, and that the spec saying “sites can only learn about devices the user has granted access to” is a solution (maybe one among many, but at least one).

Note that this would not move the ball much on Chrome, since Chrome grants all devices of a kind at once (if you're looking for low-hanging privacy fruit...)

> If this is a use case you think users will want: “give access to device A, but inform site of B,C,D” then that seems like a great solution for an opt-in “check here to tell the site about other devices” option. But telling the site about unrelated hardware, _after the user just explicitly chose some different hardware_ is both an explicit privacy violation, and just seems generally rude.

I don't see the spec forbidding user agents from exercising discretion in labels, or implementing what you propose. However, if you want to mandate a UX flow where user-selection comes before permission (and I don't see how users can have an expectation here otherwise), then all browsers (except Firefox desktop) will have a lot of work to do.

.: Jan-Ivar :.

>> What "different answers" do you mean? I watched the recording before writing the WebRTC TPAC summary. If you find a discrepancy, please let me know!
>>
>> AFAIK "3) how device labels interact with permissions" or "why giving permission to a single device gives access to all device labels / ids)." were not discussed at TPAC.
> I _have not_ gone back to the video, and was not at my best that morning, so could easily be remembering wrong. But my memory of leaving that meeting is we’d agreed to double keying device ids (even if some folks in the meeting were not “happy with the outcome”).
>
> All that being said, me complaining about people changing their minds (assuming I'm even remembering correctly) is not useful or helpful. I apologize for bringing it up and won’t again :)
>
>>> How can we help get privacy protections in the spec?
>> If you could follow up and answer some of the stalled challenges raised in this thread that would be great thanks! I feel the more specific we can be here, the more productive it'll be.
> Hopefully the above helps. If I’m still not being sufficiently specific, please let me know; it's not for lack of trying :)
>
> Pete
Received on Wednesday, 30 October 2019 23:13:21 UTC