Re: Follow up issues from WebRTC / PING TPAC meeting from Pete Snyder on 2019-10-26 (public-privacy@w3.org from October to December 2019)

From: Pete Snyder <psnyder@brave.com>
Date: Sat, 26 Oct 2019 16:46:45 -0700
To: Jan-Ivar Bruaroey <jib@mozilla.com>
Cc: youenn fablet <yfablet@apple.com>, Mike O'Neill <michael.oneill@baycloud.com>, Youenn Fablet <youenn@apple.com>, public-privacy <public-privacy@w3.org>, public-webrtc@w3.org
Message-Id: <56999C18-EC42-4882-8578-21A70987EE08@brave.com>
> On 10/24/19 7:02 PM, Pete Snyder wrote:
>> Thank you for the summary.  Could you also describe the status of discussions about revealing the local network environment?  At TPAC it seemed like there was unanimous support for removing the “VPN” response, and near (though not complete) support for removing the endpoint altogether.  The latter is the PING suggested approach.
> 
> This matches my understanding, as well as the WebRTC TPAC summary in https://lists.w3.org/Archives/Public/public-webrtc/2019Oct/0046.html:
> 
> "Broad interest in removing or gating networkType. Remove “vpn” value."

Terrific.  Was just checking whether that had changed since TPAC.  Is there an issue I can subscribe to, to keep up with the progress?

>>> There is consensus to double key device IDs if other storage like IDB are double keyed.
>>> There is interest within the WG in having browsers to support full double key storage.
>>> There is little interest within the WG in mandating browsers to double key device IDs if other storage like IDB is not double keyed.
>>> This is in terms of implementation.
>>> 
>>> In terms of specification, my personal opinion is that it is fine to describe where browsers want to go, not where browsers are right now.
>>> I am thus fine mandating double keying once we have proper support in the HTML spec for getting the storage partition key.
>>> The HTML spec bits are being worked on.
>> The WG has described in several places how the spec can protect users of browsers with double key’ed storage.  That’s not the point of disagreement / uncertainty though.  My questions are all “how will the spec protect people using browsers w/o double key’ed storage” (i.e. the majority of web users)?  Unless the spec is not going to be implemented until everyone double keys storage (clearly not the case), the spec needs to protect these users too.
> 
> 
> Protect them from what exactly? This was challenged earlier in the thread, but looks unanswered. From https://lists.w3.org/Archives/Public/public-webrtc/2019Oct/0030.html:

The harm being mitigated is people using long-lived device IDs for cookie resync / identification, on platforms that:

1) currently don’t double key storage
2) have a variety of storage / privacy preserving mitigations in place that affect storage values, and that are designed to prevent the need for a single, global, clear-storage event (e.g. limiting the lifetime of JS cookies, per cookie)

More broadly, but distinctly, this is an effort to limit the privacy risk in all cases, not just at the margins, to reduce technical privacy debt.  Saying “there is no attack that _requires_ this functionality for reidentification” (my understanding of the pushback so far) is not the same thing as saying the functionality is not privacy harmful (which adding unnecessary globally unique identifiers into the platform plainly is).

I’ve made both of these points several times now, both at TPAC and on these threads.  I am happy to link back to them if it’s helpful.

>>> There is no consensus there, partly due to the fact that details are fuzzy (what means recycling in particular), also since double keying solves the same issue.
>> 1. The device ids -> ints suggestion is a way to protect users of browsers that don’t double key
>> 2. I’m happy to explain any ambiguity, but in short, if I plug in the same web cam twice, it gets the same int handle twice.  If a new device is added, it gets the next int.  These mappings of device ids -> ints are stored per domain.  Again, happy to explain further.
> 
> 
> I spoke to this earlier as well, but didn't get a response. From https://lists.w3.org/Archives/Public/public-webrtc/2019Oct/0030.html:
> 
> 
> "Nothing would break semantically, but I think Youenn is right any
> counter becomes an id over time as devices are added/removed.

Sorry, but I do not understand the point about counters becoming IDs in a world where most machines will be attached to a relatively small number of devices in total.  If the counters are always less than 10 or even 100 (on the assumption that most machines will never touch more than 10 or 100 media devices), I don’t follow how this would become a tracking vector.  At the very least it would move deviceId based tracking from being a common case vulnerability to an extremely uncommon one.
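
To make the int-handle idea concrete, here is a minimal sketch in Python.  It is only an illustration: the class and method names (`DeviceHandleTable`, `handle_for`) and the example origins and raw IDs are hypothetical, and nothing here comes from the spec or from any browser.  It assumes the browser keeps one small table per origin and hands the site the table index instead of a globally unique ID.

```python
from collections import defaultdict


class DeviceHandleTable:
    """Hypothetical per-origin mapping from raw device IDs to small int handles."""

    def __init__(self):
        # origin -> {raw_device_id: int handle}
        self._handles = defaultdict(dict)

    def handle_for(self, origin: str, raw_device_id: str) -> int:
        table = self._handles[origin]
        if raw_device_id not in table:
            # A device this origin has not seen before gets the next int.
            table[raw_device_id] = len(table)
        # A device seen before keeps the handle it was given the first time.
        return table[raw_device_id]


table = DeviceHandleTable()
first = table.handle_for("https://a.example", "webcam-raw-id")  # 0
again = table.handle_for("https://a.example", "webcam-raw-id")  # 0 (same device)
mic = table.handle_for("https://a.example", "mic-raw-id")       # 1 (next int)
other = table.handle_for("https://b.example", "mic-raw-id")     # 0 (separate origin)
```

Because every origin starts counting from 0, the handle a site sees says nothing about the handle another site sees for the same hardware.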

> Plus, if
> the "per-origin" part is poorly implemented, the counter might correlate
> across origins (users change tabs often, devices rarely). Users with
> many devices would stand out. That sounds worse than what we have (in
> the spec) now."



1. Users with many devices may stand out, but that’s surely way better than every user being in an anonymity set of 1 because globally unique device IDs are reused across 3p contexts.

2. If this is the meat of the concern, device ids can be sampled at random from a small fixed range, e.g. [0, 63] or [0, 127].  The id then doesn’t reveal the number of devices, and doesn’t carry enough information to be meaningfully identifying.
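
A rough sketch of the sampling variant from point 2, again in Python and again purely illustrative: `assign_random_handle` and its `space` parameter are hypothetical names, and the choice to avoid collisions within one origin’s table is my assumption, not something any spec requires.

```python
import secrets


def assign_random_handle(table: dict, raw_device_id: str, space: int = 128) -> int:
    """Give raw_device_id a stable random handle in [0, space) within one origin's table."""
    if raw_device_id in table:
        # Same device, same origin: reuse the handle picked earlier.
        return table[raw_device_id]
    used = set(table.values())
    free = [h for h in range(space) if h not in used]
    if not free:
        raise RuntimeError("handle space exhausted for this origin")
    # Pick uniformly among unused handles, so the value says nothing
    # about how many devices were attached before this one.
    handle = secrets.choice(free)
    table[raw_device_id] = handle
    return handle


origin_table = {}
cam = assign_random_handle(origin_table, "webcam-raw-id")
cam_again = assign_random_handle(origin_table, "webcam-raw-id")
mic = assign_random_handle(origin_table, "mic-raw-id")
```

With `space=128` the handle carries at most 7 bits, and a separate table per origin means the same device gets unrelated values on different sites.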

The privacy harm here is that device ids are globally unique, when they only need to be unique per browser, per domain.  Int counters are one option, but basically anything would be better than the present proposal.

Is there _any_ user benefit to having global device ids (I have not heard one, I’ve only heard explanations of why it isn’t as bad as I think it is)?  Can we all agree that per-user-ids add privacy risk, even if we disagree on the magnitude of that risk (since it might be mirrored by other risks)?  I’m just sincerely confused that there is so much defense of the present option…
> 
>>> The spec does not mandate to expose all devices after giving access to a single device.
>>> The spec does not mandate to expose only the exposed devices.
>>> My understanding is that the current state there is ok with the WG and that changing implementations is difficult in terms of web compatibility.
>> I completely understand that the spec is ambiguous here, regarding privacy protections.  That’s the point of this, to make the spec strong and unambiguous re privacy :)
>> 
>> Re webcompat: if it’s difficult to change now, it’s only going to get more difficult going forward.  But there is no way we can lock in privacy harming behavior because it’s specified in a _draft_.  That’s incompatible with the standards process, and harmful to web privacy (as the WG noted during TPAC, most accesses to this information appear to be for fingerprinting purposes).  What is the possible user-serving benefit for “I give a site access to device A, so the browser now also tells the site information about device B”?
> 
> The user-serving benefit is in-content selection of which camera and/or microphone to use.

Say I want to google-hangouts with someone.  I have a boring video camera installed (maybe built into my laptop), and some other device “super secret or embarrassing X media device” (secret company prototype, sexual-related thing, etc. etc. etc.).  The browser pops up the dialog saying “which media devices can the site access,” and I very carefully think “I don’t want anyone to know about the embarrassing one, I will carefully choose the boring video camera”.  But then… the page also learns about the embarrassing device.

The group doesn’t view the above scenario as a problem?  It seems plain that this is a privacy violation, and that the spec saying “sites can only learn about devices the user has granted access to” is a solution (maybe one among many, but at least one).

If this is a use case you think users will want: “give access to device A, but inform site of B,C,D” then that seems like a great solution for an opt-in “check here to tell the site about other devices” option.  But telling the site about unrelated hardware, _after the user just explicitly chose some different hardware_ is both an explicit privacy violation, and just seems generally rude.
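
Sketched in Python, the rule I’m asking for looks roughly like the following.  All names here (`Device`, `visible_devices`, `share_others`) are hypothetical and only illustrate the behavior; this is not how any browser or the spec expresses it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    device_id: str
    label: str


def visible_devices(all_devices, granted_ids, share_others=False):
    """Return the devices a site may learn about after a permission prompt."""
    if share_others:
        # Explicit opt-in: the user ticked "tell the site about other devices".
        return list(all_devices)
    # Default: the site only learns about devices the user actually granted.
    return [d for d in all_devices if d.device_id in granted_ids]


attached = [Device("cam", "Built-in camera"), Device("proto", "Secret prototype")]
default_view = visible_devices(attached, {"cam"})
opt_in_view = visible_devices(attached, {"cam"}, share_others=True)
```

In the default case the site sees only the built-in camera; the second device shows up only when the user explicitly opts in.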

> What "different answers" do you mean? I watched the recording before writing the WebRTC TPAC summary. If you find a discrepancy, please let me know!
> 
> AFAIK "3) how device labels interact with permissions" or "why giving permission to a single device gives access to all device labels / ids)." were not discussed at TPAC.

I _have not_ gone back to the video, and was not at my best that morning, so I could easily be remembering wrong.  But my memory on leaving the meeting is that we’d agreed to double keying device ids (even if some folks in the meeting were not “happy with the outcome”).

All that being said, me complaining about people changing their minds (assuming I’m even remembering correctly) is not useful or helpful.  I apologize for bringing it up and won’t again :)

>> How can we help get privacy protections in the spec?
> 
> If you could follow up and answer some of the stalled challenges raised in this thread that would be great thanks! I feel the more specific we can be here, the more productive it'll be.

Hopefully the above helps.  If I’m still not being sufficiently specific, please let me know; it’s not for lack of trying :)

Pete
Received on Saturday, 26 October 2019 23:46:51 UTC