Re: Follow up issues from WebRTC / PING TPAC meeting

Pete, please find inline.

I think the idea is for me to write some PRs and get your feedback on them.
This might help scope the discussion and fine-tune the various spec changes we discussed (and, I think, agreed on) at TPAC.
 Y


> On 1 Oct 2019, at 00:19, Pete Snyder <psnyder@brave.com> wrote:
> 
>>> 
>>>> Regarding values returned by enumerateDevices (#612)
>>>> ---
>> 
>> This is true if device-info permission is not granted to the page.
>> If device-info permission is granted (after user granted access to the camera for instance), enumerateDevices will provide the full list of devices, with labels and ids.
> 
> Do I understand right that if I give access to a single device, enumerateDevices will return information about all devices?  This was not my understanding from the TPAC conversation, and more significantly, it seems unnecessarily privacy harming.

I do not know of any browser implementing this approach right now.
The permission spec states that granting getUserMedia also grants the device-info permission, which gives access to information on all available devices.
I think the spec allows what you are proposing but does not enforce it.
The idea would be to behave as if each device were plugged in just before the page starts using it.

> Could enumerateDevices just return the union of
> 1) an object, always returned, that does not include any device ids or labels, with one entry describing every possible device type
> 2) information about any device the user has given permission to

Sadly, I think this would break existing websites; see for instance how most WebRTC providers implement their device pickers.

> 
>>>> Regarding uniqueness of device IDs (#607)
>>>> --
>>>> For new hardware types (e.g. external speakers): device IDs MUST NOT be globally unique, but opaque-identifying integers (or some other non-identifying, non-unique handle).
>> 
>> I think the idea is that for browsers exposing speakers through enumerateDevices, device IDs will be double keyed.
>> There is discussion to find an alternative to that approach, hopefully I’ll have a full proposal soon.
> 
> My understanding (and it seems, not only mine) from the TPAC conversation was that any new spec work would, at a minimum, not use UUID style device Ids, regardless of double keying, etc.  I don’t know if that’s compatible with what you just said above. Can you clarify?
> 
> It’s my error / mistake to give “external speakers” as an example in the above.

Oh, I see. I agree with the general approach of progressively exposing things as they are used.

>>>> Regarding "network type" identifiers in returned stats (#374)
>>>> ---
>>>> The standard will be updated to remove this value from the returned dictionary of stats (e.g. no type key, and no "wifi", "ethernet", "vpn" value).
>> 
>> The WebRTC WG plans to gather feedback from WebRTC stats users.
> 
> I don’t have any objection to asking more folks for feedback, but if the value is privacy violating (and it sure seems to be), I don’t think it’s appropriate to connect the WG’s decision to the survey response.  E.g. "we’re only going to leave the non-web-compat-relevant privacy-harming value in the spec if it’s popular" isn’t a good way forward ;)
> 
>>> Re double-keying device ids
>>> ---
>> 
>> The issue is not really that users will be reprompted.
>> Websites will not be able to set up the call configuration as done the last time if the last call happened in a different top-level origin.
> 
> Sorry, I don’t follow this.  Once the site’s not able to re-set the call up (because the relevant stored value is being written to by two diff site instances), the most recent one will say “I can’t seem to find the device I expect to” and reprompt the user (as if the user had disconnected a web cam or whatever).  Can you point to popular code in the wild that would break / not reprompt? 
> 
>> This creates a compat risk for WebRTC SDK vendors without getting much privacy benefits if other storages are not double keyed.
> 
> We all agree that double keying is better than not double keying I think at this point.  The goal is to also protect the privacy of folks on non-double-keyed browsers.

I am not sure this is a goal of the WG, but in any case two mitigations we are adding will help a lot:
- Most third-party iframes are not able to use getUserMedia by default (they require feature policy opt-in, using the allow attribute for instance). In that case, enumerateDevices is expected to return an empty list.
- Device IDs are not provided until the user grants getUserMedia access.
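
As a rough illustration of these two mitigations, here is a minimal sketch. The helper name visibleDevices and its option names are hypothetical and not taken from any spec or browser; how many entries are exposed before permission is granted is also left out of the sketch.

```javascript
// Hypothetical helper (not a real browser or spec API): models what a page
// would get back from enumerateDevices() under the two mitigations.
// `devices` stands for the browser's internal list:
//   [{ kind, deviceId, label }, ...]
function visibleDevices(devices, { policyAllowsCapture, captureGranted }) {
  // Mitigation 1: no feature policy opt-in means an empty list.
  if (!policyAllowsCapture) {
    return [];
  }
  // Mitigation 2: before the user grants capture, hide IDs and labels.
  if (!captureGranted) {
    return devices.map(({ kind }) => ({ kind, deviceId: "", label: "" }));
  }
  // After the grant, expose the full entries (copied, not shared).
  return devices.map((device) => ({ ...device }));
}
```

A third-party iframe without the allow attribute would correspond to policyAllowsCapture being false.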

>> If double keying for other storage is not shipped in some browsers, cross-site tracking will continue whatever we do about device IDs.
>> Note also that the WG decided that device IDs will not be accessible until capture permission is granted.
>> This can be shipped much sooner and will make actual usage of device IDs for tracking purposes much harder.
> 
> The web platform has so many privacy harming bugs in it that it’s going to take a long time to undo all the harm that’s been caused.  The relevant question isn’t “will this particular fix stop tracking”, it’s “does it make things less-bad in a way that doesn’t break too many reasonable use cases”.  (Re break, still trying to understand how realistic this is from the above).
> 
> But point taken about IDs not being available until permission is granted.  That’s better than the status quo (yeah!), but still more privacy-harming than is necessary (boo!)
> 
>>> 4) the current spec text of “device ids should be reset when storage is reset” does not address the privacy concern, especially as some browsers move to dynamic policies for handling JS set storage (e.g. there is no single global “storage clear” event to decide when to reset device Ids against, but lots of small micro, per value decisions)
>> 
>> I am not sure I follow, could you elaborate?
> 
> This is covered in the issue, but the short version is
> 
> 1) some privacy-preserving browsers are moving away from the idea of a “single storage epoch” (i.e. “I’m clearing all storage now”) to a large number of overlapping, per value “storage epochs”.  E.g. Safari and Brave clear all JS set cookies 7 days after being set, ITP2.3 does the same thing for all JS set storage, etc etc etc.  These frequent small events are an intentional alternative to rare “clear everything” interventions like “clear all cookies”.
> 
> 2) It's not clear how the standard should be read in this light.  There is no single “storage is cleared” event to key off.  You either harm the goals of these privacy preserving efforts (since the device id can be preserved / carried over between these mini epochs, see below), or wind up with all sorts of weird web compat issues (if you reset deviceIds anytime any storage value was deleted, you’d have deviceIds being reset on all sorts of odd, unpredictable schedules)

Agreed, we might need to clarify this.
My understanding is that, whenever some storage of the given partition is cleared, we clear device IDs.
Yes, we might end up with some compat issues, but once we start clearing some website data, there might be compat issues anyway.

My recommendation to web developers is to use ideal deviceId constraints, so that if the device is not there, or device IDs are reset, the application will still get some audio/video.
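
A minimal sketch of what this could look like, assuming the page keeps a previously chosen ID somewhere. The helper names buildVideoConstraints and openPreferredCamera are hypothetical; only getUserMedia and the deviceId constraint come from the API itself.

```javascript
// Build getUserMedia constraints that prefer, but do not require,
// a previously saved camera. With "ideal", the browser falls back to
// another camera if the saved ID no longer matches any device
// (device unplugged, or device IDs reset).
function buildVideoConstraints(savedDeviceId) {
  if (!savedDeviceId) {
    // Nothing saved: let the browser pick any camera.
    return { video: true };
  }
  return { video: { deviceId: { ideal: savedDeviceId } } };
}

// Browser-only usage; resolves to a MediaStream.
async function openPreferredCamera(savedDeviceId) {
  const constraints = buildVideoConstraints(savedDeviceId);
  return navigator.mediaDevices.getUserMedia(constraints);
}
```

By contrast, using exact instead of ideal would make getUserMedia fail when the saved ID no longer exists.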

> 
> E.g.
> // Day 0: the site stores the device ID
> window.localStorage.deviceId = deviceId
> // Day 1: the site copies it under a second key
> window.localStorage.deviceIdCopy = window.localStorage.deviceId
> // Day 7: the browser clears window.localStorage.deviceId
> // Day 7 pt 2: the site restores it from the copy
> if (!window.localStorage.deviceId) window.localStorage.deviceId = window.localStorage.deviceIdCopy
> // Day 8: the browser clears window.localStorage.deviceIdCopy
> 
> 
> TLDR it’s not a solution to have a long-lasting privacy harming value, but key it off storage lifetime.
> 
>> I am unclear whether the PING WG would like us to continue digging into this approach given the plan is to mandate double keying.
>> Can you clarify this?
> 
> Order of preference (attempting to summarize PING conversations)
> 
> 1. Use non-uuid device handles AND double key on all platforms
> 2. Use non-uuid device handles
> 3. Double key UUIDs on all platforms
> 4. Double key on some platforms (e.g. platforms that already double key)
> 
> 1 would be great and show the most commitment to privacy. 2 would be great, 3 would be acceptable if non-uuids were unworkable.  4 is not a solution.

I think the plan is to update the spec to align with 3.

> 
>> As for why I am not sure how the proposal works, let’s take an example.
>> Say we start with two cameras (back and front) that we give 1 and 2 as device IDs. Everything is fine.
>> Let’s say now that a new camera is being plugged in.
>> We give it device ID 3.
>> Let’s say now that another new camera is being plugged in.
>> We give it device ID 4.
>> 
>> Now, we want persistence so that web pages can say ‘I want device 3 at next visit’.
>> This should work unless the user clears website data like cookies for that particular website.
>> But we can no longer clear device IDs for this particular website, since this will affect all websites.
>> 
>> Also, if next time, we only have device IDs 1, 2 and 4, this information will be available to all origins that have device info permission.
>> This would leak some interesting cross-site tracking information.
>> We could then decide to use 1, 2 and 3 instead but then the web site is unable to select persistently device 4.
> 
> Sorry, I’m having real trouble following the above, or at least the new privacy harms introduced.  The only change suggested here is that browsers do something equivalent to keeping an internal map of consecutive ints to “uuids”, and expose the ints to the site instead.  This is orthogonal to double keying (it makes both the double keying option better AND it makes the non-double-keying option better).  This seems to be most relevant if the WG is dead set against requiring double-keying on _all_ platforms.  

Is the int->uuid map partitioned? What is its lifetime? How is it better in the double-keying option?
The consensus at TPAC was to require double-keying of deviceIds in the specification.
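
To make these questions concrete, here is a rough sketch of one possible reading of the int->uuid map, assuming it is partitioned per (top-level origin, frame origin) pair and cleared together with that partition's storage. The class and method names are hypothetical, and nothing here is agreed spec text.

```javascript
// Hypothetical sketch: one int->uuid map per (top-level origin, frame origin).
class DeviceHandleMap {
  constructor() {
    // partition key -> Map(uuid -> small integer handle)
    this.partitions = new Map();
  }

  static partitionKey(topLevelOrigin, frameOrigin) {
    return JSON.stringify([topLevelOrigin, frameOrigin]);
  }

  // Return the handle for `uuid` in this partition, assigning the next
  // consecutive integer the first time the device is exposed there.
  handleFor(topLevelOrigin, frameOrigin, uuid) {
    const key = DeviceHandleMap.partitionKey(topLevelOrigin, frameOrigin);
    let handles = this.partitions.get(key);
    if (!handles) {
      handles = new Map();
      this.partitions.set(key, handles);
    }
    if (!handles.has(uuid)) {
      handles.set(uuid, handles.size + 1);
    }
    return handles.get(uuid);
  }

  // Lifetime question: here the map is dropped when the partition's
  // website data is cleared, so handles restart from 1 afterwards.
  clearPartition(topLevelOrigin, frameOrigin) {
    this.partitions.delete(
      DeviceHandleMap.partitionKey(topLevelOrigin, frameOrigin)
    );
  }
}
```

In this reading, the same physical device can have different handles in different partitions, and clearing one site's data does not affect the handles seen by other sites.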

> 
> In other words, there is no scenario where moving from UUIDs -> sequential ints reveals _more_ information, or would (itself) affect the lifetime of any value. 
> 
>> AIUI, the double-keyed proposal solves this issue and is easier for browser engines to implement.
> 
> I’m totally baffled; what is the WG proposing to do to solve the current privacy harm on platforms that don’t double key all storage?  

As said above, some mitigations can be shipped on those platforms sooner than double keying.
The WG is defining a specification in the hope that implementers will follow it.
That is why it is important to have all implementers on board, which seems to be the case here.
I am not sure what the WG can or should do beyond that.

> 
>> FWIW, I like fixed device IDs and am advocating for using some for output speakers.
>> This should work particularly well for outputs that are always present on a given device (say loudspeaker/earpiece).
> 
> If you feel I’m misunderstanding you, or we’re talking past each other, maybe a call would be good.  I’d be happy to join you next time you have a WG call if that’d be helpful.
> 
> Even if we’re disagreeing, thank you for taking the time to continue discussing this :)
> 
> Pete

Received on Tuesday, 1 October 2019 07:45:08 UTC