Re: [ambient-light] RFC: editorial: Add reading quantization and threshold check algorithms. (#77)

(apologies in advance for the wall of text ahead)

Hi, @sandandsnow and @lknik.

Thank you very much for all the time spent reviewing this PR (and special thanks to @lknik for being around and watching this API for years now). My apologies for the time it took me to get back to this change. At least I did spend some time working on and documenting the Generic Sensor implementation in Chromium and have a better understanding of the mitigations I am trying to "upstream" here.

I've updated this PR as well as w3c/sensors#429 to address some of the feedback received here as well as to make the prose and algorithms better match what we have in Chromium. I strongly suggest looking at w3c/sensors#429 first and then reading this PR's diff. I'll go over the current solution and then try to address the concerns @sandandsnow has brought from PING.

## Current change

Compared to the previous version from the end of 2021:
* The `illuminance` getter here no longer does anything special. Instead, I've moved the (optional) rounding of values to the newly-added "reading quantization algorithm" called by ["get value from latest reading"](https://w3c.github.io/sensors/#get-value-from-latest-reading) in the main Generic Sensor spec. Specifications such as this one then define a "reading quantization algorithm" to perform the actual rounding, but the mechanism is generic enough that other specs (such as Accelerometer) could do something similar without having to do everything from scratch.
* The value of the "illuminance threshold value" has changed from 50lx to "half of the illuminance rounding multiple" (25lx by default then). This matches the current Chromium implementation, although I am not sure this is the best option (more on that below).
* I've added informative references to works analyzing potential attacks using Ambient Light Sensors, including one of @lknik's blog posts, and how the proposed changes help mitigate some of the attacks. I've also mentioned issue #13 where the current proposed rounding value of 50lx was first derived, and I've linked to the spreadsheet with the measurements made back in 2017 that we've been using as a basis for the decisions (in the future we could follow @anssiko's idea and add the table to this repository).
* I've also followed https://w3c.github.io/fingerprinting-guidance/#mark-fingerprinting and added a `tracking-vector` mark to the security and privacy section (not sure if it should be added to a different location instead though).
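To make the rounding discussed above concrete, here is a minimal sketch of what a "reading quantization algorithm" for illuminance could look like. It is not the spec text or the Chromium code: the function names are hypothetical, and rounding ties upwards is an assumption on my part, chosen so that raw values in [25,75) map to 50.

```python
import math

# Default "illuminance rounding multiple" discussed in this PR.
ROUNDING_MULTIPLE_LX = 50.0


def quantize_illuminance(raw_lx, multiple=ROUNDING_MULTIPLE_LX):
    """Round a raw reading to the nearest multiple (hypothetical helper).

    Ties round up (an assumption), so 25lx becomes 50lx.
    """
    return math.floor(raw_lx / multiple + 0.5) * multiple


def default_threshold(multiple=ROUNDING_MULTIPLE_LX):
    """"Illuminance threshold value" as half of the rounding multiple."""
    return multiple / 2
```

With these defaults, raw readings of 26lx and 53lx both quantize to 50, and the threshold comes out as 25lx.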

## Things I'd like to discuss

- [ ] I am not entirely sure about https://arxiv.org/abs/1405.3760 because the variation in lux in figures 3 and 5 differs quite a lot. The variation in figure 5 matches my informal tests using my own phone more closely, and the mitigations suggested here would address them more effectively than the variation in figure 3.
- [ ] The "threshold check algorithm" idea
    - [ ] I am still not 100% sure whether the whole idea behind it counts as a fingerprinting/data leak mitigation or if it's just a sort of high-pass filter for changes in illuminance. Some operating systems and hardware platforms even take care of this kind of check themselves, where new readings are made available only if they differ from the latest one by some percentage or absolute value. In Chromium, the idea was to avoid reporting too many changes when illuminance values hovered around the "edges" between rounding multiples, so that moving from 24lx to 26lx and back to 24lx wouldn't generate 3 "reading" events and update the readings, for example, and let an attacker know that the actual value is probably in that region. I can see how this helps avoid leaking data, but is it effective enough to count as a mitigation?
    - [ ] The actual "illuminance threshold value" suggested here (25lx) came up during code review years ago, but I don't think there's a strong motivation for it to be 25lx and not the same as the "illuminance rounding multiple" (50lx). Using 25lx means the API might end up reporting a new reading even if the rounded value has not changed. For example, going from 26lx to 53lx in raw readings would generate a "reading" event even though the `illuminance` attribute would return 50 in both cases. Cases like this could allow an attacker to know that the actual illuminance value is in the [25,75) half-open interval and that the two readings differ by at least 25lx and at most 49lx. If we used the "illuminance rounding multiple" as the threshold instead, this case would not exist.
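The two examples above can be replayed with a small sketch. It assumes the threshold check only compares the absolute difference between the new raw reading and the latest accepted one, and that the latest reading is only updated when the check passes. Both function names are hypothetical and this is a simplification, not the Chromium implementation.

```python
def passes_threshold_check(new_lx, latest_lx, threshold_lx):
    """Return True if the new raw reading should replace the latest one."""
    if latest_lx is None:
        # Assumption: the first reading is always accepted.
        return True
    return abs(new_lx - latest_lx) >= threshold_lx


def accepted_readings(raw_readings, threshold_lx):
    """Return the raw readings that would trigger a "reading" event."""
    latest = None
    accepted = []
    for raw in raw_readings:
        if passes_threshold_check(raw, latest, threshold_lx):
            latest = raw
            accepted.append(raw)
    return accepted


# Hovering around a rounding edge: only the first reading is reported.
hovering = accepted_readings([24, 26, 24], threshold_lx=25)

# 26lx -> 53lx: reported with a 25lx threshold, suppressed with 50lx.
with_half_multiple = accepted_readings([26, 53], threshold_lx=25)
with_full_multiple = accepted_readings([26, 53], threshold_lx=50)
```

Under these assumptions `hovering` is `[24]`, `with_half_multiple` is `[26, 53]` and `with_full_multiple` is `[26]`, which is the difference between the two candidate threshold values described above.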

## PING's concerns

> Thank you. We discussed the proposed mitigations in the PING call today. As a result of that conversation we have a couple of follow-up questions:
>
> * Could you clarify for us why the WG chose a 50lx threshold?

Done in the spec and also above, hopefully.

> [other questions]

Please correct me if I'm wrong, but I'm under the impression that some of those concerns came up by looking at this spec in isolation without looking at the main Generic Sensor spec. https://w3c.github.io/sensors/#concepts-can-expose-sensor-readings and https://w3c.github.io/sensors/#abstract-operations mandate, for example, that:
- Sensor readings are only exposed under the following conditions:
    + The environment is a secure context.
    + Sensor usage is allowed via the Permissions Policy API.
    + The document's visibility is "visible".
    + The currently focused area belongs to a document whose origin is same origin-domain with document’s origin.
    + The call(s) to [request permission to use](https://w3c.github.io/permissions/#dfn-request-permission-to-use) in `Sensor.start()` have passed.
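Read together, the conditions above form a single gate that must be fully satisfied before any reading is exposed. A rough sketch of that gate follows; the `DocumentState` type and its field names are hypothetical and only mirror the list above, they do not come from the spec.

```python
from dataclasses import dataclass, replace


@dataclass
class DocumentState:
    """Hypothetical snapshot of the conditions listed above."""
    is_secure_context: bool
    allowed_by_permissions_policy: bool
    visibility_state: str
    focused_area_is_same_origin_domain: bool
    permission_granted: bool


def can_expose_readings(state):
    """All conditions must hold at once for readings to be exposed."""
    return (
        state.is_secure_context
        and state.allowed_by_permissions_policy
        and state.visibility_state == "visible"
        and state.focused_area_is_same_origin_domain
        and state.permission_granted
    )


foreground = DocumentState(True, True, "visible", True, True)
# The same page in a background tab no longer gets readings.
background = replace(foreground, visibility_state="hidden")
```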

It is up to each UA to implement "request permission to use", and it might involve prompting users, for example. At the moment, Chromium does not prompt users for access to motion sensors (e.g. accelerometer and gyroscope) but lets them allow or block access by default. We are also working on making this better by moving to prompting by default (and removing the "allow by default" option) as part of the work on implementing Device Orientation's `requestPermission()` method. Additionally, in https://www.w3.org/2021/10/29-dap-minutes.html#t07 we decided to also add a camera permission requirement to the spec to make the permission requirements stricter (I still have to address that one).

With the above in mind, let me try to get to the specific questions:

> * Also, notwithstanding the mitigations, is there still a fingerprinting risk (albeit a reduced risk)? More specifically, to what extent does reducing to a 50lx threshold (or any threshold) prevent fingerprinting on the basis of opening up the capacity to track a user through typical behavior patterns?

I believe the fingerprinting risk remains. Even though we reduce the granularity of the data exposed to API users, an attacker could still know that a user is e.g. in an office environment between certain hours (320-500lx per https://en.wikipedia.org/wiki/Lux#Illuminance), and walks under full daylight at a certain time of the day (1000 to 10000lx). The mitigations listed above help prevent websites (including third parties) from having undetected and unprompted access to the data.

> And, a more general privacy question (not related to reducing granularity), how does the specification prevent or protect against cross-device tracking (e.g. the light equivalent of ultrasonic beacons)?

The idea with the set of mitigations proposed here and in the Generic Sensor spec is to make the readings coarse enough to help prevent cross-device tracking while at the same time only making readings available to pages that fulfill the requirements above and which the user has authorized to gather data.

> More specifically, we have received these observations and comments:
> 
> * Even bucketing by 50lux still seems to expose a lot of fingerprinting surface (>=4bits given the range here), which doesn’t seem acceptable

Do the mitigations above help make it more acceptable? I'm asking because this is also the case even for specs such as https://w3c.github.io/deviceorientation that are implemented by multiple engines: a `DeviceMotionEvent` can include the output of two 3-axis accelerometers, a gyroscope and a double (`interval`), and isn't that a lot more bits? I'm not asking this rhetorically, as I'm basing my calculations on https://www.eff.org/deeplinks/2010/01/primer-information-theory-and-privacy and am not sure if this is right.
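For reference, this is the kind of back-of-the-envelope calculation I mean, following the EFF primer's "bits of information" idea. It gives an upper bound that assumes every bucket is equally likely, which real illuminance readings almost certainly are not. The 10000lx upper range is taken from the daylight figure quoted earlier, and the function names are hypothetical.

```python
import math


def bucket_count(max_lx, multiple_lx=50.0):
    """Number of distinct rounded values between 0 and max_lx inclusive."""
    return math.floor(max_lx / multiple_lx + 0.5) + 1


def max_entropy_bits(distinct_values):
    """Upper bound on bits revealed, assuming a uniform distribution."""
    return math.log2(distinct_values)


# 0lx to 10000lx in 50lx steps: 201 distinct values, a bit under 8 bits.
daylight_bits = max_entropy_bits(bucket_count(10000))

# 16 distinct values (0lx to 750lx) already give 4 bits.
four_bits = max_entropy_bits(bucket_count(750))
```

The same arithmetic applied to several unrounded floating-point sensor axes would give a much larger bound, which is what prompted the comparison with `DeviceMotionEvent`.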

> * Bucketing doesn’t seem to address the “ephemeral fingerprinting” concern

Are you referring to https://github.com/asankah/ephemeral-fingerprinting or is there another resource I could look at? That page lists several possible mitigations and we implement many of them, so I'm wondering if the Generic Sensor + ALS mitigations do address the concern at least partially?

> * This API seems like an extremely infrequently needed feature (as evidenced by most browsers not being interested in implementing); so, why not put it behind a permission prompt?
> * This seems to be easily exploitable as a covert channel (write to the channel by changing the brightness of the content on the page, read from the channel through the brightness sensor). The spec needs to address this (e.g. through permission prompt)

Answered above: the permission side of things is handled in the main Generic Sensor spec, UAs are free to decide how to implement the permission request, and we want to add a prompt to the Chromium implementation. Additionally, when it comes to the covert channel attack, bucketing also helps make it more difficult. The attack looks similar to https://arturjanc.com/ls/ after all, which bucketing is supposed to help address.

-- 
GitHub Notification of comment by rakuco
Please view or discuss this issue at https://github.com/w3c/ambient-light/pull/77#issuecomment-1145912333 using your GitHub account


-- 
Sent via github-notify-ml as configured in https://github.com/w3c/github-notify-ml-config

Received on Friday, 3 June 2022 12:27:24 UTC