[whatwg/storage] Exposing cross-origin resource size (#31)

# Size-exposing attacks

In its current form, the Storage specification makes it very likely that user agents will end up with an implementation that allows attackers to leak the size of opaque Responses. There are at least three methods that could be used to do this:

## Estimate usage and quota

The Storage standard describes a method (`estimate()`) that returns a "rough estimate of the amount of bytes used" and "a conservative estimate of the amount of bytes available". As the terms "rough estimate" and "conservative estimate" are not strictly defined, and security considerations are not mentioned in the standard, all bets are off as to what user agents might come up with. Chrome's current implementation, which is still behind a flag, seems to return the exact usage instead of an estimate.
An attack that leaks the exact resource size is straightforward, even if only a "rough estimate" is returned:

1. Get estimate
2. Fetch resource
3. Store `Response` in cache
4. Get estimate, and subtract value from (1).

If the estimate is coarsened to obscure the resource size, repeat steps (3) and/or (4). For example, if `estimate()` is implemented to round to the nearest kB, storing the resource 1,000 times and dividing the difference in estimates by 1,000 gives the exact size.
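The rounding case can be illustrated with a small simulation. This is only a sketch under stated assumptions: `SimulatedStorage`, `GRANULARITY`, and `leak_size` are hypothetical names, the estimate is assumed to round to the nearest 1,000 bytes, and no per-entry cache overhead is modelled. It is not a browser API.

```python
GRANULARITY = 1000  # assumed rounding step of the hypothetical estimate()


class SimulatedStorage:
    """Toy model of one site's storage bookkeeping (hypothetical)."""

    def __init__(self, initial_usage: int = 0) -> None:
        self._usage = initial_usage

    def store(self, size: int) -> None:
        # Stands in for putting one copy of the opaque Response in the cache.
        self._usage += size

    def estimate(self) -> int:
        # Round usage to the nearest multiple of GRANULARITY (integer math).
        return (self._usage + GRANULARITY // 2) // GRANULARITY * GRANULARITY


def leak_size(storage: SimulatedStorage, secret_size: int) -> int:
    """Recover secret_size using only estimate() readings.

    secret_size is known to the storage, not to the attacker; it is passed
    in only so the simulation can account for the stored bytes.
    """
    before = storage.estimate()               # step 1
    for _ in range(GRANULARITY):              # step 3, repeated
        storage.store(secret_size)
    after = storage.estimate()                # step 4
    # GRANULARITY copies add an exact multiple of GRANULARITY, so the
    # rounding error is identical in both readings and cancels out.
    return (after - before) // GRANULARITY
```

In this model, `leak_size(SimulatedStorage(initial_usage=777), 12345)` recovers the stored size exactly, regardless of the unknown usage present beforehand.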

## Per-site quota

Each site has its own fixed quota, and an attempt to store something that does not fit within it fails. This behavior can be abused to leak the response size in the following way:

1. Completely fill up site's storage
2. Free up something like 5MB of storage
3. Fetch resource, and store `Response` in cache
4. Fill up storage byte per byte until this fails
5. Calculate resource size as `5MB - num_fill_bytes`
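The five steps above can be sketched against a toy quota model. Everything here is an assumption made for illustration: `SiteStorage`, `QuotaExceeded`, and `leak_size_via_quota` are hypothetical names, the quota is enforced to the byte, and writes carry no overhead.

```python
class QuotaExceeded(Exception):
    """Raised by the toy model when a write does not fit in the quota."""


class SiteStorage:
    """Hypothetical per-site storage with a fixed, byte-exact quota."""

    def __init__(self, quota: int) -> None:
        self.quota = quota
        self.usage = 0

    def store(self, size: int) -> None:
        if self.usage + size > self.quota:
            raise QuotaExceeded
        self.usage += size

    def free(self, size: int) -> None:
        self.usage = max(0, self.usage - size)


def leak_size_via_quota(storage: SiteStorage, secret_size: int, slack: int) -> int:
    """Recover secret_size from quota failures alone (requires secret_size <= slack)."""
    storage.store(storage.quota - storage.usage)  # 1. fill the site's storage
    storage.free(slack)                           # 2. free a known amount
    storage.store(secret_size)                    # 3. store the opaque Response
    num_fill_bytes = 0
    while True:                                   # 4. fill byte per byte
        try:
            storage.store(1)
        except QuotaExceeded:
            break
        num_fill_bytes += 1
    return slack - num_fill_bytes                 # 5. slack minus fill bytes
```

For instance, `leak_size_via_quota(SiteStorage(quota=1_000_000), 31_337, slack=50_000)` returns the stored size in this model. A real attack would likely fill in larger chunks first and only finish byte per byte.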

## Global quota

Similar to the per-site quota, there's also a global quota, and user agents will free up space by first clearing non-persistent boxes. This provides the same properties as above to obtain the exact resource size. An attack looks as follows:

1. Fill up storage on multiple sites in order to trigger eviction of all other non-persistent boxes not under your control
2. Force a single box you know the size of to be evicted
3. In a new origin, fetch resource and store `Response` in cache
4. Fill up storage byte per byte until one of your origins gets evicted
5. Calculate resource size as `size_first_evicted_box - num_fill_bytes`
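A rough simulation of these steps follows. The eviction model is an assumption, not something the text above specifies: `GlobalStorage` is a hypothetical class that evicts whole boxes in least-recently-used order whenever the global quota is exceeded, and the origin names are made up.

```python
from collections import OrderedDict


class GlobalStorage:
    """Hypothetical global store: whole-box LRU eviction on quota overflow."""

    def __init__(self, quota: int) -> None:
        self.quota = quota
        self.boxes = OrderedDict()  # origin -> size, least recently used first

    def write(self, origin: str, size: int) -> list:
        """Add size bytes to origin's box; return the origins evicted as a result."""
        self.boxes[origin] = self.boxes.get(origin, 0) + size
        self.boxes.move_to_end(origin)
        evicted = []
        while sum(self.boxes.values()) > self.quota:
            victim = next((o for o in self.boxes if o != origin), None)
            if victim is None:
                raise RuntimeError("write alone exceeds the global quota")
            del self.boxes[victim]
            evicted.append(victim)
        return evicted


def leak_size_via_eviction(storage: GlobalStorage, secret_size: int,
                           known_box_size: int) -> int:
    """Recover secret_size from eviction events (requires secret_size < known_box_size)."""
    # 1. Fill storage from attacker origins so all other boxes are evicted.
    storage.write("attacker-known", known_box_size)
    storage.write("attacker-fill", storage.quota - known_box_size)
    # 2. Force the box of known size to be evicted.
    assert storage.write("attacker-fill", 1) == ["attacker-known"]
    # 3. In a new origin, store the opaque Response.
    storage.write("attacker-probe", secret_size)
    # 4. Fill byte per byte until an attacker origin is evicted.
    num_fill_bytes = 0
    while True:
        num_fill_bytes += 1
        if storage.write("attacker-probe", 1):
            break
    # 5. Size of the first evicted box minus the fill bytes.
    return known_box_size - num_fill_bytes
```

In this model the fill count includes the byte that triggers the eviction, which is what makes the subtraction in step 5 exact. Real user agents may evict differently, so the bookkeeping would need adjusting accordingly.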

Compared to the previous attacks, this one is slightly harder to exploit (especially since the global quota can be substantial), but given high storage speeds (especially with SSDs) the attack is still very practical.

***Note:*** *For all browsers that already implement one of the above (i.e. virtually every browser), we managed to devise an attack that exposes the exact size of any resource.*

# Consequences

Being able to determine the resource size of arbitrary `Response` objects poses various privacy and security issues. For example, we found that by knowing the exact size of just 5 resources on `twitter.com` (i.e. `https://twitter.com/following`, `https://twitter.com/followers`, …) it is possible to uniquely identify a user from a large set. In an experiment on 500k user profiles, we found that a user could be uniquely identified in 97.62% of the cases. Of course, since virtually every website is sending state-specific responses, the consequences are not just limited to this example, and are applicable to a large number of web services.

# Mitigation

Having a usable solution that completely eradicates all size-exposing vectors seems unlikely. Instead, I think it's best to have a solution that reduces the practicality of these attacks to that of existing attacks (i.e. timing attacks). As such, I'd like to suggest an approach where "virtual padding" is applied to `Response` objects. More concretely: upon creation of a `Response` object, for instance as the result of a `fetch()` operation, choose a random value `r` between `0` and `rMax`. Next, round up `Response.size + r` to a multiple of `D`. The "virtual padding" is then the rounded-up value minus `Response.size`. Note that when a `Response` object is cloned, it should inherit the same padding value.
The padding is virtual in the sense that it is not actually written to disk. Instead, the user agent just uses it as part of its storage bookkeeping, and will also use these values to provide usage/quota estimates.
This method works quite well because it prevents an attacker from quickly obtaining many measurements (each measurement requires a `fetch()` operation).
For values of `rMax` and `D`, I'd suggest 100 kB and 20 kB respectively. Even after 50 measurements, these values seem to obfuscate the actual size somewhere in the range of 4.5 kB (for 10 measurements, this is approximately 10 kB). I made [a little script](https://gist.github.com/tomvangoethem/eaf9312401ab6098282eb7631bff8547) that allows you to play around with the values a bit, but note that most likely a better method can be used to improve the accuracy of attacks, so these values should only be seen as an upper bound.
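The padding computation itself is small. The sketch below is not the linked script; `virtual_padding` and `attacker_upper_bound` are hypothetical names, and the suggested values are assumed to mean 100,000 and 20,000 bytes.

```python
import random
from typing import Optional

R_MAX = 100_000  # assumed reading of the suggested rMax, in bytes
D = 20_000       # assumed reading of the suggested D, in bytes


def virtual_padding(size: int, r_max: int = R_MAX, d: int = D,
                    rng: Optional[random.Random] = None) -> int:
    """Return the virtual padding for a Response of the given size."""
    rng = rng or random.Random()
    r = rng.randint(0, r_max)           # random value between 0 and rMax
    padded = -(-(size + r) // d) * d    # round size + r up to a multiple of D
    return padded - size


def attacker_upper_bound(size: int, measurements: int,
                         rng: Optional[random.Random] = None) -> int:
    """Smallest padded size seen over several independent fetches.

    A naive attacker strategy: every observation is at least the real
    size, so the minimum is an upper bound on it.
    """
    rng = rng or random.Random()
    return min(size + virtual_padding(size, rng=rng) for _ in range(measurements))
```

Each call models a fresh `fetch()`, which is why a cloned `Response` has to reuse its original padding rather than draw a new one. The padding is always at least `0` and strictly less than `rMax + D`.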


Received on Friday, 3 June 2016 00:08:00 UTC