Re: Dictionary Compression for HTTP (at Facebook)

Very well, I will attempt to grab the bull by the horns, then. Let's 
talk security.

I guess my first question is this: What is the acceptance criterion for 
proposals in this space with respect to security? From my survey of 
previous conversations on this topic, it has sounded like the bar that 
proposals are being held to is that they are expected not to have any 
vulnerabilities. This is of course a reasonable expectation in general. 
However, compression as it exists in HTTP is well known to have security 
flaws (primarily, BREACH and its extensions). Given that flawed status 
quo, in order to clear that bar, a new proposal would not only have to 
avoid introducing new vulnerabilities, it would also have to solve existing ones.

If we are going to make a serious attempt to fix BREACH et al., let's do 
so. Otherwise, let's hold compression work to a practical bar, which is 
to avoid introducing new security issues and to avoid making existing 
ones worse.

If we accept that criterion, my question becomes: are there known issues 
that would prevent the use of dictionary compression? Many people 
have invoked the idea of security concerns to explain their hesitancy to 
pursue solutions in this space. Despite the frequency with which they're 
brought up, I haven't seen any specific allegations that describe a 
vulnerability introduced by dictionary-based compression. Are there 
known attacks that are made possible or improved by the use of dictionaries?

Obviously the above question is hugely dependent on how dictionaries are 
sourced. Since that's an open question, my sense is that it's probably 
best to look at the narrowest possible scope first and then work our way 
out from there. So I'm particularly curious whether there are known 
issues even when you leave out the challenges of dictionary creation / 
distribution / etc., and just use statically-defined dictionaries.

In particular, BREACH and friends describe the dangers of mixing private 
data and attacker-controlled data in the same compression window. 
Dictionary-based compression mixes a presumably public dictionary with 
private data. Is that sufficient to enable attacks? Or if you have 
dictionary + private data + attacker data, is that easier to attack than 
in the absence of a dictionary?
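
For concreteness, here is a minimal sketch of the length-oracle logic that 
BREACH relies on, with zlib's preset-dictionary support standing in for a 
shared static dictionary. All names and values here are invented purely 
for illustration and aren't modeled on any particular proposal:

    import zlib

    DICTIONARY = b"HTTP/1.1 200 OK\r\nSet-Cookie: session="
    SECRET = b"session=s3cr3tvalue"

    def observed_length(attacker_guess: bytes) -> int:
        # The response body mixes secret data with attacker-controlled
        # data, and is compressed against a (public) preset dictionary.
        body = SECRET + b"&q=" + attacker_guess
        c = zlib.compressobj(zdict=DICTIONARY)
        return len(c.compress(body) + c.flush())

    # A guess that matches more of the secret tends to compress better,
    # shrinking the length an on-path attacker can observe.
    print(observed_length(b"session=s3cr3t"))  # typically shorter
    print(observed_length(b"session=zzzzzz"))  # typically longer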

I'll follow up with my own impressions of the security concerns and 
possible mitigations soon.

- Felix

On 08/31/2018 07:58 AM, Ryan Sleevi wrote:
> 
> 
> On Fri, Aug 24, 2018 at 6:24 AM Felix Handte <felixh@fb.com> wrote:
> 
>     For our own part, we find ourselves drawn towards a solution that
>     makes a lot of the same choices as SDCH. That is, one that treats
>     dictionaries as explicit resources that can be dynamically
>     advertised by an origin, fetched and cached by a client, and then
>     negotiated to be used in requests/responses between the two. The
>     ability to treat a previous, cached response as a base on which to
>     apply a "diff" (negotiated by ETag?) is also attractive to us.
> 
> 
> I would strongly advise against such solutions, as they are a 
> significant part of why SDCH support was removed from browsers.
> 
> I think, among the set of concerns you need to consider in any such 
> solution (of which, in my mind, demonstrating that the security concerns 
> can be mitigated is paramount), you need to define not only the 
> interaction in the 'simple' HTTP sense of Request/Response pairs, but 
> also the complexity of those interactions as they apply to browsers, 
> where concerns like same-origin versus cross-origin apply, along with 
> the re-ordering of requests and the potential of multiple requests 
> proceeding simultaneously (which H/2 also has to countenance). This 
> further interacts with models of cache storage and in-memory 
> representation - challenges such as "What happens if a dictionary 
> expires midway during the processing of a response" were fairly fatal, 
> as were the issues around TOCTOU - that is, advertising a dictionary 
> from a request, making a request with said dictionary, and finding it 
> was evicted from the cache prior to the response.
> 
> Models such as the vkrasnov h2-compression-dictionaries approach are 
> substantially superior in these respects, because they more closely model 
> and define these interactions, through the association with, and scoping 
> to, a single H/2 resource.
> 
> It might be that your concern is not the dominant HTTP case of browsers, 
> in which case it may be fine to ignore these. But I think, from the 
> experience of implementing and maintaining SDCH, models that approximate 
> that space (of resourced dictionaries, advertisements, etc.) are likely 
> to carry too great an implementation cost, and too great a cognitive 
> cost to the predictability of the platform, to see any meaningful adoption.
> 
> Of course, this is all after the security concerns are mitigated ;)

Received on Friday, 21 September 2018 21:31:55 UTC