Re: Dictionary Compression for HTTP (at Facebook) from Patrick McManus on 2018-09-21 (ietf-http-wg@w3.org from July to September 2018)

From: Patrick McManus <mcmanus@ducksong.com>
Date: Fri, 21 Sep 2018 18:55:26 -0400
To: Felix Handte <felixh@fb.com>
Cc: Ryan Sleevi <ryan-ietf@sleevi.com>, Mark Nottingham <mnot@mnot.net>, "jyrki@google.com" <jyrki@google.com>, "chaals@yandex-team.ru" <chaals@yandex-team.ru>, "eustas@google.com" <eustas@google.com>, Vlad Krasnov <vlad@cloudflare.com>, Nick Terrell <terrelln@fb.com>, Yann Collet <cyan@fb.com>, "ietf-http-wg@w3.org Group" <ietf-http-wg@w3.org>
Message-ID: <CAOdDvNqU8SGoguH=+j1HqepSqKbK+JnNZ6dN8SKaju=ENimXrg@mail.gmail.com>

Hi Felix,

On Fri, Sep 21, 2018 at 5:31 PM, Felix Handte <felixh@fb.com> wrote:

> Very well, I will attempt to grab the bull by the horns, then. Let's
> talk security.
>
> I guess my first question is this: What is the acceptance criterion for
> proposals in this space with respect to security? From my survey of
>

You are not going to be able to pre-negotiate working group acceptance
criteria. The criterion is what it always is: rough consensus on a draft
from the working group and the approval of the IESG.

But to help with more background, the past concern has been that there
hasn't been sufficient/proactive analysis of the various proposals - and
given that the mixture of compression and encryption is known to be a
problem (as you mention) a bar of "no known problems" hasn't been enough to
get anywhere near rough consensus. I believe people wanted to see a
proactive analysis of what the concerns of a particular proposal are. At
that point we can debate whether they are reasonable or not for their
anticipated gains.

Make sense? You're certainly going in a reasonable direction considering
the interactions of dictionaries, what attackers control, and the ways in
which public and private data are mixed. Of course confidentiality can
apply to 'public data' as well and it's not clear how/if folks would want to
handle that.
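
For readers who want to poke at the interaction between a dictionary,
attacker-controlled data, and private data, here is a minimal Python sketch of
how one might measure a BREACH-style length signal with and without a preset
dictionary. It is not taken from any proposal in this thread. The page layout,
the secret, the guesses, and the dictionary contents are all made up for
illustration, and zlib's preset-dictionary support (`zdict`) merely stands in
for whatever dictionary scheme is under discussion. A real attack guesses the
secret piece by piece; this sketch compares one fully right guess against one
fully wrong guess so the size difference is easy to see.

```python
import zlib

# Everything below is hypothetical: a toy page that reflects attacker
# input and also embeds a private token in the same compressed body.
SECRET = b"token=7f3a9c2e81d4"
PAGE_PREFIX = b"<html><body><form action='/search'><input name='q' value='"
PAGE_MIDDLE = b"'></form><input type='hidden' name='csrf' value='"
PAGE_SUFFIX = b"'></body></html>"

# A "static, public" dictionary made of the page boilerplate only.
STATIC_DICT = PAGE_PREFIX + PAGE_MIDDLE + PAGE_SUFFIX

RIGHT_GUESS = b"token=7f3a9c2e81d4"
WRONG_GUESS = b"token=q8Zk1Lw0Xr5B"


def response_size(reflected: bytes, zdict: bytes = b"") -> int:
    """Compressed size of the toy page with `reflected` injected."""
    body = PAGE_PREFIX + reflected + PAGE_MIDDLE + SECRET + PAGE_SUFFIX
    if zdict:
        comp = zlib.compressobj(level=9, zdict=zdict)
    else:
        comp = zlib.compressobj(level=9)
    return len(comp.compress(body) + comp.flush())


def leak(zdict: bytes = b"") -> int:
    """Bytes saved when the reflected guess equals the secret."""
    return response_size(WRONG_GUESS, zdict) - response_size(RIGHT_GUESS, zdict)


if __name__ == "__main__":
    print("length signal without dictionary:", leak())
    print("length signal with static dictionary:", leak(STATIC_DICT))
```

In this toy setup the right guess compresses smaller in both configurations,
which only shows that a static dictionary does not by itself remove the
existing signal. It says nothing about whether a given dictionary scheme makes
attacks easier or harder; that is the kind of question a proactive analysis
would need to answer for each proposal.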


> previous conversations on this topic, it has sounded like the bar that
> proposals are being held to is that they are expected not to have any
> vulnerabilities. This is of course a reasonable expectation in general.
> However, compression as it exists in HTTP is well known to have security
> flaws (primarily, BREACH and its extensions). Given that flawed status
> quo, in order to clear that bar, a new proposal would not only have to
> avoid introducing new vulnerabilities, it would have to solve existing
> ones.
>
> If we are going to make a serious attempt to fix BREACH et al., let's do
> so. Otherwise, let's hold compression work to a practical bar, which is
> to avoid introducing new security issues and to avoid making existing
> ones worse.
>
> If we accept that criterion, my question becomes whether there are known
> issues that would prevent the use of dictionary compression? Many people
> have invoked the idea of security concerns to explain their hesitancy to
> pursue solutions in this space. Despite the frequency with which they're
> brought up, I haven't seen any specific allegations that describe a
> vulnerability introduced by dictionary-based compression. Are there
> known attacks that are made possible or improved by the use of
> dictionaries?
>
> Obviously the above question is hugely dependent on how dictionaries are
> sourced. Since that's an open question, my sense is that it's probably
> best to look at the narrowest possible scope first and then work our way
> out from there. So I'm particularly curious whether there are known
> issues even when you leave out the challenges of dictionary creation /
> distribution / etc., when you just use statically-defined dictionaries.
>
> In particular, BREACH and friends describe the dangers of mixing private
> data and attacker-controlled data in the same compression window.
> Dictionary-based compression mixes a presumably public dictionary with
> private data. Is that sufficient to enable attacks? Or if you have
> dictionary + private data + attacker data, is that easier to attack than
> in the absence of a dictionary?
>
> I'll follow up with my own impressions of the security concerns and
> possible mitigations soon.
>
> - Felix
>
> On 08/31/2018 07:58 AM, Ryan Sleevi wrote:
> >
> >
> > On Fri, Aug 24, 2018 at 6:24 AM Felix Handte <felixh@fb.com
> > <mailto:felixh@fb.com>> wrote:
> >
> >     For our own part, we find ourselves drawn towards a solution that
> >     makes a lot of the same choices as SDCH. That is, one that treats
> >     dictionaries as explicit resources that can be dynamically
> >     advertised by an origin, fetched and cached by a client, and then
> >     negotiated to be used in requests/responses between the two. The
> >     ability to treat a previous, cached response as a base on which to
> >     apply a "diff" (negotiated by ETag?) is also attractive to us.
> >
> >
> > I would strongly advise against such solutions, as they are a
> > significant part of why SDCH support was removed from browsers.
> >
> > I think, to the set of concerns you need to consider in any such
> > solution (which, in my mind, demonstrating the security concerns can be
> > mitigated is paramount of those), you need to define not only the
> > interaction in the 'simple' HTTP sense of Request/Response pairs, but
> > also in the complexity of those interactions as they apply to browsers,
> > for which concerns like same-origin versus cross-origin apply, the
> > re-ordering of requests, and the potential of multiple requests
> > proceeding simultaneously (which H/2 also has to countenance). This also
> > further interacts with models of cache storage and in-memory
> > representation - challenges such as "What happens if a dictionary
> > expires midway during the processing of a response" were fairly fatal,
> > as were the issues around TOCTOU - that is, advertising a dictionary
> > from a request, making a request with said dictionary, and finding it
> > was evicted from the cache prior to the response.
> >
> > Models such as the approach by vkrasnov h2-compression-dictionaries are
> > substantially superior in these respects, because it more closely models
> > and defines these interactions, through the association with and scoping
> > to a single H/2 resource.
> >
> > It might be that your concern is not the dominant HTTP case of browsers,
> > in which case, it may be fine to ignore these. But I think, from the
> > experiences implementing and maintaining SDCH, models that approximate
> > that space (of resourced dictionaries, advertisements, etc) are likely
> > to be too great an implementation cost, and too great a cognitive cost
> > to the predictability of the platform, to see any meaningful adoption.
> >
> > Of course, this is all after the security concerns are mitigated ;)
>

Received on Friday, 21 September 2018 22:55:56 UTC