Re: Some half-baked thoughts about cookies.

On Thu, Aug 16, 2018 at 9:29 AM Willy Tarreau <w@1wt.eu> wrote:

> Hi Mike,
>
> On Thu, Aug 16, 2018 at 08:27:13AM +0200, Mike West wrote:
> > On Tue, Aug 14, 2018 at 2:18 PM Willy Tarreau <w@1wt.eu> wrote:
> >
> > > Hi Poul-Henning,
> > >
> > > On Tue, Aug 14, 2018 at 12:07:21PM +0000, Poul-Henning Kamp wrote:
> > > > PS:  64 bits is not enough for everybody, in particular not when
> > > > they are randomly generated by less than perfect implementations.
> > > > Make them 128 bit from the start.
> > >
> > > No, that's what we discussed at the HTTP workshop 3 years ago already,
> > > putting too many bits will cause the inverse of what is desired, it
> > > adds unique client identifiers making tracking even easier and at the
> > > same time will make distributed server stickiness very hard if not
> > > impossible.
> >
> >
> > Can you point me to notes on this discussion? I'm quite curious!
>
> All I could find was summarized as "upfront routing information" here, as
> we didn't take that many notes back then, we were mostly discussing ideas :
>
>     https://github.com/HTTPWorkshop/workshop2015/wiki/HTTP-Ideas
>
> > For clarity, I think this identifier is supposed to make it possible to tie
> > multiple HTTP requests together into a coherent session, which (I think?)
> > means that a unique-enough identifier is essential.
>
> In fact I'd reformulate this differently. I'm aware of two (valid) use
> cases of cookies :
>
>   - put a server identifier so that a request finds its way through an
>     infrastructure and keeps the same path as the previous requests from
>     the same session. That's called "persistence" or "stickiness". Usually
>     there are not that many paths, so a few bits are often enough (typically
>     16 should be enough for most cases I think). There is no problem with
>     collisions since such paths are shared between many sessions already.
>     There's always the possibility to do that based on client-fed info only
>     (eg: hash on whatever) but then it significantly degrades the ability to
>     perform correct load balancing (no more consideration for server load,
>     no more graceful shutdown, no more scale-in/scale-out, etc). So here
>     it's really desirable to let the load balancer return a path identifier.
>     Usually that's done using one or more cookies (typically one per load
>     balancing layer) which indicate a server identifier the request was sent
>     to.
>

I agree that there's value to intermediate servers knowing how to route
requests. That said, 16 is a lot of bits, enough to separate ~7 billion
users into chunks of ~100k or so. Below, it sounds like you're suggesting
that these bits would be visible to an intermediate server without
decrypting the connection. That, plus the originating IP address, sounds
like it would entail some fairly substantial risk of pervasive monitoring
similar in kind to the risk we'd like to prevent by locking the token away
behind TLS.
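
As a rough sanity check on that figure, here is a minimal Python sketch. The
population number is the same round estimate used above, and the assumption
that users spread evenly across all 2^16 values is mine; real deployments
would be far less uniform.

```python
# Back-of-the-envelope estimate: how many users share each routing value
# if a fixed number of plaintext bits is exposed?
WORLD_USERS = 7_000_000_000  # round estimate, not a measured figure
ROUTING_BITS = 16            # the width suggested in the quoted text


def users_per_bucket(users: int, bits: int) -> float:
    """Average number of users per distinct value, assuming a uniform spread."""
    return users / (2 ** bits)


if __name__ == "__main__":
    size = users_per_bucket(WORLD_USERS, ROUTING_BITS)
    print(f"{ROUTING_BITS} bits -> about {size:,.0f} users per value")
```

Combined with an originating IP address, a bucket of that size is small
enough to be worth worrying about.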

I think we'll need to weigh the value of public load-balancing/stickiness
against this kind of risk in order to determine whether it's a reasonable
thing to include in plaintext.

It seems much less risky to give the server some measure of control over
the high-order bits of the identifier in order to enable internal
load-balancing and stickiness after TLS is terminated. I'm pretty sure we
could find a compromise here between complete client-side control over the
identifier's value, and some number of server-controlled bits in the
identifier.
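
To illustrate what such a compromise might look like, here is a small Python
sketch. The 16/64 split, the field layout, and the function names are all
hypothetical; nothing here is part of any proposal. The server picks the
high-order bits, the client supplies the rest at random, and a load balancer
sitting behind TLS termination reads only the high-order part.

```python
import secrets

SERVER_BITS = 16  # hypothetical width of the server-controlled prefix
CLIENT_BITS = 64  # hypothetical width of the client-generated part


def compose_identifier(server_value: int, client_value: int) -> int:
    """Pack server-chosen high-order bits above client-chosen low-order bits."""
    if not 0 <= server_value < (1 << SERVER_BITS):
        raise ValueError("server_value does not fit in SERVER_BITS")
    if not 0 <= client_value < (1 << CLIENT_BITS):
        raise ValueError("client_value does not fit in CLIENT_BITS")
    return (server_value << CLIENT_BITS) | client_value


def routing_bits(identifier: int) -> int:
    """Extract the server-controlled prefix used for internal stickiness."""
    return identifier >> CLIENT_BITS


def client_part(identifier: int) -> int:
    """Extract the client-generated low-order bits."""
    return identifier & ((1 << CLIENT_BITS) - 1)


if __name__ == "__main__":
    ident = compose_identifier(0x002A, secrets.randbits(CLIENT_BITS))
    print(hex(ident), "routes to backend", routing_bits(ident))
```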

>   - retrieve the user session's context from a database or from memory.
>     Here it's different as it's critical from a security perspective that
>     there is no collision, as you certainly don't want one user to end up
>     on another user's context. The cookies address this in a relatively
>     elegant way since they're provided by the server which can decide on
>     what is needed to guarantee their uniqueness. If instead this information
>     is passed by the client only you can expect that all those devices with
>     low entropy will very often collide and occasionally land on another
>     one's session. The rise of IoT and low-power, lightly designed stacks
>     further increases this risk. However here I'd say that the server does
>     not need to have very strong identifiers, it would only need to figure
>     the session among all those it knows, and ensure that these ones cannot
>     be brute-forced by clients. Ie if a server supports only 1 million
>     concurrent sessions, it could possibly use 20 bits to identify them,
>     then seal the value with its own private key so that other values cannot
>     be injected by clients.
>
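
To make the "20 bits plus a server-side key" idea concrete, here is a minimal
Python sketch. It is one possible reading, not a description of any existing
deployment: I'm assuming "seal" means a keyed MAC (HMAC-SHA256 with a secret
key, truncated to 16 bytes) rather than a public-key signature, and the names
SERVER_KEY, seal and unseal are made up for illustration.

```python
import hashlib
import hmac
import secrets
from typing import Optional

SERVER_KEY = secrets.token_bytes(32)  # hypothetical server-side secret
INDEX_BITS = 20                       # enough for about 1 million sessions
INDEX_LEN = 3                         # bytes needed to hold 20 bits
TAG_LEN = 16                          # truncated HMAC length (an assumption)


def seal(session_index: int, key: bytes = SERVER_KEY) -> bytes:
    """Return the session index followed by a MAC computed over it."""
    if not 0 <= session_index < (1 << INDEX_BITS):
        raise ValueError("session_index does not fit in INDEX_BITS")
    payload = session_index.to_bytes(INDEX_LEN, "big")
    tag = hmac.new(key, payload, hashlib.sha256).digest()[:TAG_LEN]
    return payload + tag


def unseal(token: bytes, key: bytes = SERVER_KEY) -> Optional[int]:
    """Return the session index if the MAC verifies, otherwise None."""
    if len(token) != INDEX_LEN + TAG_LEN:
        return None
    payload, tag = token[:INDEX_LEN], token[INDEX_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()[:TAG_LEN]
    if not hmac.compare_digest(tag, expected):
        return None
    return int.from_bytes(payload, "big")
```

Note that the value on the wire is considerably longer than 20 bits once the
tag is attached; the short index only keeps the server-side lookup small.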

I think there's a lot of flexibility in the length of the identifier we
provide to servers, if we run with something purely client-controlled. It
seems to me that high-traffic sites would benefit from granular
identifiers, and that if we accept that we need to avoid collisions, then
it would be reasonable to send enough entropy to make them truly unlikely.
I'm confident that some sites could get away with 20 bits and a key. I'm
less confident that larger sites could do the same.
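
For a feel of how much entropy "truly unlikely" requires, here is a short
Python sketch of the standard birthday-bound approximation,
p ~= 1 - exp(-n(n-1) / 2^(bits+1)). It assumes identifiers are drawn
uniformly at random, which is exactly what weak client implementations would
fail to do, so treat the results as a best case.

```python
import math


def collision_probability(n_ids: int, bits: int) -> float:
    """Approximate chance that at least two of n_ids random values collide."""
    exponent = -n_ids * (n_ids - 1) / 2 ** (bits + 1)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(exponent)


if __name__ == "__main__":
    # Illustrative population sizes and widths, chosen for this example only.
    for n_ids, bits in [(10**6, 20), (10**9, 64), (10**9, 128)]:
        p = collision_probability(n_ids, bits)
        print(f"{n_ids:>13,} ids at {bits:>3} bits: p ~= {p:.3g}")
```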

> I know I'm simplifying the problem a little bit because it's common nowadays
> to have multiple layers of infrastructure with multiple entry points. The
> multiplicity of entry points is what causes the problem because you cannot
> easily expect that each front point stores and hides the cookies learned
> from next levels, so often you need to pass that information to the client
> as well.
>

Likewise, I'm sure I'm oversimplifying the problem from your perspective,
as I'm looking at this with my browser hat on. I hope we can find a
reasonable set of compromises over time. :)


>
> > > If instead we only place a few bits for routing information
> > > (say 16 bits) and place it upfront, all the routing information is
> > > present and there is no need to distinguish between multiple clients.
> > > The server will then be able to figure the real client from the
> > > decrypted traffic (potentially via another client-fed ID if needed).
> > >
> >
> > Hrm. I might be misunderstanding the use of "decrypted" here, but exposing
> > any of the identifier's bits over plaintext is a non-goal of this proposal.
>
> The idea instead was to expose those bits (which are not specific to a client
> but to a path taken by several clients) so that load balancers do not even
> need to decrypt TLS to find the routing information anymore. That was
> discussed as a way to improve server-side performance, and it also happens
> to reduce the need for decrypting along the path, which can be an improvement
> overall.
>
> Willy
>

Received on Thursday, 16 August 2018 09:23:33 UTC