Re: Some half-baked thoughts about cookies.

Hi Mike,

On Thu, Aug 16, 2018 at 08:27:13AM +0200, Mike West wrote:
> On Tue, Aug 14, 2018 at 2:18 PM Willy Tarreau <w@1wt.eu> wrote:
> 
> > Hi Poul-Henning,
> >
> > On Tue, Aug 14, 2018 at 12:07:21PM +0000, Poul-Henning Kamp wrote:
> > > PS:  64 bits is not enough for everybody, in particular not when
> > > they are randomly generated by less than perfect implementations.
> > > Make them 128 bit from the start.
> >
> > No, that's what we discussed at the HTTP workshop 3 years ago already,
> > putting too many bits will cause the inverse of what is desired, it
> > adds unique client identifiers making tracking even easier and at the
> > same time will make distributed server stickiness very hard if not
> > impossible.
> 
> 
> Can you point me to notes on this discussion? I'm quite curious!

All I could find is a summary under "upfront routing information" here, since
we didn't take many notes at the time; we were mostly discussing ideas:

    https://github.com/HTTPWorkshop/workshop2015/wiki/HTTP-Ideas

> For clarity, I think this identifier is supposed to make it possible to tie
> multiple HTTP requests together into a coherent session, which (I think?)
> means that a unique-enough identifier is essential.

In fact I'd formulate this differently. I'm aware of two (valid) use
cases of cookies:

  - put a server identifier so that a request finds its way through an
    infrastructure and keeps the same path as the previous requests from
    the same session. That's called "persistence" or "stickiness". Usually
    there are not that many paths, so a few bits are often enough (typically
    16 should be enough for most cases I think). There is no problem with
    collisions since such paths are already shared between many sessions.
    It's always possible to do that based on client-fed info only (e.g.
    hashing on whatever), but that significantly degrades the ability to
    perform correct load balancing (no more consideration for server load,
    no more graceful shutdown, no more scale-in/scale-out, etc). So here
    it's really desirable to let the load balancer return a path identifier.
    Usually that's done using one or more cookies (typically one per load
    balancing layer) which indicate the server the request was sent to.

  - retrieve the user session's context from a database or from memory.
    Here it's different, as it's critical from a security perspective that
    there is no collision: you certainly don't want one user to end up in
    another user's context. Cookies address this in a relatively elegant
    way since they're provided by the server, which can decide what is
    needed to guarantee their uniqueness. If instead this information is
    passed by the client only, you can expect that all those devices with
    low entropy will very often collide and occasionally land on another
    one's session. The rise of IoT and low-power, lightly designed stacks
    further increases this risk. However, here I'd say that the server does
    not need very strong identifiers; it only needs to identify the session
    among all those it knows, and to ensure that these identifiers cannot
    be brute-forced by clients. I.e. if a server supports only 1 million
    concurrent sessions, it could use 20 bits to identify them, then seal
    the value with its own private key so that other values cannot be
    injected by clients.

I know I'm simplifying the problem a little, because nowadays it's common
to have multiple layers of infrastructure with multiple entry points. The
multiplicity of entry points is what causes the problem: you cannot easily
expect each entry point to store and hide the cookies learned from the
next levels, so you often need to pass that information to the client
as well.

> > If instead we only place a few bits for routing information
> > (say 16 bits) and place it upfront, all the routing information is
> > present and there is no need to distinguish between multiple clients.
> > The server will then be able to figure the real client from the
> > decrypted traffic (potentially via another client-fed ID if needed).
> >
> 
> Hrm. I might be misunderstanding the use of "decrypted" here, but exposing
> any of the identifier's bits over plaintext is a non-goal of this proposal.

The idea instead was to expose those bits (which are not specific to a client
but to a path taken by several clients) so that load balancers do not even
need to decrypt TLS to find the routing information anymore. That was
discussed as a way to improve server-side performance, and it also happens
to reduce the need for decrypting along the path, which can be an improvement
overall.
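A minimal sketch of the "upfront routing information" idea in Python, assuming a hypothetical token layout (nothing like this was specified at the workshop): a cleartext 16-bit route id followed by an opaque, e.g. encrypted, payload. The load balancer reads only the first two bytes to pick a path and never touches the rest. The names build_token, route_of, pick_backend and the BACKENDS table are illustrative only.

```python
import os
import struct

ROUTE_HEADER = struct.Struct("!H")  # 16-bit route id, network byte order

# Hypothetical routing table: route id -> backend path.
BACKENDS = {0x0001: "srv-a", 0x0002: "srv-b"}


def build_token(route_id: int, opaque: bytes) -> bytes:
    """Prefix an opaque (e.g. encrypted) payload with a cleartext route id."""
    return ROUTE_HEADER.pack(route_id) + opaque  # struct.error if > 0xFFFF


def route_of(token: bytes) -> int:
    """What a load balancer would read: only the first two bytes, no decryption."""
    (route_id,) = ROUTE_HEADER.unpack_from(token, 0)
    return route_id


def pick_backend(token: bytes) -> str:
    return BACKENDS.get(route_of(token), "default-pool")


if __name__ == "__main__":
    # The payload stands in for ciphertext the load balancer cannot read.
    token = build_token(2, os.urandom(32))
    print(pick_backend(token))
```

Since the route id names a path shared by many clients, it carries at most 16 bits of information and does not single out any one client.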

Willy

Received on Thursday, 16 August 2018 07:29:30 UTC