- From: Willy Tarreau <w@1wt.eu>
- Date: Thu, 16 Aug 2018 09:28:59 +0200
- To: Mike West <mkwst@google.com>
- Cc: phk@phk.freebsd.dk, HTTP Working Group <ietf-http-wg@w3.org>
Hi Mike,

On Thu, Aug 16, 2018 at 08:27:13AM +0200, Mike West wrote:
> On Tue, Aug 14, 2018 at 2:18 PM Willy Tarreau <w@1wt.eu> wrote:
>
> > Hi Poul-Henning,
> >
> > On Tue, Aug 14, 2018 at 12:07:21PM +0000, Poul-Henning Kamp wrote:
> > > PS: 64 bits is not enough for everybody, in particularly not when
> > > they are randomly generated by less than perfect implementations.
> > > Make then 128 bit from the start.
> >
> > No, that's what we discussed at the HTTP workshop 3 years ago already,
> > putting too many bits will cause the inverse of what is desired, it
> > adds unique client identifiers making tracking even easier and at the
> > same time will make distributed server stickiness very hard if not
> > impossible.
>
> Can you point me to notes on this discussion? I'm quite curious!

All I could find was summarized as "upfront routing information" here,
as we didn't take many notes back then, we were mostly discussing
ideas:

  https://github.com/HTTPWorkshop/workshop2015/wiki/HTTP-Ideas

> For clarity, I think this identifier is supposed to make it possible to tie
> multiple HTTP requests together into a coherent session, which (I think?)
> means that a unique-enough identifier is essential.

In fact I'd phrase this differently. I'm aware of two (valid) use
cases of cookies:

  - put a server identifier so that a request finds its way through an
    infrastructure and keeps the same path as the previous requests
    from the same session. That's called "persistence" or "stickiness".
    Usually there are not that many paths, so a few bits are often
    enough (typically 16 should be enough for most cases I think).
    There is no problem with collisions since such paths are shared
    between many sessions already.
    There's always the possibility to do that based on client-fed info
    only (e.g. hash on whatever) but then it significantly degrades the
    ability to perform correct load balancing (no more consideration
    for server load, no more graceful shutdown, no more
    scale-in/scale-out, etc). So here it's really desirable to let the
    load balancer return a path identifier. Usually that's done using
    one or more cookies (typically one per load balancing layer) which
    indicate the server the request was sent to.

  - retrieve the user session's context from a database or from memory.
    Here it's different as it's critical from a security perspective
    that there is no collision, as you certainly don't want one user to
    end up on another user's context. Cookies address this in a
    relatively elegant way since they're provided by the server, which
    can decide on what is needed to guarantee their uniqueness. If
    instead this information is passed by the client only, you can
    expect that all those devices with low entropy will very often
    collide and occasionally land on another one's session. The rise of
    IoT and low-power, lightly designed stacks further increases this
    risk.

    However here I'd say that the server does not need to have very
    strong identifiers, it would only need to find the session among
    all those it knows, and ensure that these cannot be brute-forced by
    clients. I.e. if a server supports only 1 million concurrent
    sessions, it could possibly use 20 bits to identify them, then seal
    the value with its own private key so that other values cannot be
    injected by clients.

I know I'm simplifying the problem a little bit because it's common
nowadays to have multiple layers of infrastructure with multiple entry
points. The multiplicity of entry points is what causes the problem
because you cannot easily expect that each front point stores and hides
the cookies learned from the next levels, so often you need to pass
that information to the client as well.
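To make the "20 bits sealed with the server's key" idea above a bit more
concrete, here is a minimal sketch in Python. It is only an illustration
under my own assumptions: the seal is an HMAC-SHA256 tag computed with a
secret the server keeps to itself, the token layout (5 hex digits, a
dot, then the tag) is arbitrary, and the names SERVER_KEY, seal() and
unseal() are hypothetical. A real deployment would also have to deal
with key rotation and token expiry, which are left out here.

```python
import hashlib
import hmac
import secrets

SESSION_BITS = 20  # 2**20 = 1048576, enough for about one million sessions

# Hypothetical server-side secret; it is never sent to clients.
SERVER_KEY = secrets.token_bytes(32)


def seal(session_index: int, key: bytes = SERVER_KEY) -> str:
    """Return a token made of the session index plus an HMAC-SHA256 tag."""
    if not 0 <= session_index < (1 << SESSION_BITS):
        raise ValueError("session index does not fit in 20 bits")
    payload = format(session_index, "05x")  # 20 bits fit in 5 hex digits
    tag = hmac.new(key, payload.encode("ascii"), hashlib.sha256).hexdigest()
    return payload + "." + tag


def unseal(token: str, key: bytes = SERVER_KEY):
    """Return the session index, or None if the token was not sealed by us."""
    payload, sep, tag = token.partition(".")
    if not sep or len(payload) != 5:
        return None
    expected = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    # Constant-time comparison, so the tag cannot be guessed byte by byte.
    if not hmac.compare_digest(tag.encode("utf-8"), expected.encode("ascii")):
        return None
    # The tag matched, so the payload is one we produced: valid hex.
    return int(payload, 16)
```

With this, unseal(seal(123456)) gives back 123456, while a token whose
index or tag was modified by the client yields None. The index itself
stays small and guessable; what the client cannot do is produce a valid
tag for an index it was not given.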
> > If instead we only place a few bits for routing information
> > (say 16 bits) and place it upfront, all the routing information is
> > present and there is no need to distinguish between multiple clients.
> > The server will then be able to figure the real client from the
> > decrypted traffic (potentially via another client-fed ID if needed).
>
> Hrm. I might be misunderstanding the use of "decrypted" here, but exposing
> any of the identifiers bits over plaintext is a non-goal of this proposal.

The idea instead was to expose those bits (which are not specific to a
client but to a path taken by several clients) so that load balancers
do not even need to decrypt TLS to find the routing information
anymore. That was discussed as a way to improve server-side
performance, and it also happens to reduce the need for decrypting
along the path, which can be an improvement overall.

Willy
Received on Thursday, 16 August 2018 07:29:30 UTC