Re: Introducing a Session header... from Willy Tarreau on 2012-07-20 (ietf-http-wg@w3.org from July to September 2012)

From: Willy Tarreau <w@1wt.eu>
Date: Fri, 20 Jul 2012 19:53:32 +0200
To: Amos Jeffries <squid3@treenet.co.nz>
Cc: ietf-http-wg@w3.org
Message-ID: <20120720175332.GJ24195@1wt.eu>
On Fri, Jul 20, 2012 at 08:20:33PM +1200, Amos Jeffries wrote:
> Why can't server identifiers be associated with learnt IDs? What is 
> learning apart from the LB playing server with its session-state 
> containing a pointer to another server which does the donkey work 
> (using the same session-ID, or a new one generated by the LB for the 
> unique client).

Because there are multiple paths. Learning is only usable in small
deployments where you only have one LB (or, say, a couple of them with
fast synchronization). When you have LBs in multiple datacenters,
the traffic comes to any of them and you absolutely need any of
these LBs to find the proper server even if it's located in the
other DC. The most common case for this is when you rely on BGP
with multi-homing. Most of the clients stay on the same site,
but a small part of them switch to another POP once in a while.
The inter-DC traffic is very low but it exists, and you absolutely
want it to work. And obviously you cannot enable multi-DC sync
at high connection rates; it does not scale at all.

A similar case exists with multiple portals relying on common
backends.

That's why it is absolutely mandatory to keep the ability for an
LB (say, an HTTP router) to insert routing information and to match
it. Otherwise I can already predict what will happen, exactly the
same as with the original Set-Cookie header: users will find
tricks to send a second and a third ID, and then the
protocol will be a real mess again.
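To illustrate why inserted routing information avoids inter-DC
synchronization, here is a minimal Python sketch. The `;srv=` syntax,
the server names and addresses, and the `insert_route`/`route` helpers
are all hypothetical, not a proposed wire format. The LB that first
picks a server appends its name to the session value, and any LB in
any DC sharing the same static server table can then route the request
without having learnt anything at runtime.

```python
# Hypothetical static server table, deployed identically on every LB in
# every DC. It only changes with configuration, so no runtime sync is needed.
SERVERS = {
    "dc1-s1": "10.0.1.11:80",
    "dc1-s2": "10.0.1.12:80",
    "dc2-s1": "10.0.2.11:80",
}


def insert_route(session_value: str, server: str) -> str:
    """Append routing info (hypothetical ';srv=' syntax) to a session value."""
    if server not in SERVERS:
        raise ValueError(f"unknown server {server!r}")
    return f"{session_value};srv={server}"


def route(session_value: str, default: str) -> str:
    """Pick the target server from the request alone, without shared state."""
    for part in session_value.split(";")[1:]:
        key, _, name = part.partition("=")
        if key == "srv" and name in SERVERS:
            return SERVERS[name]
    # No routing info, or it names a server we do not know: use the default.
    return SERVERS[default]
```

For example, `route(insert_route("a1b2c3", "dc2-s1"), "dc1-s1")` returns
`"10.0.2.11:80"`, even on an LB that never saw the client's first request.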

> >If we have a variable-sized ID that intermediaries can fiddle with,
> >then it's harder to put that in the protocol and we risk seeing it
> >degraded like cookies, with people putting whatever they want in them.
> >
> >Maybe we need a specific part for routing information, but I'm really
> >not convinced this is needed. I think in fact that the client-generated
> >ID should be fixed-size and always present, and that the server-generated
> >part could be variable with a limited size so that it is not abused
> >anymore.
> 
> To prevent misuse like what happened to cookies, I agree.
> 
> Do we have any data on how many systems are abusing session-cookie IDs 
> and the equivalent to encode data over the wire?

Well, I can say that I had to support variable buffer sizes in haproxy
to allow some people to send requests as large as 32 kB because of 2-3
cookies, some of them up to 8 kB (the max they managed to get through
Apache, which made them realize they might have been wrong).

> Given that we are not burning Cookie at the gateway (using it for data 
> transfer is easier), what is the actual risk here?

The risk is that we keep sending too much useless data, that's all.
Better to be careful this time.
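A minimal sketch of the size discipline suggested in the quoted text
(a fixed-size, client-generated ID plus a length-capped server part),
assuming a hypothetical `CLIENTID.SERVERPART` syntax, a 32-hex-digit
client ID and a 64-byte cap. None of these values come from any
specification; they are placeholders for illustration only.

```python
import re

CLIENT_ID_LEN = 32      # hypothetical: 128-bit client ID as 32 hex digits
SERVER_PART_MAX = 64    # hypothetical cap, in bytes, on the server part

_CLIENT_ID_RE = re.compile(r"[0-9a-f]{%d}" % CLIENT_ID_LEN)


def parse_session(value: str) -> tuple[str, str]:
    """Split 'CLIENTID[.SERVERPART]' and reject values breaking the limits."""
    client_id, _, server_part = value.partition(".")
    if not _CLIENT_ID_RE.fullmatch(client_id):
        raise ValueError(f"client ID must be exactly {CLIENT_ID_LEN} hex digits")
    if len(server_part.encode("utf-8")) > SERVER_PART_MAX:
        raise ValueError(f"server part exceeds {SERVER_PART_MAX} bytes")
    return client_id, server_part
```

An intermediary applying such a check can reject oversized values up
front instead of having to grow its buffers for multi-kilobyte values.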

> Where I stop in this rabbit hole is at client determination of what to 
> do when it's got state and the servers it's talking to keep changing. I'm 
> hoping the browser people might be able to shed a bit of light so that we 
> can continue a bit further there.

Might help, yes.

> >Only when the session-id is able to convey as much information as is
> >needed to route the request to its target. Right now cookies are used
> >*a lot* for this. I know some places where there are up to 6 layers
> >of server farms with multiple possible paths between them (think many
> >portals for similar access to the same information), and the only way to
> >know where to pass through is to use the information inserted by LBs
> >into cookies. To completely get rid of cookies we need to be able to
> >store this routing information somewhere else (I would love this, so
> >that we don't have to deal with this horrible cookie header anymore).
> >But this is not a small issue to deal with.
> 
> That is where our discussions on the network-friendly request-ID 
> field come in. Being separate from the end-to-end session, it provides 
> enough hop-by-hop state and multi-request linkage to form a stateful flow 
> out of a hierarchy with stateless input.

I'm not sure I get what you mean.
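For the multi-tier setups quoted above (several layers of server farms,
each LB inserting its own routing information), one way to picture it:
each tier's LB records its choice under its own key, and when the
request comes back each tier only reads its own key. This is a
hypothetical sketch; the `tier=server` list syntax and the helper names
are made up for illustration and do not describe any existing product.

```python
from typing import Optional

Route = list[tuple[str, str]]  # ordered (tier, server) pairs; hypothetical model


def add_hop(routes: Route, tier: str, server: str) -> Route:
    """Record this tier's choice, replacing any previous entry for the tier."""
    return [(t, s) for t, s in routes if t != tier] + [(tier, server)]


def lookup(routes: Route, tier: str) -> Optional[str]:
    """Return the server previously chosen by this tier, if any."""
    for t, s in routes:
        if t == tier:
            return s
    return None


def encode(routes: Route) -> str:
    return ",".join(f"{t}={s}" for t, s in routes)


def decode(value: str) -> Route:
    routes: Route = []
    for item in value.split(","):
        tier, sep, server = item.partition("=")
        if sep:  # skip empty or malformed items
            routes.append((tier, server))
    return routes
```

With `routes = add_hop(add_hop([], "portal", "p3"), "app", "a7")`,
`encode(routes)` gives `"portal=p3,app=a7"`, and the "app" tier finds
its server again with `lookup(decode("portal=p3,app=a7"), "app")`.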

> >>Cookies as used for data transfer are a different problem entirely and
> >>should not be tackled by the creation of a session state semantics.
> >Cookies are even more generic, they're just a copy of opaque data the
> >server needs when the client gets back. Originally it was just a way
> >to find the pointer to the session (which is routing information as
> >well after all). But they were completely abused to carry anything
> >everywhere and the first issue is that JS has access to them.
> 
> Aye. But to kill them we have to pull out their utility and provide 
> easier tools for their users to work with. Killing them with efficiency 
> as it were. One slice at a time.

Agreed. I'm not talking about addressing all issues at once, just about
seeing if we can define what a session is and how all the involved
parties might benefit from it, at least by limiting/avoiding the need
to know the cookie header field. Hopefully we could reach a situation
where we can completely get rid of the cookie header (e.g. simply have
a list of (VAR, VALUE) pairs for a session). But it's too early for that.
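As a rough illustration of the (VAR, VALUE) idea, the sketch below maps
today's Cookie header onto an ordered list of pairs using Python's
standard `http.cookies.SimpleCookie`. The `cookie_to_pairs` helper is
hypothetical and only shows the data model; it is not a proposal for a
new header syntax.

```python
from http.cookies import SimpleCookie


def cookie_to_pairs(cookie_header: str) -> list[tuple[str, str]]:
    """Map a Cookie header value to an ordered list of (VAR, VALUE) pairs."""
    jar = SimpleCookie()
    jar.load(cookie_header)  # parses "name=value; name2=value2"
    return [(name, morsel.value) for name, morsel in jar.items()]
```

For example, `cookie_to_pairs("SERVERID=s1; lang=en")` gives
`[("SERVERID", "s1"), ("lang", "en")]`.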

Regards,
Willy
Received on Friday, 20 July 2012 17:54:05 UTC