Re: #578: Header Table and Static Table Indices Switched from Greg Wilkins on 2014-09-30 (ietf-http-wg@w3.org from July to September 2014)

From: Greg Wilkins <gregw@intalio.com>
Date: Wed, 1 Oct 2014 09:31:15 +1000
To: Jeff Pinner <jpinner@twitter.com>
Cc: RUELLAN Herve <Herve.Ruellan@crf.canon.fr>, Willy Tarreau <w@1wt.eu>, Mark Nottingham <mnot@mnot.net>, HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <CAH_y2NHndZ1bKg0NV=F6dKijkCZYe-mnzeYnihXVgXnN-TNgiA@mail.gmail.com>

Jeff,

In the examples I gave, I'm using a mix of indexed and non-indexed fields.

For some fields, like date, we may be sending them less than once per
second on each of a large number of connections, so the value has usually
changed by the next send and indexing does not help.

Fields like Server still need to be sent initially on new connections,
so we pregenerate that version of them.

Yes, we could pregenerate only the field value, but our pregeneration
framework has to work for h1, SPDY and h2.  It is set up to pregenerate
entire fields for code that does not know which protocols are being
spoken.  Breaking this down into separate header name and header value
pregeneration would double the handling we have to do for h1.  Having
protocol-specific code is not an option.
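
As a rough illustration of whole-field pregeneration (a sketch only, not
Jetty's actual code), the Python below builds one field once in both an h1
form and an h2 form. It assumes the static table comes first, with the name
indices of the published HPACK static table (date = 33, server = 54), and
uses the "literal without indexing, indexed name" representation. For
brevity the value is sent as a raw string rather than Huffman encoded, and
the names encode_integer and pregenerate are made up for this example.

```python
# Hypothetical sketch: names and structure are illustrative, not Jetty's API.

# Assumed static-table name indices (static table first in the index space).
STATIC_NAME_INDEX = {"date": 33, "etag": 34, "last-modified": 44, "server": 54}


def encode_integer(value, prefix_bits, flags=0):
    """HPACK prefix-integer encoding; `flags` holds the pattern bits."""
    limit = (1 << prefix_bits) - 1
    if value < limit:
        return bytes([flags | value])
    out = bytearray([flags | limit])
    value -= limit
    while value >= 128:
        out.append((value % 128) + 128)
        value //= 128
    out.append(value)
    return bytes(out)


def pregenerate(name, value):
    """Encode a whole field once, for reuse on any connection."""
    raw = value.encode("ascii")
    # h1: plain "name: value" line.
    h1 = name.encode("ascii") + b": " + raw + b"\r\n"
    # h2: literal without indexing (pattern 0000, 4-bit prefix) with the
    # name taken from the static table, so no per-connection state is used.
    h2 = encode_integer(STATIC_NAME_INDEX[name.lower()], 4, 0x00)
    # Value string: H bit 0 (no Huffman, an assumption made for brevity),
    # 7-bit length prefix, then the raw octets.
    h2 += encode_integer(len(raw), 7, 0x00) + raw
    return {"h1": h1, "h2": h2}


# Built once per second, then written to every connection unchanged.
date_field = pregenerate("date", "Wed, 01 Oct 2014 09:31:15 GMT")
```

Because the name index does not depend on what a particular connection has
put in its dynamic table, the same bytes are valid on every connection.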

When implementing h2-12, we looked at pregeneration and were simply not
able to do it without breaking h1.  With h2-14 we have been able to add
it easily for significant benefit, and we are increasing the amount of
pregeneration we do in various situations as a result.  This is not a
theoretical efficiency but a real-world saving in deployed, working code.

regards

On 1 October 2014 08:55, Jeff Pinner <jpinner@twitter.com> wrote:

> > Then headers with static names and custom values don't need to be
> > indexed. For example after this change Jetty generates a date field
> > only once per second with a static date name index and a huffman
> > encoded date value. This precomputed field is then used 10's or 100s
> > of thousands of times in the next second.   Previously we had to
> > recalculate that field for each and every connection because the date
> > name index was different for every connection.
>
> This isn't a compelling argument to me.
>
> Without indexing, you can still re-use the Huffman-encoded date value
> regardless of where the static table is.
>
> And if you do index, say to share this value between responses on the
> same HTTP/2 connection, then you have to look up the offset even if the
> static table comes first.
>
> >
> > Similarly for the server header, we pre-compute and can send to every new
> > connection because we know what the server name index will be.
>
> Again here, you get none of the benefit of sharing the value amongst
> multiple responses on the same connection. You are trading off
> compression efficiency for not having to do an index lookup.
>
> >
> > Static content in the content cache can also pregenerate last-modified
> > and etag headers which again will get used on lots of different
> > connections and don't need to be regenerated.
> >
> > Previously we had pre-generated headers for http1, but were not able
> > to add the mechanism for h2 because every field was always custom
> > generated for each connection.   After this change was made we were
> > able to generalise pre-generation for both h1 and h2 and this is a
> > significant saving in scalable servers.  EVEN IF the cost was larger
> > headers they would be worthwhile, but the average header appears to be
> > the same or a little smaller.
> >
>
> So the TL;DR here seems to be you're not using indexing.
>
> My response is to use indexing: it has low computational overhead and
> you get much higher compression efficiency by reusing these values.
>

-- 
Greg Wilkins <gregw@intalio.com>
http://eclipse.org/jetty HTTP, SPDY, Websocket server and client that scales
http://www.webtide.com  advice and support for jetty and cometd.

Received on Tuesday, 30 September 2014 23:31:43 UTC