
Compressing HTTP headers

From: Poul-Henning Kamp <phk@phk.freebsd.dk>
Date: Mon, 07 Jul 2014 12:33:51 +0000
To: ietf-http-wg@w3.org
Message-ID: <58010.1404736431@critter.freebsd.dk>
I have looked at HPACK and I'm pretty certain that we will want to do
better, if not now, then later[1].

As far as I can see, no space has been set aside for versioning HPACK
nor for using entirely different algorithms for compression ?

Obviously, a SETTINGS_ will be needed to tell what is supported
above or beyond HPACK, an IANA registry for same etc. etc.

But there will need to be an indication in each HEADERS frame of
what algorithm/version of compression is used.

I don't want to spend time on SETTINGS and IANA registry at this
point; that can wait till it becomes relevant. For now I would be
happy if we can just reserve a field somewhere, demand its value be
zero, and note that it is for future expansion of header compression.

Can we please append a byte to the HEADERS frame for this purpose ?
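
A minimal sketch of what such a reserved byte could look like, treating the
header block as opaque bytes.  The names COMPRESSION_HPACK,
add_compression_byte and split_compression_byte are hypothetical and do not
come from any HTTP/2 draft; the only rule taken from the text above is
"append one byte, demand that it be zero".

```python
COMPRESSION_HPACK = 0  # hypothetical: the only value permitted for now


def add_compression_byte(header_block: bytes,
                         algorithm: int = COMPRESSION_HPACK) -> bytes:
    """Append the one-byte compression indicator to an opaque header block."""
    if not 0 <= algorithm <= 0xFF:
        raise ValueError("indicator must fit in one byte")
    return header_block + bytes([algorithm])


def split_compression_byte(payload: bytes) -> bytes:
    """Strip the trailing indicator, rejecting any value other than zero."""
    if not payload:
        raise ValueError("payload is missing the indicator byte")
    if payload[-1] != COMPRESSION_HPACK:
        raise ValueError("unknown header compression: %d" % payload[-1])
    return payload[:-1]
```

A receiver that only knows HPACK would reject any non-zero value, which is
what keeps the field usable for later expansion.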


[1] For background, here are some random notes I took some time back about
    domain/header specific compression:

For instance, both "User-Agent" and "Server" consist largely of well-known
keywords, and using tailored dictionaries can save a lot.  There is a risk
here of tailoring *too* much: we don't want to make people think twice about
correctly declaring new version numbers, for instance. But mutatis mutandis,
there is a lot to gain.

If nothing else, just making 0x80 mean "compatible" would shave 9
bytes off pretty much all User-Agent headers, and with 0x81 meaning
"Mozilla", 0x82 "Win32" etc., it adds up really fast.

Such a "tokenpression" could be applied as a preprocessor before a
general-purpose compressor such as HPACK, or it could be done by expanding the
vocabulary of HPACK's default dictionary.  (The worries about variable
size attacks do not seem relevant in this specific case.)
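
As a rough sketch of the preprocessor variant: the table below holds only the
three codes mentioned above, and the function names tokenpress/untokenpress
are made up for illustration.  It assumes the header value is 7-bit ASCII, so
that bytes from 0x80 upward are free to act as tokens.

```python
# Hypothetical token table; only these three codes appear in the notes above.
TOKENS = {
    0x80: b"compatible",
    0x81: b"Mozilla",
    0x82: b"Win32",
}


def tokenpress(value: bytes) -> bytes:
    """Replace each well-known keyword with its single-byte code."""
    if any(b >= 0x80 for b in value):
        raise ValueError("sketch assumes a 7-bit ASCII header value")
    for code, word in TOKENS.items():
        value = value.replace(word, bytes([code]))
    return value


def untokenpress(packed: bytes) -> bytes:
    """Expand single-byte codes back into their keywords."""
    return b"".join(TOKENS.get(b, bytes([b])) for b in packed)


ua = b"Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1)"
packed = tokenpress(ua)
print(len(ua) - len(packed), "bytes saved")
```

On this sample value the saving is 9 bytes for "compatible" plus 6 for
"Mozilla"; the output could then be fed to HPACK or any other compressor.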

Likewise, the Date header can be compressed to 4 bytes in a
domain-specific (time_t) way, speeding up processing for sneaky
implementations at the same time.
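
A small sketch of the 4-byte idea, assuming an unsigned 32-bit big-endian
time_t (good until 2106; dates before 1970 are not representable).  The
function names are hypothetical; the parsing and formatting helpers are from
Python's standard email.utils module.

```python
import struct
from email.utils import formatdate, mktime_tz, parsedate_tz


def pack_http_date(value: str) -> bytes:
    """Encode an HTTP date string as a 4-byte big-endian unsigned time_t."""
    parsed = parsedate_tz(value)
    if parsed is None:
        raise ValueError("unparseable date: %r" % value)
    return struct.pack("!I", mktime_tz(parsed))


def unpack_http_date(packed: bytes) -> str:
    """Decode the 4-byte form back into the usual GMT date string."""
    (seconds,) = struct.unpack("!I", packed)
    return formatdate(seconds, usegmt=True)


packed = pack_http_date("Mon, 07 Jul 2014 12:33:51 GMT")
print(len(packed), unpack_http_date(packed))
```

The 29-byte textual date shrinks to 4 bytes, and a receiver that only needs
to compare or cache timestamps never has to parse the text form at all.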

Set-Cookie, Last-Modified and various other headers also contain
dates which could be similarly compressed.

A very large percentage of Cookie/Set-Cookie headers can be compressed
by scanning the "value" for its character set and decoding/encoding
well-known ASCII representations, so that for instance:

	Set-Cookie: foobar="0123456789abcdef"; [...]

could be sent as:

	foobar= <HEX> len=8 0x01 0x23 0x45 0x67 0x89 0xab 0xcd 0xef
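
A sketch of the hex case only, under a few assumptions of mine: the marker
bytes HEX_MARKER and RAW_MARKER are invented, the surrounding quotes are
taken to be stripped already, and only lowercase hex is packed so that the
original text can be restored exactly.

```python
import string

HEX_MARKER = 0x01  # hypothetical tag: value is packed lowercase hex
RAW_MARKER = 0x00  # hypothetical tag: value is sent unchanged
LOWER_HEX = set(string.digits + "abcdef")


def pack_cookie_value(value: str) -> bytes:
    """Pack an even-length lowercase-hex value at two characters per byte."""
    is_hex = (value and len(value) % 2 == 0
              and set(value) <= LOWER_HEX and len(value) // 2 <= 255)
    if is_hex:
        raw = bytes.fromhex(value)
        return bytes([HEX_MARKER, len(raw)]) + raw
    return bytes([RAW_MARKER]) + value.encode("ascii")


def unpack_cookie_value(packed: bytes) -> str:
    """Reverse pack_cookie_value."""
    if packed[0] == HEX_MARKER:
        length = packed[1]
        return packed[2:2 + length].hex()
    return packed[1:].decode("ascii")


packed = pack_cookie_value("0123456789abcdef")
print(packed.hex(" "))
```

The 16-character value becomes a marker, a length byte and 8 payload bytes;
base64 or decimal values could get their own markers in the same way.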

Likewise "path=/" is so universal that we can make it tacit and only
transmit anything if it is not there or has another path.

These are the "big" ones, but there are other standard headers which show
obvious potential, Content-Type:, Vary:, Content-Length:, Age: and so on.

Some of these compressions would be CPU neutral and some, like dates,
could even save CPU in many cases; all have the potential to save
memory and relevant bandwidth.

Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
Received on Monday, 7 July 2014 12:34:15 UTC
