Re: bohe and delta experimentation...

From: Nico Williams <nico@cryptonector.com>
Date: Wed, 16 Jan 2013 16:47:33 -0600
Message-ID: <CAK3OfOhWm3XD57aX6oqxB50SO4KUL+b+fY0T6+ndk0G=q4BYbg@mail.gmail.com>
To: James M Snell <jasnell@gmail.com>
Cc: "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>

On Wed, Jan 16, 2013 at 4:07 PM, James M Snell <jasnell@gmail.com> wrote:
> After going through a number of scenarios with bohe using a variety of
> stream-compression scenarios it's painfully obvious that there is really no
> way around the CRIME issue when using stream-compression. So with that, I'm
> turning my attention to the use of Roberto's delta encoding and exploring
> whether or not binary optimized values can make a significant difference (as
> opposed to simply dropping in huffman-encoded text everywhere).

Well, we could pursue enhanced session continuation (Phillip's term
for using MACs taken over nonces, among other things, instead of or in
addition to cookies).

But I think it'd be nice to explore compression of header names and of
some header values.  So let's.

> I'm starting with dates first...
> By comparison, I devised a simple binary coding for dates using the
> following format:
> +-+---+---+-------------------+
> |M|TZH|TZM|   year (16-bit)   |
> +-+---+---+-----+-------------+
> | month (4-bit) | day (5-bit) |
> +---------------+-------------+
> | hour (5-bit)  | minute (6)  |
> +---------------+-------------+
> | second (6 bit)| millis (31) |
> +---------------+-------------+
> |d|tz hrs (5 bit)| tz min (6) |
> +-----------------------------+
> M, TZH and TZM are single bit flags. When M is set, the value includes a
> 31-bit millisecond field. When TZH is set, it includes timezone offset
> hours, and when TZM is set, it includes timezone offset minutes. The d field
> (last row) is a single bit indicating positive or negative timezone offset.

You don't need 31 bits for milliseconds; 10 will do!  But sure, it's
nice to be able to get to microseconds, in which case 20 bits should
suffice, or nanoseconds, in which case 30 bits should suffice.  In no
case do we need 31 bits for fractions of seconds.  But at best we save
21 bits -- two bytes, or, if we're lucky, three.
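
A quick way to check those widths, as a rough Python sketch (the
helper name bits_needed is made up for illustration):

```python
def bits_needed(distinct_values: int) -> int:
    """Smallest bit width that can hold values 0 .. distinct_values - 1."""
    return max(1, (distinct_values - 1).bit_length())

print(bits_needed(1_000))          # milliseconds -> 10
print(bits_needed(1_000_000))      # microseconds -> 20
print(bits_needed(1_000_000_000))  # nanoseconds  -> 30
```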

> The minimum possible binary encoding is 6-bytes, which includes the first
> three flag bits, year, month, day, hour, minute and second. The maximum
> possible encoding is 11-bytes which includes full timezone offset and
> milliseconds. Giving an average encoding of 8-bytes over any sample size of
> randomly generated timestamps.

But if everyone chooses to send the max then it's 11 vs. the 12 you
got with date string compression.  Too trivial a gain?
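
To make the byte counts concrete, here is a small sketch that adds up
the field widths from the quoted diagram.  It assumes the sign bit
(d) is sent whenever either TZ field is present, which the quoted
text does not spell out, and that each date is padded to a whole
byte.  The function name and parameters are hypothetical.

```python
# Field widths in bits, taken from the quoted diagram.
BASE_BITS = 3 + 16 + 4 + 5 + 5 + 6 + 6  # flags, year, month, day, hour, minute, second
TZ_SIGN_BITS, TZ_HOUR_BITS, TZ_MIN_BITS = 1, 5, 6

def encoded_bytes(millis=False, tz_hours=False, tz_minutes=False, frac_bits=31):
    """Bytes needed for one date, rounded up to a whole byte."""
    bits = BASE_BITS
    if millis:
        bits += frac_bits
    if tz_hours or tz_minutes:
        bits += TZ_SIGN_BITS  # assumption: sign travels with any TZ field
    if tz_hours:
        bits += TZ_HOUR_BITS
    if tz_minutes:
        bits += TZ_MIN_BITS
    return (bits + 7) // 8

print(encoded_bytes())                                # minimum -> 6
print(encoded_bytes(True, True, True))                # maximum -> 11
print(encoded_bytes(True, True, True, frac_bits=10))  # 10-bit millis -> 9
```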

Of course, an encoding that uses, say, 44 bits for two's-complement (do
we need negative dates for this?) seconds since the Unix epoch + 20
for microseconds would always be 8 bytes, but we'd get no TZ
information, and TZ info would require at least two more bytes so...
we're back to about 10-12 bytes.  If we could do with just 34 bits for
seconds w/o negative dates we're getting closer to always 8 bytes.
And if we could do with just 33 bits for seconds ... we'd get to
exactly 8 bytes but at the price of a year 2242 problem.
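
The rollover year for an unsigned seconds counter can be estimated
like this (a sketch only; it uses the mean Gregorian year length, so
the result is approximate to within a year, and rollover_year is a
made-up name):

```python
SECONDS_PER_YEAR = 365.2425 * 86400  # mean Gregorian year

def rollover_year(seconds_bits: int, epoch_year: int = 1970) -> int:
    """Approximate year in which an unsigned seconds counter wraps."""
    return epoch_year + int(2 ** seconds_bits / SECONDS_PER_YEAR)

print(rollover_year(33))  # -> 2242
print(rollover_year(34))  # -> 2514
```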

What if we use Julian day?  Then we'd need 22 bits for days (which
allows us to go well over 1000 years into the future), 27 bits for
seconds and milliseconds, and now we're at 7 bytes + two more for TZ
data.  And if we encode TZ offset in terms of 15 minute increments
then we get down to just 7 bytes for the whole thing.  Seven bytes is pretty good, but
is it good enough to bother with this?
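
For illustration, a rough sketch of such a 7-byte packing.  The field
widths are assumptions chosen to make the arithmetic work, not a
proposal: 22 bits for the Julian day number, 27 bits for milliseconds
within the day (86,400,000 values), and a signed 7-bit TZ offset in
15-minute units, 56 bits in total.  The pack/unpack names are made up.

```python
from datetime import date, datetime, timedelta, timezone

JDN_OFFSET = 1721425            # date.toordinal() + JDN_OFFSET == Julian day number
MS_BITS, TZ_BITS = 27, 7        # the day number takes the remaining 22 bits

def pack(dt: datetime) -> bytes:
    """Pack an aware datetime (offset a multiple of 15 minutes) into 7 bytes."""
    quarters, rem = divmod(dt.utcoffset(), timedelta(minutes=15))
    if rem:
        raise ValueError("TZ offset is not a multiple of 15 minutes")
    jdn = dt.date().toordinal() + JDN_OFFSET
    ms = ((dt.hour * 60 + dt.minute) * 60 + dt.second) * 1000 + dt.microsecond // 1000
    tz = quarters & 0x7F        # 7-bit two's complement
    value = (jdn << (MS_BITS + TZ_BITS)) | (ms << TZ_BITS) | tz
    return value.to_bytes(7, "big")

def unpack(data: bytes) -> datetime:
    """Inverse of pack(); sub-millisecond precision is not preserved."""
    value = int.from_bytes(data, "big")
    tz = value & 0x7F
    if tz >= 64:                # undo two's complement
        tz -= 128
    ms = (value >> TZ_BITS) & ((1 << MS_BITS) - 1)
    day = date.fromordinal((value >> (MS_BITS + TZ_BITS)) - JDN_OFFSET)
    tzinfo = timezone(timedelta(minutes=15 * tz))
    return datetime(day.year, day.month, day.day, tzinfo=tzinfo) + timedelta(milliseconds=ms)
```

Storing local wall-clock time plus the offset (rather than UTC plus
the offset) is an arbitrary choice here; either works.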

We can do slightly better if we don't allow dates in the past, set a
new epoch, and limit how far into the future our dates will go (we can
always allow for encoding far-future dates with many more bytes).  I
think we can probably get down to 6 bytes for dates, including TZ
information and milliseconds for the next few decades then go up to 7
bytes and so on.
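
As a back-of-the-envelope check, assuming a millisecond counter from
some new epoch plus a 7-bit TZ field (15-minute units), both of which
are assumptions for this sketch, the range covered by a given number
of bytes works out as:

```python
MS_PER_YEAR = 365.2425 * 86400 * 1000  # mean Gregorian year in milliseconds

def years_covered(total_bytes: int, tz_bits: int = 7) -> float:
    """Years of millisecond timestamps that fit once the TZ bits are set aside."""
    ms_bits = total_bytes * 8 - tz_bits
    return 2 ** ms_bits / MS_PER_YEAR

print(round(years_covered(6)))  # about 70 years in 6 bytes
print(round(years_covered(7)))  # many thousands of years in 7 bytes
```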

> Will be turning my attention to cookie values next. I'm considering whether
> or not we should produce a code-tree that is specific to cookie headers
> and/or allow for purely binary values.

Where cookies bear encrypted session state you won't be able to
compress them at all.  And it's not like the server can't make the
effort to set maximally-compressed cookies -- it should!  IMO: leave
cookies alone.
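
A small demonstration of the first point, using random bytes as a
stand-in for encrypted session state (an assumption; good ciphertext
is expected to look like random data to a compressor):

```python
import os
import zlib

def compressed_size(payload: bytes) -> int:
    """Size of the payload after zlib (DEFLATE) compression at the highest level."""
    return len(zlib.compress(payload, 9))

opaque_cookie = os.urandom(64)   # stand-in for an encrypted cookie value
text_cookie = b"theme=dark; lang=en; theme=dark; lang=en; theme=dark; lang=en"

print(len(opaque_cookie), compressed_size(opaque_cookie))  # no gain on the opaque value
print(len(text_cookie), compressed_size(text_cookie))      # repetitive text shrinks
```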

Received on Wednesday, 16 January 2013 22:47:56 UTC
