Re: bohe and delta experimentation...

Everything ends up as binary on either side today. So long as it arrives in
the form that was transmitted, it doesn't matter what the encoding is.
Most cookies today are some text encoding of that binary, most likely
base-64. Base-64 text is highly compressible with entropy coding, since
each 8-bit character carries only 6 bits of payload. Encryption makes LZ77
and its ilk ineffective, but that is a separate thingie entirely :)
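
A rough way to see the entropy-coding point, as a sketch only: the random
payload below is an assumption standing in for an encrypted cookie value,
and order0_entropy_bits is a made-up helper name, not part of any proposal.
It measures how many bits per character an ideal order-0 entropy coder
would need for base-64 text.

```python
import base64
import math
import os
from collections import Counter

def order0_entropy_bits(data: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per symbol."""
    counts = Counter(data)
    total = len(data)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

# Assumption: random bytes stand in for an encrypted (incompressible) cookie.
raw = os.urandom(30000)
# 30000 is a multiple of 3, so the base-64 output has no '=' padding.
encoded = base64.b64encode(raw)

bits_per_char = order0_entropy_bits(encoded)
# Size an ideal order-0 entropy coder would approach for the encoded text.
ideal_bytes = bits_per_char * len(encoded) / 8

print(f"raw: {len(raw)} bytes, base-64: {len(encoded)} bytes")
print(f"entropy: {bits_per_char:.3f} bits/char, ideal coded size: {ideal_bytes:.0f} bytes")
```

The measured entropy should sit just under 6 bits per character, so the
ideal coded size lands close to the raw size: entropy coding wins back the
base-64 expansion but cannot go below the underlying random bytes.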

-=R


On Wed, Jan 16, 2013 at 2:47 PM, Nico Williams <nico@cryptonector.com> wrote:

> On Wed, Jan 16, 2013 at 4:07 PM, James M Snell <jasnell@gmail.com> wrote:
> > After going through a number of scenarios with bohe using a variety of
> > stream-compression scenarios it's painfully obvious that there is really
> > no way around the CRIME issue when using stream-compression. So with
> > that, I'm turning my attention to the use of Roberto's delta encoding
> > and exploring whether or not binary optimized values can make a
> > significant difference (as opposed to simply dropping in
> > huffman-encoded text everywhere).
>
> Well, we could pursue enhanced session continuation (Phillip's term
> for using MACs taken over nonces, among other things, instead of or in
> addition to cookies).
>
> But I think it'd be nice to explore compression of header names and of
> some header values.  So let's,
>
> > I'm starting with dates first...
> >
> >[...]
> >
> > By comparison, I devised a simple binary coding for dates using the
> > following format:
> >
> > +-+---+---+-------------------+
> > |M|TZH|TZM|   year (16-bit)   |
> > +-+---+---+-----+-------------+
> > | month (4-bit) | day (5-bit) |
> > +---------------+-------------+
> > | hour (5-bit)  | minute (6)  |
> > +---------------+-------------+
> > | second (6 bit)| millis (31) |
> > +---------------+-------------+
> > |d|tz hrs (5 bit)| tz min (6) |
> > +-----------------------------+
> >
> > M, TZH and TZM are single bit flags. When M is set, the value includes a
> > 31-bit millisecond field. When TZH is set, it includes timezone offset
> > hours, and when TZM is set, it includes timezone offset minutes. The d
> > field (last row) is a single bit indicating positive or negative
> > timezone offset.
>
> You don't need 31 bits for milliseconds; 10 will do!  But sure, it's
> nice to be able to get to microseconds, in which case 20 bits should
> suffice, or nanoseconds, in which case 30 bits should suffice.  In no
> case do we need 31 bits for fractions of seconds.  But at best we save
> 21 bits -- two bytes, or, if we're lucky, three.
>
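
The field widths in that paragraph can be checked mechanically. A small
sketch; bits_needed is a made-up helper name used only for illustration.

```python
def bits_needed(n_values: int) -> int:
    """Smallest number of bits that can represent n_values distinct values."""
    return (n_values - 1).bit_length()

# Distinct sub-second values per unit of resolution.
resolutions = [
    ("milliseconds", 10**3),
    ("microseconds", 10**6),
    ("nanoseconds", 10**9),
]
widths = {unit: bits_needed(n) for unit, n in resolutions}

# Saving relative to the 31-bit field when only milliseconds are carried.
saved_bits = 31 - widths["milliseconds"]

for unit, width in widths.items():
    print(f"{unit}: {width} bits")
print(f"bits saved vs. a 31-bit field at millisecond resolution: {saved_bits}")
```
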
> > The minimum possible binary encoding is 6-bytes, which includes the first
> > three flag bits, year, month, day, hour, minute and second. The maximum
> > possible encoding is 11-bytes which includes full timezone offset and
> > milliseconds. Giving an average encoding of 8-bytes over any sample size
> > of randomly generated timestamps.
>
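
The 6-byte and 11-byte bounds follow from adding up the field widths in the
diagram. A sketch of that arithmetic only, not of an actual encoder:
encoded_bytes is a hypothetical name, and the rule that the sign bit d is
present whenever either timezone field is present is an assumption the
original description does not spell out.

```python
# Always-present fields: 3 flag bits, year, month, day, hour, minute, second.
BASE_BITS = 3 + 16 + 4 + 5 + 5 + 6 + 6

def encoded_bytes(millis: bool, tz_hours: bool, tz_minutes: bool) -> int:
    """Bytes needed for one date, rounding the bit count up to whole bytes."""
    bits = BASE_BITS
    if millis:
        bits += 31              # optional millisecond field
    if tz_hours or tz_minutes:
        bits += 1               # assumption: sign bit d accompanies any offset
    if tz_hours:
        bits += 5               # timezone offset hours
    if tz_minutes:
        bits += 6               # timezone offset minutes
    return (bits + 7) // 8

print(encoded_bytes(False, False, False))  # smallest form
print(encoded_bytes(True, True, True))     # largest form
```
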
> But if everyone chooses to send the max then it's 11 vs. the 12 you
> got with date string compression.  Too trivial a gain?
>
> Of course, an encoding that uses, say, 44 bits for twos-complement (do
> we need negative dates for this?) seconds since the Unix epoch + 20
> for microseconds would always be 8 bytes, but we'd get no TZ
> information, and TZ info would require at least two more bytes so...
> we're back to about 10-12 bytes.  If we could do with just 34 bits for
> seconds w/o negative dates we're getting closer to always 8 bytes.
> And if we could do with just 33 bits for seconds ... we'd get to
> exactly 8 bytes but at the price of a year-2242 problem.
>
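
The rollover years for unsigned seconds-since-epoch fields are easy to
estimate. A sketch assuming the Unix epoch (1970) and an average Gregorian
year of 365.2425 days; rollover_year is a made-up helper name.

```python
SECONDS_PER_YEAR = 365.2425 * 86400   # average Gregorian year, an approximation

def rollover_year(bits: int, epoch_year: int = 1970) -> int:
    """Approximate calendar year in which an unsigned seconds counter wraps."""
    return int(epoch_year + (2 ** bits) / SECONDS_PER_YEAR)

for bits in (33, 34):
    print(f"{bits} bits of seconds wrap around the year {rollover_year(bits)}")
```
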
> What if we use julian day?  Then we'd need 31 bits for days (which
> allows us to go 1000 years into the future), 16 bits for seconds and
> milliseconds, and now we're at 6 bytes + two more for TZ data.  And if
> we encode TZ offset in terms of 15 minute increments then we get down
> to just 7 bytes for the whole thing.  Seven bytes is pretty good, but
> is it good enough to bother with this?
>
> We can do slightly better if we don't allow dates in the past, set a
> new epoch, and limit how far into the future our dates will go (we can
> always allow for encoding far-future dates with many more bytes).  I
> think we can probably get down to 6 bytes for dates, including TZ
> information and milliseconds for the next few decades then go up to 7
> bytes and so on.
>
> > Will be turning my attention to cookie values next. I'm considering
> > whether or not we should produce a code-tree that is specific to cookie
> > headers and/or allow for purely binary values.
>
> Where cookies bear encrypted session state you won't be able to
> compress them at all.  And it's not like the server can't do the
> effort to set maximally-compressed cookies -- it should!  IMO: leave
> cookies alone.
>
> Nico
> --
>
>

Received on Wednesday, 16 January 2013 23:13:52 UTC