Re: bohe and delta experimentation...

On Wed, Jan 16, 2013 at 2:47 PM, Nico Williams <nico@cryptonector.com> wrote:

> > [snip]
> > +-+---+---+-------------------+
> > |M|TZH|TZM|   year (16-bit)   |
> > +-+---+---+-----+-------------+
> > | month (4-bit) | day (5-bit) |
> > +---------------+-------------+
> > | hour (5-bit)  | minute (6)  |
> > +---------------+-------------+
> > | second (6 bit)| millis (31) |
> > +---------------+-------------+
> > |d|tz hrs (5 bit)| tz min (6) |
> > +-----------------------------+
> >
> > M, TZH and TZM are single bit flags. When M is set, the value includes a
> > 31-bit millisecond field. When TZH is set, it includes timezone offset
> > hours, and when TZM is set, it includes timezone offset minutes. The d
> > field (last row) is a single bit indicating positive or negative
> > timezone offset.
>
> You don't need 31 bits for milliseconds; 10 will do!  But sure, it's
> nice to be able to get to microseconds, in which case 20 bits should
> suffice, or nanoseconds, in which case 30 bits should suffice.  In no
> case do we need 31 bits for fractions of seconds.  But at best we save
> 21 bits -- two bytes, or, if we're lucky, three.
>
>
Yes, 31 bits was intentional overkill just for the strawman. I'm generally
unconvinced that we would need anything more than millisecond precision,
which would let us drop to a maximum of 9 bytes.
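
For reference, the byte counts fall out of the field widths in the diagram.
Here is a rough sketch of the arithmetic in Python. The helper name is made
up for illustration, and it assumes the sign bit and both timezone fields
are always sent together, which the strawman does not require:

```python
# Field widths in bits, taken from the strawman layout quoted above.
FLAG_BITS = 3                        # M, TZH, TZM
BASE_BITS = 16 + 4 + 5 + 5 + 6 + 6   # year, month, day, hour, minute, second
TZ_BITS = 1 + 5 + 6                  # d (sign), tz hours, tz minutes


def encoded_bytes(millis_bits, with_millis, with_tz):
    """Whole bytes needed for one timestamp (hypothetical helper).

    Assumes the timezone sign, hours and minutes travel as one unit.
    """
    bits = FLAG_BITS + BASE_BITS
    if with_millis:
        bits += millis_bits
    if with_tz:
        bits += TZ_BITS
    return (bits + 7) // 8  # round up to a whole byte


print(encoded_bytes(31, False, False))  # minimum: no millis, no timezone
print(encoded_bytes(31, True, True))    # strawman maximum, 31-bit millis
print(encoded_bytes(10, True, True))    # maximum with a 10-bit millis field
```

The three cases come to 45, 88 and 67 bits, which is where the 6, 11 and
9 byte figures in this thread come from.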


> > The minimum possible binary encoding is 6-bytes, which includes the first
> > three flag bits, year, month, day, hour, minute and second. The maximum
> > possible encoding is 11-bytes which includes full timezone offset and
> > milliseconds. Giving an average encoding of 8-bytes over any sample size
> > of randomly generated timestamps.
>
> But if everyone chooses to send the max then it's 11 vs. the 12 you
> got with date string compression.  Too trivial a gain?
>
> Of course, an encoding that uses, say, 44 bits for twos-complement (do
> we need negative dates for this?) seconds since the Unix epoch + 20
> for microseconds would always be 8 bytes, but we'd get no TZ
> information, and TZ info would require at least two more bytes so...
> we're back to about 10-12 bytes.  If we could do with just 34 bits for
> seconds w/o negative dates we're getting closer to always 8 bytes.
> And if we could do with just 33 bits for seconds ... we'd get to
> exactly 8 bytes but at the price of a 2,242 year problem.
>
>
One of the nice things about the strawman encoding I used is that it is a
field-for-field representation of the RFC3339 timestamp. It encodes exactly
the same information and can represent the full range of dates supported by
the date-time construct. Other variations may shave off one or two
additional bytes, but they either lose information or are far more limited
in the values they can express. If we adjust the millisecond field to 10
bits, as you suggest, we have a worst case of 9 bytes and a best case of 6.
That seems like a reasonable compromise to me.
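
To make the field-for-field idea concrete, here is a minimal sketch of a
packer for the layout with a 10-bit millisecond field. It is only an
illustration under some assumptions of mine: the function name and argument
names are hypothetical, fields are written most-significant-bit first in
diagram order, TZH and TZM are always set together, d = 1 means a negative
offset (the strawman does not say which polarity to use), and the final byte
is zero-padded.

```python
def pack_timestamp(year, month, day, hour, minute, second,
                   millis=None, tz_offset_minutes=None):
    """Pack RFC3339-style fields into bytes (hypothetical sketch)."""
    has_ms = millis is not None
    has_tz = tz_offset_minutes is not None

    # (value, width-in-bits) pairs, in the order shown in the diagram.
    fields = [(int(has_ms), 1), (int(has_tz), 1), (int(has_tz), 1),
              (year, 16), (month, 4), (day, 5),
              (hour, 5), (minute, 6), (second, 6)]
    if has_ms:
        fields.append((millis, 10))  # 10 bits instead of the strawman's 31
    if has_tz:
        sign = 1 if tz_offset_minutes < 0 else 0  # assumed polarity
        tz_hours, tz_minutes = divmod(abs(tz_offset_minutes), 60)
        fields += [(sign, 1), (tz_hours, 5), (tz_minutes, 6)]

    acc = 0
    nbits = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
        acc = (acc << width) | value
        nbits += width

    pad = -nbits % 8  # zero bits needed to reach a byte boundary
    return (acc << pad).to_bytes((nbits + pad) // 8, "big")


print(len(pack_timestamp(2013, 1, 16, 23, 11, 43)))
print(len(pack_timestamp(2013, 1, 16, 23, 11, 43,
                         millis=0, tz_offset_minutes=-480)))
```

The first call produces the 6-byte best case and the second the 9-byte
worst case.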


> What if we use julian day?  Then we'd need 31 bits for days (which
> allows us to go 1000 years into the future), 16 bits for seconds and
> milliseconds, and now we're at 6 bytes + two more for TZ data.  And if
> we encode TZ offset in terms of 15 minute increments then we get down
> to just 7 bytes for the whole thing.  Seven bytes is pretty good, but
> is it good enough to bother with this?
>
> We can do slightly better if we don't allow dates in the past, set a
> new epoch, and limit how far into the future our dates will go (we can
> always allow for encoding far-future dates with many more bytes).  I
> think we can probably get down to 6 bytes for dates, including TZ
> information and milliseconds for the next few decades then go up to 7
> bytes and so on.
>
> > Will be turning my attention to cookie values next. I'm considering
> > whether or not we should produce a code-tree that is specific to cookie
> > headers and/or allow for purely binary values.
>
> Where cookies bear encrypted session state you won't be able to
> compress them at all.  And it's not like the server can't do the
> effort to set maximally-compressed cookies -- it should!  IMO: leave
> cookies alone.
>

Yeah, that's what I suspect also. Allowing binary cookie values would let
us avoid extra bits on the wire, but compression typically doesn't help
these values at all, regardless of how optimized our code tree is.
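
A quick way to see both effects, as a rough sketch only: it uses zlib as a
stand-in compressor (not the code tree under discussion), random bytes as a
stand-in for an encrypted session cookie, and base64 as an assumed text
encoding for the "extra bits on the wire". The sample cookie string and the
helper name are made up.

```python
import base64
import os
import zlib


def compressed_size(data: bytes) -> int:
    """Size of data after zlib at maximum effort (stand-in compressor)."""
    return len(zlib.compress(data, 9))


# Random bytes stand in for an encrypted, opaque session cookie.
opaque = os.urandom(64)
# A repetitive plain-text cookie, invented for contrast.
texty = b"lang=en-US; theme=dark; " * 4

print("opaque raw/compressed:", len(opaque), compressed_size(opaque))
print("texty  raw/compressed:", len(texty), compressed_size(texty))
print("opaque as base64 text:", len(base64.b64encode(opaque)))
```

The opaque value does not shrink, while sending it as text costs roughly a
third more bytes than sending it as raw binary.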


>
> Nico
> --
>

Received on Wednesday, 16 January 2013 23:11:43 UTC