RE: bohe and delta experimentation...

I'll take a shot at the URLs. Experimental data show that URLs often share the same beginning: for requests targeting a single web site, the URLs will usually start with the same scheme and host, and possibly the same port. The beginning of the path is also usually shared by several URLs.

Therefore an efficient encoding for a URL is a delta from a previous URL: the number of characters shared at the beginning, followed by the new characters. To reduce the state that needs to be stored, it is possible to use only the previous URL as a reference.
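
To make the idea concrete, here is a minimal Python sketch, not a proposed wire format. It assumes that each URL is encoded as a pair (shared prefix length, new characters), that only the previous URL is kept as state, and that the first URL is encoded against an empty string. The function names are purely illustrative.

```python
from typing import Iterable, List, Tuple

Delta = Tuple[int, str]  # (number of shared leading characters, new characters)


def shared_prefix_len(a: str, b: str) -> int:
    """Return the number of leading characters that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def encode_urls(urls: Iterable[str]) -> List[Delta]:
    """Encode each URL as a delta against the URL just before it."""
    previous = ""  # assumption: the first URL has no reference
    deltas = []
    for url in urls:
        n = shared_prefix_len(previous, url)
        deltas.append((n, url[n:]))
        previous = url  # only the previous URL is kept as state
    return deltas


def decode_urls(deltas: Iterable[Delta]) -> List[str]:
    """Rebuild the URLs by replaying the deltas in order."""
    previous = ""
    urls = []
    for n, suffix in deltas:
        url = previous[:n] + suffix
        urls.append(url)
        previous = url
    return urls


if __name__ == "__main__":
    sample = [
        "http://example.com/a/b.html",
        "http://example.com/a/c.css",
        "https://example.com/",
    ]
    for delta in encode_urls(sample):
        print(delta)
```

With the sample above, the second URL needs only the shared length and "c.css", while the third shares little more than "http" with its predecessor. That is the cost of keeping a single reference URL: a request to another scheme or host resets most of the benefit.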

Regards,

Hervé.

> -----Original Message-----
> From: Mark Nottingham [mailto:mnot@mnot.net]
> Sent: vendredi 18 janvier 2013 08:09
> To: Martin J. Dürst
> Cc: Roberto Peon; Nico Williams; James M Snell; ietf-http-wg@w3.org
> Subject: Re: bohe and delta experimentation...
> 
> I feel like we're starting to focus a bit too closely on dates here (not just you,
> Martin!).
> 
> Let's look at the bigger picture, and other headers, before getting too deep
> here; we're talking about saving a handful of bytes at this point, and we
> haven't yet looked at URLs, etc.
> 
> Cheers,
> 
> 
> On 18/01/2013, at 6:05 PM, Martin J. Dürst <duerst@it.aoyama.ac.jp> wrote:
> 
> > On 2013/01/17 8:49, Roberto Peon wrote:
> >> Er, by which I mean that dates can be relative to the time stamped by
> >> something and kept for the connection duration. That would reduce the
> >> number of bits needed by a fair margin, assuming that is desirable.
> >> -=R
> >
> > I was thinking about something similar, but on a bigger scale. If we have an
> > encoding that can cover about 80 years (this is a simplification of what Unix
> > time does, which is 1970-2037 with 31 bits), then if we assume every server
> > around the globe understands that we are currently somewhere between
> > 2010 and 2020, we could just use that as a very rough base point. In that case,
> > we can't use a strict offset, because that would make dates move around
> > every time we move to a new decade. But what we can do is to just rotate
> > around. For this rotation to work, we have to leave some empty space.
> > Below is a very, very rough table of how something like this could work.
> >
> > Assume we have three bits in a prefix to label 8 different decades. Then in
> > each decade as indicated below on the left side, the prefixes would be used
> > with the meaning as indicated at the top of the table.
> >
> >           1970 1980 1990 2000 2010 2020 2030 2040 2050 2060 2070 2080
> >            -    -    -    -    -    -    -    -    -    -    -    -
> >           1980 1990 2000 2010 2020 2030 2040 2050 2060 2070 2080 2090
> >
> > 1970-1980    0    1    2    3    x    x    x    x    x    x    x    x
> > 1980-1990    0    1    2    3    4    x    x    x    x    x    x    x
> > 1990-2000    0    1    2    3    4    5    x    x    x    x    x    x
> > 2000-2010    x    1    2    3    4    5    6    x    x    x    x    x
> > 2010-2020    x    x    2    3    4    5    6    7    x    x    x    x
> > 2020-2030    x    x    x    3    4    5    6    7    0    x    x    x
> > 2030-2040    x    x    x    x    4    5    6    7    0    1    x    x
> > 2040-2050    x    x    x    x    x    5    6    7    0    1    2    x
> > 2050-2060    x    x    x    x    x    x    6    7    0    1    2    3
> >
> > So as an example, in our current decade, we would use prefix 2 to indicate
> > dates between 1990 and 2000, prefix 4 to indicate dates in our decade, and
> > prefix 7 to indicate dates between 2040 and 2050. Prefixes 0 and 1 are
> > currently out of service on purpose, to avoid any misunderstandings (does
> > prefix 0 refer to 1970-80 or to 2050-60?). This way we avoid problems at the
> > start/end of a decade, when some servers might think they are still in the old
> > decade, while others already think they are in the new decade.
> >
> > This is just a very rough sketch; the decades should be non-overlapping
> > (e.g. 1991-2000), and they shouldn't be exactly decades, but some other
> > intervals that we can cover with an exact number of bits. And maybe the
> > past/future balance isn't ideal (currently 2 past and 3 future decades; maybe
> > just 1 future and 4 past is better, or so).
> >
> > Anyway, I hope you can see the basic principles of the system: Use a
> > rotating scheme with a very rough current anchoring and a wide-enough
> > period of slack to avoid ambiguities.
> >
> > Regards,    Martin.
> >
> >
> >> On Wed, Jan 16, 2013 at 3:48 PM, Roberto Peon <grmocg@gmail.com> wrote:
> >>
> >>> How about setting epoch as the first request in the connection? :)
> >>> -=R
> >>>
> >>>
> >>> On Wed, Jan 16, 2013 at 3:45 PM, Nico Williams <nico@cryptonector.com> wrote:
> >>>
> >>>> On Wed, Jan 16, 2013 at 5:39 PM, Mark Nottingham <mnot@mnot.net> wrote:
> >>>>> On 17/01/2013, at 10:35 AM, Nico Williams <nico@cryptonector.com> wrote:
> >>>>> Yep, but you either need to make the epoch start at least a few
> >>>>> years ago (old Last-Modified times are important for heuristic
> >>>>> freshness), OR keep it signed (losing a bit).
> >>>>>
> >>>>> And I think you need more than 12 bits for seconds in a day...
> >>>>
> >>>> Oops, for some reason I thought of seconds in an hour.  So 5 more
> >>>> bits, and we're about even with seconds since epoch.  Either way
> >>>> getting from 24 bytes to 4 is pretty good, and no compression
> >>>> scheme will do better.
> >>>>
> >>>>
> >>>
> >>
> 
> --
> Mark Nottingham   http://www.mnot.net/
> 
> 
> 

Received on Friday, 18 January 2013 13:58:53 UTC