Re: incrementally indexed headers should be inserted in index '0' instead of len(table)+1

I did this testing a while back (look for email entitled "Compression
analysis of perfect atom-based compressor").

Someone can hack something up to do this again on the new corpus, but the
data was pretty unequivocal.
I've attached a PNG version of the freq vs dist-from-newest-element graph.

Again, this was with a "perfect" FIFO-based compressor, which had no expiry
and infinite space over the test corpus (with each domain having a separate
compressor, but summing into the final total) that was available to us
before the new larger, if munged, corpus that was recently contributed.


This is a graph of the number of times that you'd like to have
backreferenced something (y-axis) at the distance from the newest element
(x-axis).
If we always found that we wished to backreference the 10th most recent
item (which is most certainly not the case, but IF), then this graph would
be a single spike at 10.

And here is the data.
[image: Inline image 1]
I'd welcome someone computing this again!

-=R


On Thu, Aug 15, 2013 at 1:30 PM, Martin Thomson <martin.thomson@gmail.com>wrote:

> On 15 August 2013 13:00, Roberto Peon <grmocg@gmail.com> wrote:
> > Currently, index 0 is the 'top' of the table, and when a new header is
> added
> > using incremental indexing, it is added at the 'bottom' (i.e. the highest
> > number index).
>
> There are two separate issues here.  Strangely, the right decision
> depends on the same question.  See below.
>
> The first is whether the most recently added values should appear at
> index 0 or index n when adding to the table.  This wouldn't matter at
> all if it weren't for the fact that small values take less space to
> encode.  So, any decision we make should be based on what we believe
> is going to be reused more often: things that are added recently, or
> things that have been present for a long time.  If it's recent items
> that are being used more often, we probably need to add at index 0.
>
> The second relates to eviction: which items be evicted from the header
> table?  Those that were most recently added, or those that have been
> around for a long time?  As I hinted before, this depends on the same
> question.
>
> Intuitively at least, I believe that there will be a split on the
> header reuse question.  Some things, like :method=GET, are probably
> going to be quite stable.  Other things, like cookies, tend to change
> such that old values are invalid (Cookie in particular might be a good
> candidate for substitution).  If there is a split, then I can imagine
> that some optimizations could give good returns... maybe something
> analogous to a generational garbage collector.  But then, I can't
> imagine a way to avoid the resulting performance and complexity
> issues.
>
> But that's just analysis.  Testing might be an easier way to decide,
> or at least to inform the choice.
>

Received on Thursday, 15 August 2013 21:07:15 UTC