Re: HPACK benchmark test for substitution indexing vs incremental indexing only

I actually had written a compressor with foreknowledge to run against the data set that was sent to the list before the last meeting, specifically for the purpose of giving an "ideal" against which to measure real-world algorithms.

Today I did some refactoring and added a switch that blocks it from ever doing a substitution, always sending it down the path where it didn't find a suitable entry to replace.  The results suggest that the gain from being able to substitute is infinitesimal.  Given perfect knowledge of future headers, here's what I see from the first SQLite file:

  *
Substituting compressor:  36.036% of plaintext size
  *
Append-only compressor:  36.045% of plaintext size

When we're talking about ~0.01% gain in efficiency, I think there's a strong argument to be made in favor of simplicity.  (Incidentally, a quick performance profile of my code shows it spends the majority of its time deciding what entry to replace!)

Sent from Windows Mail

From: Gábor Molnár<mailto:gabor.molnar@sch.bme.hu>
Sent: ?Saturday?, ?September? ?21?, ?2013 ?4?:?23? ?PM
To: Roberto Peon<mailto:grmocg@gmail.com>
Cc: Tatsuhiro Tsujikawa<mailto:tatsuhiro.t@gmail.com>, ietf-http-wg@w3.org<mailto:ietf-http-wg@w3.org>

It would be interesting to test if a substitution strategy is better than incremental if it knows all the upcoming headers *in advance*. This would simulate the performance of the best possible heuristic algorithm.


2013/9/21 Roberto Peon <grmocg@gmail.com<mailto:grmocg@gmail.com>>
When I was doing a similar comparative analysis, I found that incremental indexing did better than substitution indexing as well.
I suspect that substitution indexing benefits strongly from heuristics, e.g. compute the probability that a header is reused and use that to determine if you replace or not.
That being said, I'm still unsure if the complexity of substitution indexing is worth the potential benefit.

-=R


On Sat, Sep 21, 2013 at 5:12 AM, Tatsuhiro Tsujikawa <tatsuhiro.t@gmail.com<mailto:tatsuhiro.t@gmail.com>> wrote:
I made a simple benchmark test for substitution indexing vs
incremental indexing only and share its results here.

The detailed results can be found at
https://github.com/tatsuhiro-t/nghttp2/wiki/hpackSubst

"""
HPACK draft offers 2 kind of indexing methods: incremental and
substitution. In nghttp2, we always use incremental
indexing. This is because we do not have good strategy to use
substitution indexing efficiently. We suspect that it is in the
draft because it has some use cases, but we don't see them yet.

So we did some tests comparing our incremental only strategy and
the experimental strategy utilizing substitution indexing.

Our incremental only strategy goes as follows:

1. If the name/value pair is in the header table, use indexed
   representation.

2. Else, if name is in the header table, use incremental indexing
   with indexed name.

3. Else, use incremental indexing with new name.

The experimental strategy utilizing substitution indexing changes
step 2 as follows:

2. Else, if name is in the header table, substitute that entry;
   use substitution indexing with indexed name.

We use data set in https://github.com/http2/http_samples.

The detailed results are listed in the following sections.

The end result is that, in the overall, incremental only strategy
is more efficient than the strategy with substitution. But the
difference is not so large. On some data set, the substitution
performed well, so depending on the data set, the winner may
change. Also there may be better strategy for substitution.
"""

It turns out that the experimental strategy is used in node-http2
and firefox.

Received on Tuesday, 24 September 2013 06:40:46 UTC