RE: HPACK benchmark test for substitution indexing vs incremental indexing only from RUELLAN Herve on 2013-09-27 (ietf-http-wg@w3.org from July to September 2013)

From: RUELLAN Herve <Herve.Ruellan@crf.canon.fr>
Date: Fri, 27 Sep 2013 16:34:56 +0000
To: Mike Bishop <Michael.Bishop@microsoft.com>, Gábor Molnár <gabor.molnar@sch.bme.hu>, Roberto Peon <grmocg@gmail.com>
CC: Tatsuhiro Tsujikawa <tatsuhiro.t@gmail.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Message-ID: <6C71876BDCCD01488E70A2399529D5E52F4F3782@ADELE.crf.canon.fr>

I've ran some tests on the "mnot" test set and I've been able to see some effect from using substitution:
- Substituting compressor:   29.5%
- Append-only compressor: 30.3%

It's true that the effect of substitution is limited on average. However, it allows building an intelligent encoder that can take advantage of knowledge about the headers.

Hervé.

> -----Original Message-----
> From: Mike Bishop [mailto:Michael.Bishop@microsoft.com]
> Sent: mardi 24 septembre 2013 08:40
> To: Gábor Molnár; Roberto Peon
> Cc: Tatsuhiro Tsujikawa; ietf-http-wg@w3.org
> Subject: Re: HPACK benchmark test for substitution indexing vs incremental
> indexing only
> 
> I actually had written a compressor with foreknowledge to run against the
> data set that was sent to the list before the last meeting, specifically for the
> purpose of giving an “ideal” against which to measure real-world algorithms.
> 
> Today I did some refactoring and added a switch that blocks it from ever
> doing a substitution, always sending it down the path where it didn’t find a
> suitable entry to replace.  The results suggest that the gain from being able to
> substitute is infinitesimal.  Given perfect knowledge of future headers,
> here’s what I see from the first SQLite file:
> 
> *
>  Substituting compressor:  36.036% of plaintext size
> *
>  Append-only compressor:  36.045% of plaintext size
> 
> 
> When we’re talking about ~0.01% gain in efficiency, I think there’s a strong
> argument to be made in favor of simplicity.  (Incidentally, a quick
> performance profile of my code shows it spends the majority of its time
> deciding what entry to replace!)
> 
> Sent from Windows Mail
> 
> From: Gábor Molnár <mailto:gabor.molnar@sch.bme.hu>
> Sent: ‎Saturday‎, ‎September‎ ‎21‎, ‎2013 ‎4‎:‎23‎ ‎PM
> To: Roberto Peon <mailto:grmocg@gmail.com>
> Cc: Tatsuhiro Tsujikawa <mailto:tatsuhiro.t@gmail.com> , ietf-http-
> wg@w3.org
> 
> It would be interesting to test if a substitution strategy is better than
> incremental if it knows all the upcoming headers *in advance*. This would
> simulate the performance of the best possible heuristic algorithm.
> 
> 
> 
> 2013/9/21 Roberto Peon <grmocg@gmail.com>
> 
> 
>  When I was doing a similar comparative analysis, I found that
> incremental indexing did better than substitution indexing as well.
>  I suspect that substitution indexing benefits strongly from heuristics,
> e.g. compute the probability that a header is reused and use that to
> determine if you replace or not.
>  That being said, I'm still unsure if the complexity of substitution
> indexing is worth the potential benefit.
> 
> 
>  -=R
> 
> 
>  On Sat, Sep 21, 2013 at 5:12 AM, Tatsuhiro Tsujikawa
> <tatsuhiro.t@gmail.com> wrote:
> 
> 
>   I made a simple benchmark test for substitution indexing vs
>   incremental indexing only and share its results here.
> 
>   The detailed results can be found at
>   https://github.com/tatsuhiro-t/nghttp2/wiki/hpackSubst

> 
>   """
>   HPACK draft offers 2 kind of indexing methods: incremental
> and
>   substitution. In nghttp2, we always use incremental
>   indexing. This is because we do not have good strategy to
> use
>   substitution indexing efficiently. We suspect that it is in the
>   draft because it has some use cases, but we don't see them
> yet.
> 
>   So we did some tests comparing our incremental only
> strategy and
>   the experimental strategy utilizing substitution indexing.
> 
>   Our incremental only strategy goes as follows:
> 
>   1. If the name/value pair is in the header table, use indexed
>      representation.
> 
>   2. Else, if name is in the header table, use incremental
> indexing
>      with indexed name.
> 
>   3. Else, use incremental indexing with new name.
> 
>   The experimental strategy utilizing substitution indexing
> changes
>   step 2 as follows:
> 
>   2. Else, if name is in the header table, substitute that entry;
>      use substitution indexing with indexed name.
> 
>   We use data set in https://github.com/http2/http_samples

> <https://github.com/http2/http_samples> .
> 
>   The detailed results are listed in the following sections.
> 
>   The end result is that, in the overall, incremental only
> strategy
>   is more efficient than the strategy with substitution. But the
>   difference is not so large. On some data set, the substitution
>   performed well, so depending on the data set, the winner
> may
>   change. Also there may be better strategy for substitution.
>   """
> 
>   It turns out that the experimental strategy is used in node-
> http2
>   and firefox.
> 
> 
>

Received on Friday, 27 September 2013 16:35:36 UTC