Re: HPACK benchmark test for substitution indexing vs incremental indexing only from Tatsuhiro Tsujikawa on 2013-09-28 (ietf-http-wg@w3.org from July to September 2013)

From: Tatsuhiro Tsujikawa <tatsuhiro.t@gmail.com>
Date: Sat, 28 Sep 2013 23:35:45 +0900
To: RUELLAN Herve <Herve.Ruellan@crf.canon.fr>
Cc: Mike Bishop <Michael.Bishop@microsoft.com>, Gábor Molnár <gabor.molnar@sch.bme.hu>, Roberto Peon <grmocg@gmail.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Message-ID: <CAPyZ6=L3WZaiBVgFvSc3X0+kZdhG4ZBRJfHxUNvWUxzwQ1fo9Q@mail.gmail.com>
On Sat, Sep 28, 2013 at 1:34 AM, RUELLAN Herve
<Herve.Ruellan@crf.canon.fr>wrote:

> I've ran some tests on the "mnot" test set and I've been able to see some
> effect from using substitution:
> - Substituting compressor:   29.5%
> - Append-only compressor: 30.3%
>
>
Since these numbers are much better than the my results (which is 39% and
37% respectively),
I'm very interested in the algorithm you used to achieve them.
Could you share it with us?
Is it some kind of algorithm to take advantage of the complete knowledge of
incoming headers?
Or based on statistics on today's web traffic?

Note that my results combine request and response headers.

Because of relative difference is small, I'm lean to the side to remove
substitution for now.
Mostly it is because the decoder have to pay for it. The encoder can just
skip substitution
if it wants, but decoder has no choice.

Best regards,

Tatsuhiro Tsujikawa



> It's true that the effect of substitution is limited on average. However,
> it allows building an intelligent encoder that can take advantage of
> knowledge about the headers.
>
> Hervé.
>
> > -----Original Message-----
> > From: Mike Bishop [mailto:Michael.Bishop@microsoft.com]
> > Sent: mardi 24 septembre 2013 08:40
> > To: Gábor Molnár; Roberto Peon
> > Cc: Tatsuhiro Tsujikawa; ietf-http-wg@w3.org
> > Subject: Re: HPACK benchmark test for substitution indexing vs
> incremental
> > indexing only
> >
> > I actually had written a compressor with foreknowledge to run against the
> > data set that was sent to the list before the last meeting, specifically
> for the
> > purpose of giving an “ideal” against which to measure real-world
> algorithms.
> >
> > Today I did some refactoring and added a switch that blocks it from ever
> > doing a substitution, always sending it down the path where it didn’t
> find a
> > suitable entry to replace.  The results suggest that the gain from being
> able to
> > substitute is infinitesimal.  Given perfect knowledge of future headers,
> > here’s what I see from the first SQLite file:
> >
> > *
> >       Substituting compressor:  36.036% of plaintext size
> > *
> >       Append-only compressor:  36.045% of plaintext size
> >
> >
> > When we’re talking about ~0.01% gain in efficiency, I think there’s a
> strong
> > argument to be made in favor of simplicity.  (Incidentally, a quick
> > performance profile of my code shows it spends the majority of its time
> > deciding what entry to replace!)
> >
> > Sent from Windows Mail
> >
> > From: Gábor Molnár <mailto:gabor.molnar@sch.bme.hu>
> > Sent: Saturday, September 21, 2013 4:23 PM
> > To: Roberto Peon <mailto:grmocg@gmail.com>
> > Cc: Tatsuhiro Tsujikawa <mailto:tatsuhiro.t@gmail.com> , ietf-http-
> > wg@w3.org
> >
> > It would be interesting to test if a substitution strategy is better than
> > incremental if it knows all the upcoming headers *in advance*. This would
> > simulate the performance of the best possible heuristic algorithm.
> >
> >
> >
> > 2013/9/21 Roberto Peon <grmocg@gmail.com>
> >
> >
> >       When I was doing a similar comparative analysis, I found that
> > incremental indexing did better than substitution indexing as well.
> >       I suspect that substitution indexing benefits strongly from
> heuristics,
> > e.g. compute the probability that a header is reused and use that to
> > determine if you replace or not.
> >       That being said, I'm still unsure if the complexity of substitution
> > indexing is worth the potential benefit.
> >
> >
> >       -=R
> >
> >
> >       On Sat, Sep 21, 2013 at 5:12 AM, Tatsuhiro Tsujikawa
> > <tatsuhiro.t@gmail.com> wrote:
> >
> >
> >               I made a simple benchmark test for substitution indexing vs
> >               incremental indexing only and share its results here.
> >
> >               The detailed results can be found at
> >               https://github.com/tatsuhiro-t/nghttp2/wiki/hpackSubst
> >
> >               """
> >               HPACK draft offers 2 kind of indexing methods: incremental
> > and
> >               substitution. In nghttp2, we always use incremental
> >               indexing. This is because we do not have good strategy to
> > use
> >               substitution indexing efficiently. We suspect that it is
> in the
> >               draft because it has some use cases, but we don't see them
> > yet.
> >
> >               So we did some tests comparing our incremental only
> > strategy and
> >               the experimental strategy utilizing substitution indexing.
> >
> >               Our incremental only strategy goes as follows:
> >
> >               1. If the name/value pair is in the header table, use
> indexed
> >                  representation.
> >
> >               2. Else, if name is in the header table, use incremental
> > indexing
> >                  with indexed name.
> >
> >               3. Else, use incremental indexing with new name.
> >
> >               The experimental strategy utilizing substitution indexing
> > changes
> >               step 2 as follows:
> >
> >               2. Else, if name is in the header table, substitute that
> entry;
> >                  use substitution indexing with indexed name.
> >
> >               We use data set in https://github.com/http2/http_samples
> > <https://github.com/http2/http_samples> .
> >
> >               The detailed results are listed in the following sections.
> >
> >               The end result is that, in the overall, incremental only
> > strategy
> >               is more efficient than the strategy with substitution. But
> the
> >               difference is not so large. On some data set, the
> substitution
> >               performed well, so depending on the data set, the winner
> > may
> >               change. Also there may be better strategy for substitution.
> >               """
> >
> >               It turns out that the experimental strategy is used in
> node-
> > http2
> >               and firefox.
> >
> >
> >
>
>
Received on Saturday, 28 September 2013 14:36:32 UTC