Re: HPACK benchmark test for substitution indexing vs incremental indexing only from Roberto Peon on 2013-09-24 (ietf-http-wg@w3.org from July to September 2013)

From: Roberto Peon <grmocg@gmail.com>
Date: Mon, 23 Sep 2013 23:56:14 -0700
To: Mike Bishop <Michael.Bishop@microsoft.com>
Cc: Gábor Molnár <gabor.molnar@sch.bme.hu>, Tatsuhiro Tsujikawa <tatsuhiro.t@gmail.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Message-ID: <CAP+FsNe_TOhPLgVHdjVCDVnZd4VComaRB9By7s7LF0vD_BANHQ@mail.gmail.com>

The other part to the argument for using only appends is that it allows for
some out-of-order processing for transports like TCP-Minion, where you can
receive segments out of order (e.g. when packet loss occurs for one
request, but the following one is unaffected).

I'm sure that none of us want to swallow that complexity today, mind you,
but it could be important in the nearish future.

-=R


On Mon, Sep 23, 2013 at 11:40 PM, Mike Bishop
<Michael.Bishop@microsoft.com>wrote:

>  I actually had written a compressor with foreknowledge to run against
> the data set that was sent to the list before the last meeting,
> specifically for the purpose of giving an “ideal” against which to measure
> real-world algorithms.
>
>  Today I did some refactoring and added a switch that blocks it from ever
> doing a substitution, always sending it down the path where it didn’t find
> a suitable entry to replace.  The results suggest that the gain from being
> able to substitute is infinitesimal.  Given perfect knowledge of future
> headers, here’s what I see from the first SQLite file:
>
>    - Substituting compressor:  36.036% of plaintext size
>    - Append-only compressor:  36.045% of plaintext size
>
>
>  When we’re talking about ~0.01% gain in efficiency, I think there’s a
> strong argument to be made in favor of simplicity.  (Incidentally, a quick
> performance profile of my code shows it spends the majority of its time
> deciding what entry to replace!)
>
>  Sent from Windows Mail
>
>  *From:* Gábor Molnár <gabor.molnar@sch.bme.hu>
> *Sent:* Saturday, September 21, 2013 4:23 PM
> *To:* Roberto Peon <grmocg@gmail.com>
> *Cc:* Tatsuhiro Tsujikawa <tatsuhiro.t@gmail.com>, ietf-http-wg@w3.org
>
>  It would be interesting to test if a substitution strategy is better
> than incremental if it knows all the upcoming headers *in advance*. This
> would simulate the performance of the best possible heuristic algorithm.
>
>
> 2013/9/21 Roberto Peon <grmocg@gmail.com>
>
>> When I was doing a similar comparative analysis, I found that incremental
>> indexing did better than substitution indexing as well.
>> I suspect that substitution indexing benefits strongly from heuristics,
>> e.g. compute the probability that a header is reused and use that to
>> determine if you replace or not.
>> That being said, I'm still unsure if the complexity of substitution
>> indexing is worth the potential benefit.
>>
>>  -=R
>>
>>
>> On Sat, Sep 21, 2013 at 5:12 AM, Tatsuhiro Tsujikawa <
>> tatsuhiro.t@gmail.com> wrote:
>>
>>>  I made a simple benchmark test for substitution indexing vs
>>> incremental indexing only and share its results here.
>>>
>>>  The detailed results can be found at
>>> https://github.com/tatsuhiro-t/nghttp2/wiki/hpackSubst
>>>
>>>  """
>>> HPACK draft offers 2 kind of indexing methods: incremental and
>>> substitution. In nghttp2, we always use incremental
>>> indexing. This is because we do not have good strategy to use
>>> substitution indexing efficiently. We suspect that it is in the
>>> draft because it has some use cases, but we don't see them yet.
>>>
>>>  So we did some tests comparing our incremental only strategy and
>>> the experimental strategy utilizing substitution indexing.
>>>
>>>  Our incremental only strategy goes as follows:
>>>
>>>  1. If the name/value pair is in the header table, use indexed
>>>    representation.
>>>
>>>  2. Else, if name is in the header table, use incremental indexing
>>>    with indexed name.
>>>
>>>  3. Else, use incremental indexing with new name.
>>>
>>>  The experimental strategy utilizing substitution indexing changes
>>> step 2 as follows:
>>>
>>>  2. Else, if name is in the header table, substitute that entry;
>>>    use substitution indexing with indexed name.
>>>
>>>  We use data set in https://github.com/http2/http_samples.
>>>
>>>  The detailed results are listed in the following sections.
>>>
>>>  The end result is that, in the overall, incremental only strategy
>>> is more efficient than the strategy with substitution. But the
>>> difference is not so large. On some data set, the substitution
>>> performed well, so depending on the data set, the winner may
>>> change. Also there may be better strategy for substitution.
>>> """
>>>
>>>  It turns out that the experimental strategy is used in node-http2
>>> and firefox.
>>>
>>>
>>
>

Received on Tuesday, 24 September 2013 06:56:41 UTC