Re: HPACK benchmark test for substitution indexing vs incremental indexing only from Roberto Peon on 2013-09-24 (ietf-http-wg@w3.org from July to September 2013)

From: Roberto Peon <grmocg@gmail.com>
Date: Tue, 24 Sep 2013 13:29:06 -0700
To: Mike Bishop <Michael.Bishop@microsoft.com>
Cc: James M Snell <jasnell@gmail.com>, Gábor Molnár <gabor.molnar@sch.bme.hu>, Tatsuhiro Tsujikawa <tatsuhiro.t@gmail.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Message-ID: <CAP+FsNde756=XwVBNEHCLZi1XhQ0qF6DbLiNHFZMENLzO-oiLw@mail.gmail.com>
Yup, roughly ~15% of the compressor's efficiency is due to this, per my old
calculations. Yours suggests ~14%, which seems in-line.
This is regardless of using various other encodings (e.g. huffman), which
also affect output size-- those should affect the output roughly linearly
with the amount of text given.

-=R


On Tue, Sep 24, 2013 at 1:17 PM, Mike Bishop
<Michael.Bishop@microsoft.com>wrote:

>  36% vs. 42% by eliminating backreferences, on my testbed.  Not huge, but
> certainly noticeable.
>
>  Sent from Windows Mail
>
>   *From:* Roberto Peon <grmocg@gmail.com>
> *Sent:* Tuesday, September 24, 2013 2:53 PM
> *To:* Mike Bishop <Michael.Bishop@microsoft.com>
> *Cc:* James M Snell <jasnell@gmail.com>, Gábor Molnár<gabor.molnar@sch.bme.hu>,
> Tatsuhiro Tsujikawa <tatsuhiro.t@gmail.com>, ietf-http-wg@w3.org
>
>  Not to mention that the name encoding has a fair size impact on
> compression efficiency.
>
>  -=R
>
>
> On Tue, Sep 24, 2013 at 12:52 PM, Roberto Peon <grmocg@gmail.com> wrote:
>
>> No, substitution is different because it prevents some future usecases
>> (the head-of-line unblocking one).
>> Substitution is also different because it requires a non-linear memory
>> model and generally results in heap fragmentation, whereas strict
>> incremental indexing allows for a completely linear memory model with no
>> fragmenting.
>> -=R
>>
>>
>>  On Tue, Sep 24, 2013 at 12:49 PM, Mike Bishop <
>> Michael.Bishop@microsoft.com> wrote:
>>
>>>   Here I’ll disagree, though I haven’t coded a test for how much it’s
>>> worth.  On the decompressor side, the expense is minimal (index lookup),
>>> and on the compressor side it’s strictly optional.  (Though I suppose the
>>> same argument exists for substitution, really….)
>>>
>>>  Sent from Windows Mail
>>>
>>>   *From:* James M Snell <jasnell@gmail.com>
>>> *Sent:* Tuesday, September 24, 2013 10:52 AM
>>>
>>> *To:* Mike Bishop <Michael.Bishop@microsoft.com>
>>> *Cc:* Gábor Molnár <gabor.molnar@sch.bme.hu>, Roberto Peon<grmocg@gmail.com>,
>>> Tatsuhiro Tsujikawa <tatsuhiro.t@gmail.com>, ietf-http-wg@w3.org
>>>
>>>   On Mon, Sep 23, 2013 at 11:40 PM, Mike Bishop
>>> <Michael.Bishop@microsoft.com> wrote:
>>> >[snip]
>>> >
>>> > When we’re talking about ~0.01% gain in efficiency, I think there’s a
>>> strong
>>> > argument to be made in favor of simplicity.  (Incidentally, a quick
>>> > performance profile of my code shows it spends the majority of its time
>>> > deciding what entry to replace!)
>>> >
>>>
>>> Similar results on this end. Substitution is simply not worth the
>>> extra complexity. +1 to dropping it.
>>>
>>> I'm also not a fan of the name index reference option given that it
>>> requires us to search the table for a suitable name index. I'd rather
>>> take the somewhat less efficient on-the-wire encoding than scan the
>>> table for name indexes.
>>>
>>> - James
>>>
>>> > Sent from Windows Mail
>>> >
>>> > From: Gábor Molnár
>>>  > Sent: Saturday, September 21, 2013 4:23 PM
>>>
>>> > To: Roberto Peon
>>> > Cc: Tatsuhiro Tsujikawa, ietf-http-wg@w3.org
>>> >
>>> > It would be interesting to test if a substitution strategy is better
>>> than
>>> > incremental if it knows all the upcoming headers *in advance*. This
>>> would
>>> > simulate the performance of the best possible heuristic algorithm.
>>> >
>>> >
>>> > 2013/9/21 Roberto Peon <grmocg@gmail.com>
>>> >>
>>> >> When I was doing a similar comparative analysis, I found that
>>> incremental
>>> >> indexing did better than substitution indexing as well.
>>> >> I suspect that substitution indexing benefits strongly from
>>> heuristics,
>>> >> e.g. compute the probability that a header is reused and use that to
>>> >> determine if you replace or not.
>>> >> That being said, I'm still unsure if the complexity of substitution
>>> >> indexing is worth the potential benefit.
>>> >>
>>> >> -=R
>>> >>
>>> >>
>>> >> On Sat, Sep 21, 2013 at 5:12 AM, Tatsuhiro Tsujikawa
>>> >> <tatsuhiro.t@gmail.com> wrote:
>>> >>>
>>> >>> I made a simple benchmark test for substitution indexing vs
>>> >>> incremental indexing only and share its results here.
>>> >>>
>>> >>> The detailed results can be found at
>>> >>> https://github.com/tatsuhiro-t/nghttp2/wiki/hpackSubst
>>> >>>
>>> >>> """
>>> >>> HPACK draft offers 2 kind of indexing methods: incremental and
>>> >>> substitution. In nghttp2, we always use incremental
>>> >>> indexing. This is because we do not have good strategy to use
>>> >>> substitution indexing efficiently. We suspect that it is in the
>>> >>> draft because it has some use cases, but we don't see them yet.
>>> >>>
>>> >>> So we did some tests comparing our incremental only strategy and
>>> >>> the experimental strategy utilizing substitution indexing.
>>> >>>
>>> >>> Our incremental only strategy goes as follows:
>>> >>>
>>> >>> 1. If the name/value pair is in the header table, use indexed
>>> >>>    representation.
>>> >>>
>>> >>> 2. Else, if name is in the header table, use incremental indexing
>>> >>>    with indexed name.
>>> >>>
>>> >>> 3. Else, use incremental indexing with new name.
>>> >>>
>>> >>> The experimental strategy utilizing substitution indexing changes
>>> >>> step 2 as follows:
>>> >>>
>>> >>> 2. Else, if name is in the header table, substitute that entry;
>>> >>>    use substitution indexing with indexed name.
>>> >>>
>>> >>> We use data set in https://github.com/http2/http_samples.
>>> >>>
>>> >>> The detailed results are listed in the following sections.
>>> >>>
>>> >>> The end result is that, in the overall, incremental only strategy
>>> >>> is more efficient than the strategy with substitution. But the
>>> >>> difference is not so large. On some data set, the substitution
>>> >>> performed well, so depending on the data set, the winner may
>>> >>> change. Also there may be better strategy for substitution.
>>> >>> """
>>> >>>
>>> >>> It turns out that the experimental strategy is used in node-http2
>>> >>> and firefox.
>>> >>>
>>> >>
>>> >
>>>
>>
>>
>
Received on Tuesday, 24 September 2013 20:29:35 UTC