Re: HPACK benchmark test for substitution indexing vs incremental indexing only from James M Snell on 2013-09-24 (ietf-http-wg@w3.org from July to September 2013)

From: James M Snell <jasnell@gmail.com>
Date: Tue, 24 Sep 2013 13:26:24 -0700
To: Mike Bishop <Michael.Bishop@microsoft.com>
Cc: Roberto Peon <grmocg@gmail.com>, Gábor Molnár <gabor.molnar@sch.bme.hu>, Tatsuhiro Tsujikawa <tatsuhiro.t@gmail.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Message-ID: <CABP7Rbe6=7YJGjzhGUK7ZTzMGBh85TuG2oX_xpFOeysWZqpHWw@mail.gmail.com>
A difference we likely could easily make up for by applying typed
codecs on values that frequently change (e.g. dates, content-length,
etc).

On Tue, Sep 24, 2013 at 1:17 PM, Mike Bishop
<Michael.Bishop@microsoft.com> wrote:
> 36% vs. 42% by eliminating backreferences, on my testbed.  Not huge, but
> certainly noticeable.
>
> Sent from Windows Mail
>
> From: Roberto Peon
> Sent: ‎Tuesday‎, ‎September‎ ‎24‎, ‎2013 ‎2‎:‎53‎ ‎PM
> To: Mike Bishop
> Cc: James M Snell, Gábor Molnár, Tatsuhiro Tsujikawa, ietf-http-wg@w3.org
>
> Not to mention that the name encoding has a fair size impact on compression
> efficiency.
>
> -=R
>
>
> On Tue, Sep 24, 2013 at 12:52 PM, Roberto Peon <grmocg@gmail.com> wrote:
>>
>> No, substitution is different because it prevents some future usecases
>> (the head-of-line unblocking one).
>> Substitution is also different because it requires a non-linear memory
>> model and generally results in heap fragmentation, whereas strict
>> incremental indexing allows for a completely linear memory model with no
>> fragmenting.
>> -=R
>>
>>
>> On Tue, Sep 24, 2013 at 12:49 PM, Mike Bishop
>> <Michael.Bishop@microsoft.com> wrote:
>>>
>>> Here I’ll disagree, though I haven’t coded a test for how much it’s
>>> worth.  On the decompressor side, the expense is minimal (index lookup), and
>>> on the compressor side it’s strictly optional.  (Though I suppose the same
>>> argument exists for substitution, really….)
>>>
>>> Sent from Windows Mail
>>>
>>> From: James M Snell
>>> Sent: Tuesday, September 24, 2013 10:52 AM
>>>
>>> To: Mike Bishop
>>> Cc: Gábor Molnár, Roberto Peon, Tatsuhiro Tsujikawa, ietf-http-wg@w3.org
>>>
>>> On Mon, Sep 23, 2013 at 11:40 PM, Mike Bishop
>>> <Michael.Bishop@microsoft.com> wrote:
>>> >[snip]
>>> >
>>> > When we’re talking about ~0.01% gain in efficiency, I think there’s a
>>> > strong
>>> > argument to be made in favor of simplicity.  (Incidentally, a quick
>>> > performance profile of my code shows it spends the majority of its time
>>> > deciding what entry to replace!)
>>> >
>>>
>>> Similar results on this end. Substitution is simply not worth the
>>> extra complexity. +1 to dropping it.
>>>
>>> I'm also not a fan of the name index reference option given that it
>>> requires us to search the table for a suitable name index. I'd rather
>>> take the somewhat less efficient on-the-wire encoding than scan the
>>> table for name indexes.
>>>
>>> - James
>>>
>>> > Sent from Windows Mail
>>> >
>>> > From: Gábor Molnár
>>> > Sent: Saturday, September 21, 2013 4:23 PM
>>>
>>> > To: Roberto Peon
>>> > Cc: Tatsuhiro Tsujikawa, ietf-http-wg@w3.org
>>> >
>>> > It would be interesting to test if a substitution strategy is better
>>> > than
>>> > incremental if it knows all the upcoming headers *in advance*. This
>>> > would
>>> > simulate the performance of the best possible heuristic algorithm.
>>> >
>>> >
>>> > 2013/9/21 Roberto Peon <grmocg@gmail.com>
>>> >>
>>> >> When I was doing a similar comparative analysis, I found that
>>> >> incremental
>>> >> indexing did better than substitution indexing as well.
>>> >> I suspect that substitution indexing benefits strongly from
>>> >> heuristics,
>>> >> e.g. compute the probability that a header is reused and use that to
>>> >> determine if you replace or not.
>>> >> That being said, I'm still unsure if the complexity of substitution
>>> >> indexing is worth the potential benefit.
>>> >>
>>> >> -=R
>>> >>
>>> >>
>>> >> On Sat, Sep 21, 2013 at 5:12 AM, Tatsuhiro Tsujikawa
>>> >> <tatsuhiro.t@gmail.com> wrote:
>>> >>>
>>> >>> I made a simple benchmark test for substitution indexing vs
>>> >>> incremental indexing only and share its results here.
>>> >>>
>>> >>> The detailed results can be found at
>>> >>> https://github.com/tatsuhiro-t/nghttp2/wiki/hpackSubst
>>> >>>
>>> >>> """
>>> >>> HPACK draft offers 2 kind of indexing methods: incremental and
>>> >>> substitution. In nghttp2, we always use incremental
>>> >>> indexing. This is because we do not have good strategy to use
>>> >>> substitution indexing efficiently. We suspect that it is in the
>>> >>> draft because it has some use cases, but we don't see them yet.
>>> >>>
>>> >>> So we did some tests comparing our incremental only strategy and
>>> >>> the experimental strategy utilizing substitution indexing.
>>> >>>
>>> >>> Our incremental only strategy goes as follows:
>>> >>>
>>> >>> 1. If the name/value pair is in the header table, use indexed
>>> >>>    representation.
>>> >>>
>>> >>> 2. Else, if name is in the header table, use incremental indexing
>>> >>>    with indexed name.
>>> >>>
>>> >>> 3. Else, use incremental indexing with new name.
>>> >>>
>>> >>> The experimental strategy utilizing substitution indexing changes
>>> >>> step 2 as follows:
>>> >>>
>>> >>> 2. Else, if name is in the header table, substitute that entry;
>>> >>>    use substitution indexing with indexed name.
>>> >>>
>>> >>> We use data set in https://github.com/http2/http_samples.
>>> >>>
>>> >>> The detailed results are listed in the following sections.
>>> >>>
>>> >>> The end result is that, in the overall, incremental only strategy
>>> >>> is more efficient than the strategy with substitution. But the
>>> >>> difference is not so large. On some data set, the substitution
>>> >>> performed well, so depending on the data set, the winner may
>>> >>> change. Also there may be better strategy for substitution.
>>> >>> """
>>> >>>
>>> >>> It turns out that the experimental strategy is used in node-http2
>>> >>> and firefox.
>>> >>>
>>> >>
>>> >
>>
>>
>
Received on Tuesday, 24 September 2013 20:27:13 UTC