Re: Header Compression Clarifications from Gábor Molnár on 2013-07-05 (ietf-http-wg@w3.org from July to September 2013)

From: Gábor Molnár <gabor.molnar@sch.bme.hu>
Date: Fri, 05 Jul 2013 04:44:19 +0200
To: Roberto Peon <grmocg@gmail.com>
Cc: RUELLAN Herve <Herve.Ruellan@crf.canon.fr>, James M Snell <jasnell@gmail.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Message-id: <CA+KJw_4EQctQWRFQ=hQ8FVO8rBTGv_YoNGULBxxccQapChvcgg@mail.gmail.com>
2013/7/4 Gábor Molnár <gabor.molnar@sch.bme.hu>

> Let's consider the memory management of a real implementation. When
> there's an incoming "Literal Header Representation with Substitution
> Indexing" , the implementation first needs to assemble the header name and
> value *somewhere* in memory. After that, it probably passes around pointers
> to these buffers and the header table is just an array of pointers plus
> administration data.
>
> I think there's two interesting consequences of this:
>  * since the new header is already in the memory somewhere, it doesn't
> really matter if we do add-then-enforce or enforce-then-add, both the old
> header table and the new name-value pair has to fit into the memory
>  * the size of name-value pairs should be bounded as well
>

Just realized that bounding the size of name-value pairs is probably not an
option, since HTTP does not limit
header name/value length.


>
> I'm not very experienced in using programming languages with manual memory
> management, so please correct me if that's not a reasonable model for a C
> implementation for example.
>
>
> 2013/7/4 Roberto Peon <grmocg@gmail.com>
>
>> The approach of add-then-enforce guarantee has a glaring and huge
>> problem--  it cannot guarantee the amount of memory I'll use.
>> I'd much rather deal with some small complexity (demonstrably not big)
>> here and have that guarantee.
>>  -=R
>>
>>
>> On Thu, Jul 4, 2013 at 6:09 AM, RUELLAN Herve <Herve.Ruellan@crf.canon.fr
>> > wrote:
>>
>>> Trying to catch on the discussion, I see three proposals for solving the
>>> problem of substitution and eviction.
>>>
>>> 1. Size adjustment BEFORE doing the substitution.
>>> This has several edge case problems as James showed and would lead to
>>> complex implementations.
>>>
>>> 2. Size adjustment AFTER doing the substitution.
>>> The problem is that the new entry may be dropped just after adding it.
>>>
>>> 3. Size adjustment BEFORE doing the substitution with pinning of the
>>> replaced entry (Roberto's proposal).
>>>
>>>
>>> I think that 2 and 3 are in fact close together.
>>> With 2, a bad encoder could just have its new entry dropped from the
>>> table just after adding it. However a good encoder could use an algorithm
>>> like the one propose in 3 to find the right entry to replace and prevent
>>> dropping the new entry.
>>> On the decoder side, 2 is simpler.
>>>
>>> Taking James' example for what a "good" encoder should do.
>>> With existing table (max size 15)
>>>   0: FOO = 123
>>>   1: BAR = 321
>>>   2: BA = Z
>>>
>>> The encoder wants to substitute: TESTING = FOO at Index #0 (because
>>> entry #0 is old or whatever).
>>> It detects that index #0 and #1 need to be dropped, and therefore adjust
>>> the substitution to replace Index #2.
>>> It then can apply the substitution, and after that adjust the size.
>>>
>>>
>>> So my preference is for the second solution. True, it can lead to
>>> under-optimal usage of the header table. But I'm not in favor of making all
>>> implementations more complex to help optimize bad implementations.
>>> We should however warn implementers of this problem.
>>>
>>> Hervé.
>>>
>>>
>>> > -----Original Message-----
>>> > From: James M Snell [mailto:jasnell@gmail.com]
>>> > Sent: mercredi 3 juillet 2013 02:36
>>> > To: Roberto Peon
>>> > Cc: ietf-http-wg@w3.org
>>> > Subject: Re: Header Compression Clarifications
>>> >
>>> > Yes, I was simplifying :-) I think that rule should work..
>>> > particularly in that it allows me to avoid having to check for as many
>>> of these
>>> > weird edge cases.
>>> >
>>> > On Tue, Jul 2, 2013 at 5:27 PM, Roberto Peon <grmocg@gmail.com> wrote:
>>> > > Correct (assuming the overhead per item was assumed to be zero, which
>>> > > isn't the case, but is good in example-land :) )
>>> > >
>>> > > -=R
>>> > >
>>> > >
>>> > > On Tue, Jul 2, 2013 at 5:18 PM, James M Snell <jasnell@gmail.com>
>>> wrote:
>>> > >>
>>> > >> So to make sure I have it right... Given the two examples I gave...
>>> > >>
>>> > >>   Header Table, Max size = 15
>>> > >>
>>> > >>   1  A = B
>>> > >>   2  C = D
>>> > >>   3  E = F
>>> > >>   4  G = H
>>> > >>   5  I = J
>>> > >>
>>> > >>   Substitute #5 with FOOBARBAZ = 123456
>>> > >>
>>> > >> The result would be a Header table with one item "FOOBARBAZ =
>>> 123456"
>>> > >>
>>> > >> And...
>>> > >>
>>> > >>   Header Table, Max size = 20
>>> > >>
>>> > >>   1  A = B
>>> > >>   2  C = D
>>> > >>   3  E = F
>>> > >>   4  G = H
>>> > >>   5  I = J
>>> > >>   6  K = L
>>> > >>   7  M = N
>>> > >>
>>> > >>   Substitute #3 with FOOBARBAZ = 123456
>>> > >>
>>> > >> The result would be a Header table with three items...
>>> > >>
>>> > >>   FOOBARBAZ = 123456
>>> > >>   K = L
>>> > >>   M = N
>>> > >>
>>> > >> Is that correct?
>>> > >>
>>> > >> On Tue, Jul 2, 2013 at 5:07 PM, Roberto Peon <grmocg@gmail.com>
>>> > wrote:
>>> > >> > The biggest reason that I don't like this is that it requires the
>>> > >> > encoder keep more state.
>>> > >> > I prefer to make this simple by having an easy-to-follow rule for
>>> > >> > when it the slot it would have replaced would have been evicted
>>> > >> > (once all predecessors to that slot have been evicted, then
>>> > >> > elements following the element-to-be-replaced are removed, leaving
>>> > >> > the new element at the head of the list).
>>> > >> >
>>> > >> > The pseudo code for this is:
>>> > >> >
>>> > >> > if not replacement_idx or new_element_size > max_table_size:
>>> > >> >   PROTOCOL_ERROR()
>>> > >> > if max_table_size ==new_element_size:
>>> > >> >   table.clear()
>>> > >> >   table[0] = new_element
>>> > >> >   return
>>> > >> >
>>> > >> > # above is boilerplate true for any algorithm
>>> > >> >
>>> > >> > table[replacement_idx].clear()
>>> > >> > table[replacement_idx].pin()
>>> > >> > first_non_pinned = 0
>>> > >> > while new_element_size + table_byte_size() > max_table_size:
>>> > >> >     if table[first_non_pinned].pinned():
>>> > >> >       ++first_non_pinned
>>> > >> >        continue
>>> > >> >       table[first_non_pinned].pop()
>>> > >> >
>>> > >> > This adds some small complexity here, but it makes encoding
>>> > >> > significantly easier (you can have a naive encoder which leaps
>>> > >> > without looking, which is far less complicated than having to look
>>> > >> > before leap, and may still prove reasonable in terms of compressor
>>> > >> > efficiency).
>>> > >> >
>>> > >> > I admit that I'm attracted to your idea. I just am afraid of what
>>> > >> > it makes the encoder look like :) -=R
>>> > >> >
>>> > >> >
>>> > >> > On Tue, Jul 2, 2013 at 4:37 PM, James M Snell <jasnell@gmail.com>
>>> > wrote:
>>> > >> >>
>>> > >> >> On Tue, Jul 2, 2013 at 4:00 PM, Roberto Peon <grmocg@gmail.com>
>>> > wrote:
>>> > >> >> [snip]
>>> > >> >> >
>>> > >> >> > So, an example:
>>> > >> >> > Imagine that you're replacing entry #10 with something 10
>>> > >> >> > characters long.
>>> > >> >> > The previous entry in that slot was 5 characters long, and the
>>> > >> >> > table was already at max size.
>>> > >> >> > This implies that you need to get rid of 5 characters before
>>> > >> >> > replacing.
>>> > >> >> > Assuming that items 1 and 2 are the oldest items and item 1 is
>>> 3
>>> > >> >> > chars, and item 2 is 3 chars, you need to pop two.
>>> > >> >> >
>>> > >> >> > You now stick the 10 characters into what was formerly entry
>>> #10.
>>> > >> >> >[snip]
>>> > >> >>
>>> > >> >> That's problematic too. Let's go back to my example:
>>> > >> >>
>>> > >> >> Header Table, Max size = 15
>>> > >> >>
>>> > >> >> 1  A = B
>>> > >> >> 2  C = D
>>> > >> >> 3  E = F
>>> > >> >> 4  G = H
>>> > >> >> 5  I = J
>>> > >> >>
>>> > >> >> Substitute #5 with FOOBARBAZ = 123456
>>> > >> >>
>>> > >> >> Obviously, we end up popping all five entries, saying "stick the
>>> > >> >> new characters into what was formerly entry #5" does not make any
>>> > >> >> sense because the thing that was "formerly entry #5" no longer
>>> exists.
>>> > >> >>
>>> > >> >> Now a variation on the same problem:
>>> > >> >>
>>> > >> >> Header Table, Max size = 20
>>> > >> >>
>>> > >> >> 1  A = B
>>> > >> >> 2  C = D
>>> > >> >> 3  E = F
>>> > >> >> 4  G = H
>>> > >> >> 5  I = J
>>> > >> >> 6  K = L
>>> > >> >> 7  M = N
>>> > >> >>
>>> > >> >> Substitute #3 with FOOBARBAZ = 123456
>>> > >> >>
>>> > >> >> We begin popping things off to make room before doing the
>>> > >> >> substitution... 4 entries are removed, including the item being
>>> > >> >> replaced... leaving
>>> > >> >>
>>> > >> >> 1  I = J
>>> > >> >> 2  K = L
>>> > >> >> 3  M = N
>>> > >> >>
>>> > >> >> What exactly do we replace? Are we replacing "M = N" (the current
>>> > #3)?
>>> > >> >> If so, how does that sync up with the "thing that was formerly
>>> > >> >> entry #3" idea?
>>> > >> >>
>>> > >> >> I think the only reliable approach is to substitute AFTER freeing
>>> > >> >> up space, substitute into whatever is in the index position after
>>> > >> >> freeing up space, and if nothing is in that space, return an
>>> > >> >> error. This means that the sender has to be careful to avoid
>>> > >> >> getting into this state in the first place, which means very
>>> > >> >> careful control over when and how substitution is being used.
>>> > >> >> Given the current eviction strategy, that would be the most
>>> > >> >> reliable approach I think. So in the two examples above, the
>>> first
>>> > >> >> case returns an error and the second case results in "M = N"
>>> being
>>> > replaced.
>>> > >> >
>>> > >> >
>>> > >
>>> > >
>>>
>>>
>>
>
Received on Friday, 5 July 2013 02:45:09 UTC