Re: Header Compression Clarifications from Michael Sweet on 2013-07-04 (ietf-http-wg@w3.org from July to September 2013)

From: Michael Sweet <msweet@apple.com>
Date: Thu, 04 Jul 2013 10:47:12 -0400
To: RUELLAN Herve <Herve.Ruellan@crf.canon.fr>
Cc: James M Snell <jasnell@gmail.com>, Roberto Peon <grmocg@gmail.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Message-id: <DD65C562-BFE6-4789-879D-200BC016F6E7@apple.com>
Hervé,

On Jul 4, 2013, at 10:06 AM, RUELLAN Herve <Herve.Ruellan@crf.canon.fr> wrote:
> I'm not sure I understand correctly your comment.
> 
> One of the goal of the eviction mechanism is to provide a solution to the possible problem of header table fragmentation over long-lasting connections: if the header table is full of very small entries, then substituting a long entry to one of these small entries will make the header table grow over its maximum allowed size. The eviction mechanism enable to allow for such a substitution while still keeping the header table under its maximum allowed size.

Right, I am just addressing the "I want to add a name/value pair that is 100 bytes long but the header table size is 50 bytes." - what happens if you try substitution/incremental indexing with a name/value pair that is larger than the header table?

We *could* just say that doing so empties the header table and doesn't add the pair to it (because it won't fit), but I don't think that is particularly useful.

Instead I am suggesting that such name/value pairs must always use the Literal Header without Indexing representation - that allows for header name reuse (e.g. cookie) and otherwise preserves the header table.  Otherwise I see certain headers (e.g. cookie) effectively killing any benefits of the header compression scheme.


> 
> Hervé.
> 
>> -----Original Message-----
>> From: Michael Sweet [mailto:msweet@apple.com]
>> Sent: jeudi 4 juillet 2013 15:48
>> To: RUELLAN Herve
>> Cc: James M Snell; Roberto Peon; ietf-http-wg@w3.org
>> Subject: Re: Header Compression Clarifications
>> 
>> What about simply not allowing indexing for name/value pairs that exceed
>> SETTINGS_MAX_BUFFER_SIZE?  Both sides could still send "long" literal
>> name/value pairs but they would not disrupt the header table.  When
>> combined with #2 or #3 that should yield consistent and useful results.
>> 
>> 
>> On Jul 4, 2013, at 9:09 AM, RUELLAN Herve <Herve.Ruellan@crf.canon.fr>
>> wrote:
>> 
>>> Trying to catch on the discussion, I see three proposals for solving the
>> problem of substitution and eviction.
>>> 
>>> 1. Size adjustment BEFORE doing the substitution.
>>> This has several edge case problems as James showed and would lead to
>> complex implementations.
>>> 
>>> 2. Size adjustment AFTER doing the substitution.
>>> The problem is that the new entry may be dropped just after adding it.
>>> 
>>> 3. Size adjustment BEFORE doing the substitution with pinning of the
>> replaced entry (Roberto's proposal).
>>> 
>>> 
>>> I think that 2 and 3 are in fact close together.
>>> With 2, a bad encoder could just have its new entry dropped from the table
>> just after adding it. However a good encoder could use an algorithm like the
>> one propose in 3 to find the right entry to replace and prevent dropping the
>> new entry.
>>> On the decoder side, 2 is simpler.
>>> 
>>> Taking James' example for what a "good" encoder should do.
>>> With existing table (max size 15)
>>> 0: FOO = 123
>>> 1: BAR = 321
>>> 2: BA = Z
>>> 
>>> The encoder wants to substitute: TESTING = FOO at Index #0 (because
>> entry #0 is old or whatever).
>>> It detects that index #0 and #1 need to be dropped, and therefore adjust
>> the substitution to replace Index #2.
>>> It then can apply the substitution, and after that adjust the size.
>>> 
>>> 
>>> So my preference is for the second solution. True, it can lead to under-
>> optimal usage of the header table. But I'm not in favor of making all
>> implementations more complex to help optimize bad implementations.
>>> We should however warn implementers of this problem.
>>> 
>>> Hervé.
>>> 
>>> 
>>>> -----Original Message-----
>>>> From: James M Snell [mailto:jasnell@gmail.com]
>>>> Sent: mercredi 3 juillet 2013 02:36
>>>> To: Roberto Peon
>>>> Cc: ietf-http-wg@w3.org
>>>> Subject: Re: Header Compression Clarifications
>>>> 
>>>> Yes, I was simplifying :-) I think that rule should work..
>>>> particularly in that it allows me to avoid having to check for as
>>>> many of these weird edge cases.
>>>> 
>>>> On Tue, Jul 2, 2013 at 5:27 PM, Roberto Peon <grmocg@gmail.com>
>> wrote:
>>>>> Correct (assuming the overhead per item was assumed to be zero,
>>>>> which isn't the case, but is good in example-land :) )
>>>>> 
>>>>> -=R
>>>>> 
>>>>> 
>>>>> On Tue, Jul 2, 2013 at 5:18 PM, James M Snell <jasnell@gmail.com>
>> wrote:
>>>>>> 
>>>>>> So to make sure I have it right... Given the two examples I gave...
>>>>>> 
>>>>>> Header Table, Max size = 15
>>>>>> 
>>>>>> 1  A = B
>>>>>> 2  C = D
>>>>>> 3  E = F
>>>>>> 4  G = H
>>>>>> 5  I = J
>>>>>> 
>>>>>> Substitute #5 with FOOBARBAZ = 123456
>>>>>> 
>>>>>> The result would be a Header table with one item "FOOBARBAZ =
>> 123456"
>>>>>> 
>>>>>> And...
>>>>>> 
>>>>>> Header Table, Max size = 20
>>>>>> 
>>>>>> 1  A = B
>>>>>> 2  C = D
>>>>>> 3  E = F
>>>>>> 4  G = H
>>>>>> 5  I = J
>>>>>> 6  K = L
>>>>>> 7  M = N
>>>>>> 
>>>>>> Substitute #3 with FOOBARBAZ = 123456
>>>>>> 
>>>>>> The result would be a Header table with three items...
>>>>>> 
>>>>>> FOOBARBAZ = 123456
>>>>>> K = L
>>>>>> M = N
>>>>>> 
>>>>>> Is that correct?
>>>>>> 
>>>>>> On Tue, Jul 2, 2013 at 5:07 PM, Roberto Peon <grmocg@gmail.com>
>>>> wrote:
>>>>>>> The biggest reason that I don't like this is that it requires the
>>>>>>> encoder keep more state.
>>>>>>> I prefer to make this simple by having an easy-to-follow rule for
>>>>>>> when it the slot it would have replaced would have been evicted
>>>>>>> (once all predecessors to that slot have been evicted, then
>>>>>>> elements following the element-to-be-replaced are removed, leaving
>>>>>>> the new element at the head of the list).
>>>>>>> 
>>>>>>> The pseudo code for this is:
>>>>>>> 
>>>>>>> if not replacement_idx or new_element_size > max_table_size:
>>>>>>> PROTOCOL_ERROR()
>>>>>>> if max_table_size ==new_element_size:
>>>>>>> table.clear()
>>>>>>> table[0] = new_element
>>>>>>> return
>>>>>>> 
>>>>>>> # above is boilerplate true for any algorithm
>>>>>>> 
>>>>>>> table[replacement_idx].clear()
>>>>>>> table[replacement_idx].pin()
>>>>>>> first_non_pinned = 0
>>>>>>> while new_element_size + table_byte_size() > max_table_size:
>>>>>>>   if table[first_non_pinned].pinned():
>>>>>>>     ++first_non_pinned
>>>>>>>      continue
>>>>>>>     table[first_non_pinned].pop()
>>>>>>> 
>>>>>>> This adds some small complexity here, but it makes encoding
>>>>>>> significantly easier (you can have a naive encoder which leaps
>>>>>>> without looking, which is far less complicated than having to look
>>>>>>> before leap, and may still prove reasonable in terms of compressor
>>>>>>> efficiency).
>>>>>>> 
>>>>>>> I admit that I'm attracted to your idea. I just am afraid of what
>>>>>>> it makes the encoder look like :) -=R
>>>>>>> 
>>>>>>> 
>>>>>>> On Tue, Jul 2, 2013 at 4:37 PM, James M Snell <jasnell@gmail.com>
>>>> wrote:
>>>>>>>> 
>>>>>>>> On Tue, Jul 2, 2013 at 4:00 PM, Roberto Peon <grmocg@gmail.com>
>>>> wrote:
>>>>>>>> [snip]
>>>>>>>>> 
>>>>>>>>> So, an example:
>>>>>>>>> Imagine that you're replacing entry #10 with something 10
>>>>>>>>> characters long.
>>>>>>>>> The previous entry in that slot was 5 characters long, and the
>>>>>>>>> table was already at max size.
>>>>>>>>> This implies that you need to get rid of 5 characters before
>>>>>>>>> replacing.
>>>>>>>>> Assuming that items 1 and 2 are the oldest items and item 1 is 3
>>>>>>>>> chars, and item 2 is 3 chars, you need to pop two.
>>>>>>>>> 
>>>>>>>>> You now stick the 10 characters into what was formerly entry #10.
>>>>>>>>> [snip]
>>>>>>>> 
>>>>>>>> That's problematic too. Let's go back to my example:
>>>>>>>> 
>>>>>>>> Header Table, Max size = 15
>>>>>>>> 
>>>>>>>> 1  A = B
>>>>>>>> 2  C = D
>>>>>>>> 3  E = F
>>>>>>>> 4  G = H
>>>>>>>> 5  I = J
>>>>>>>> 
>>>>>>>> Substitute #5 with FOOBARBAZ = 123456
>>>>>>>> 
>>>>>>>> Obviously, we end up popping all five entries, saying "stick the
>>>>>>>> new characters into what was formerly entry #5" does not make any
>>>>>>>> sense because the thing that was "formerly entry #5" no longer
>> exists.
>>>>>>>> 
>>>>>>>> Now a variation on the same problem:
>>>>>>>> 
>>>>>>>> Header Table, Max size = 20
>>>>>>>> 
>>>>>>>> 1  A = B
>>>>>>>> 2  C = D
>>>>>>>> 3  E = F
>>>>>>>> 4  G = H
>>>>>>>> 5  I = J
>>>>>>>> 6  K = L
>>>>>>>> 7  M = N
>>>>>>>> 
>>>>>>>> Substitute #3 with FOOBARBAZ = 123456
>>>>>>>> 
>>>>>>>> We begin popping things off to make room before doing the
>>>>>>>> substitution... 4 entries are removed, including the item being
>>>>>>>> replaced... leaving
>>>>>>>> 
>>>>>>>> 1  I = J
>>>>>>>> 2  K = L
>>>>>>>> 3  M = N
>>>>>>>> 
>>>>>>>> What exactly do we replace? Are we replacing "M = N" (the current
>>>> #3)?
>>>>>>>> If so, how does that sync up with the "thing that was formerly
>>>>>>>> entry #3" idea?
>>>>>>>> 
>>>>>>>> I think the only reliable approach is to substitute AFTER freeing
>>>>>>>> up space, substitute into whatever is in the index position after
>>>>>>>> freeing up space, and if nothing is in that space, return an
>>>>>>>> error. This means that the sender has to be careful to avoid
>>>>>>>> getting into this state in the first place, which means very
>>>>>>>> careful control over when and how substitution is being used.
>>>>>>>> Given the current eviction strategy, that would be the most
>>>>>>>> reliable approach I think. So in the two examples above, the
>>>>>>>> first case returns an error and the second case results in "M =
>>>>>>>> N" being
>>>> replaced.
>>>>>>> 
>>>>>>> 
>>>>> 
>>>>> 
>>> 
>> 
>> _________________________________________________________
>> Michael Sweet, Senior Printing System Engineer, PWG Chair
> 

_________________________________________________________
Michael Sweet, Senior Printing System Engineer, PWG Chair
Received on Thursday, 4 July 2013 14:47:54 UTC