Re: Header Compression Clarifications from Michael Sweet on 2013-07-04 (ietf-http-wg@w3.org from July to September 2013)

From: Michael Sweet <msweet@apple.com>
Date: Thu, 04 Jul 2013 09:47:50 -0400
To: RUELLAN Herve <Herve.Ruellan@crf.canon.fr>
Cc: James M Snell <jasnell@gmail.com>, Roberto Peon <grmocg@gmail.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Message-id: <B537C783-AEB8-4439-A549-0EEE10E27A3D@apple.com>
What about simply not allowing indexing for name/value pairs that exceed SETTINGS_MAX_BUFFER_SIZE?  Both sides could still send "long" literal name/value pairs but they would not disrupt the header table.  When combined with #2 or #3 that should yield consistent and useful results.


On Jul 4, 2013, at 9:09 AM, RUELLAN Herve <Herve.Ruellan@crf.canon.fr> wrote:

> Trying to catch on the discussion, I see three proposals for solving the problem of substitution and eviction.
> 
> 1. Size adjustment BEFORE doing the substitution.
> This has several edge case problems as James showed and would lead to complex implementations.
> 
> 2. Size adjustment AFTER doing the substitution.
> The problem is that the new entry may be dropped just after adding it.
> 
> 3. Size adjustment BEFORE doing the substitution with pinning of the replaced entry (Roberto's proposal).
> 
> 
> I think that 2 and 3 are in fact close together.
> With 2, a bad encoder could just have its new entry dropped from the table just after adding it. However a good encoder could use an algorithm like the one propose in 3 to find the right entry to replace and prevent dropping the new entry.
> On the decoder side, 2 is simpler.
> 
> Taking James' example for what a "good" encoder should do.
> With existing table (max size 15)
>  0: FOO = 123
>  1: BAR = 321
>  2: BA = Z
> 
> The encoder wants to substitute: TESTING = FOO at Index #0 (because entry #0 is old or whatever).
> It detects that index #0 and #1 need to be dropped, and therefore adjust the substitution to replace Index #2.
> It then can apply the substitution, and after that adjust the size.
> 
> 
> So my preference is for the second solution. True, it can lead to under-optimal usage of the header table. But I'm not in favor of making all implementations more complex to help optimize bad implementations.
> We should however warn implementers of this problem.
> 
> Hervé.
> 
> 
>> -----Original Message-----
>> From: James M Snell [mailto:jasnell@gmail.com]
>> Sent: mercredi 3 juillet 2013 02:36
>> To: Roberto Peon
>> Cc: ietf-http-wg@w3.org
>> Subject: Re: Header Compression Clarifications
>> 
>> Yes, I was simplifying :-) I think that rule should work..
>> particularly in that it allows me to avoid having to check for as many of these
>> weird edge cases.
>> 
>> On Tue, Jul 2, 2013 at 5:27 PM, Roberto Peon <grmocg@gmail.com> wrote:
>>> Correct (assuming the overhead per item was assumed to be zero, which
>>> isn't the case, but is good in example-land :) )
>>> 
>>> -=R
>>> 
>>> 
>>> On Tue, Jul 2, 2013 at 5:18 PM, James M Snell <jasnell@gmail.com> wrote:
>>>> 
>>>> So to make sure I have it right... Given the two examples I gave...
>>>> 
>>>>  Header Table, Max size = 15
>>>> 
>>>>  1  A = B
>>>>  2  C = D
>>>>  3  E = F
>>>>  4  G = H
>>>>  5  I = J
>>>> 
>>>>  Substitute #5 with FOOBARBAZ = 123456
>>>> 
>>>> The result would be a Header table with one item "FOOBARBAZ = 123456"
>>>> 
>>>> And...
>>>> 
>>>>  Header Table, Max size = 20
>>>> 
>>>>  1  A = B
>>>>  2  C = D
>>>>  3  E = F
>>>>  4  G = H
>>>>  5  I = J
>>>>  6  K = L
>>>>  7  M = N
>>>> 
>>>>  Substitute #3 with FOOBARBAZ = 123456
>>>> 
>>>> The result would be a Header table with three items...
>>>> 
>>>>  FOOBARBAZ = 123456
>>>>  K = L
>>>>  M = N
>>>> 
>>>> Is that correct?
>>>> 
>>>> On Tue, Jul 2, 2013 at 5:07 PM, Roberto Peon <grmocg@gmail.com>
>> wrote:
>>>>> The biggest reason that I don't like this is that it requires the
>>>>> encoder keep more state.
>>>>> I prefer to make this simple by having an easy-to-follow rule for
>>>>> when it the slot it would have replaced would have been evicted
>>>>> (once all predecessors to that slot have been evicted, then
>>>>> elements following the element-to-be-replaced are removed, leaving
>>>>> the new element at the head of the list).
>>>>> 
>>>>> The pseudo code for this is:
>>>>> 
>>>>> if not replacement_idx or new_element_size > max_table_size:
>>>>>  PROTOCOL_ERROR()
>>>>> if max_table_size ==new_element_size:
>>>>>  table.clear()
>>>>>  table[0] = new_element
>>>>>  return
>>>>> 
>>>>> # above is boilerplate true for any algorithm
>>>>> 
>>>>> table[replacement_idx].clear()
>>>>> table[replacement_idx].pin()
>>>>> first_non_pinned = 0
>>>>> while new_element_size + table_byte_size() > max_table_size:
>>>>>    if table[first_non_pinned].pinned():
>>>>>      ++first_non_pinned
>>>>>       continue
>>>>>      table[first_non_pinned].pop()
>>>>> 
>>>>> This adds some small complexity here, but it makes encoding
>>>>> significantly easier (you can have a naive encoder which leaps
>>>>> without looking, which is far less complicated than having to look
>>>>> before leap, and may still prove reasonable in terms of compressor
>>>>> efficiency).
>>>>> 
>>>>> I admit that I'm attracted to your idea. I just am afraid of what
>>>>> it makes the encoder look like :) -=R
>>>>> 
>>>>> 
>>>>> On Tue, Jul 2, 2013 at 4:37 PM, James M Snell <jasnell@gmail.com>
>> wrote:
>>>>>> 
>>>>>> On Tue, Jul 2, 2013 at 4:00 PM, Roberto Peon <grmocg@gmail.com>
>> wrote:
>>>>>> [snip]
>>>>>>> 
>>>>>>> So, an example:
>>>>>>> Imagine that you're replacing entry #10 with something 10
>>>>>>> characters long.
>>>>>>> The previous entry in that slot was 5 characters long, and the
>>>>>>> table was already at max size.
>>>>>>> This implies that you need to get rid of 5 characters before
>>>>>>> replacing.
>>>>>>> Assuming that items 1 and 2 are the oldest items and item 1 is 3
>>>>>>> chars, and item 2 is 3 chars, you need to pop two.
>>>>>>> 
>>>>>>> You now stick the 10 characters into what was formerly entry #10.
>>>>>>> [snip]
>>>>>> 
>>>>>> That's problematic too. Let's go back to my example:
>>>>>> 
>>>>>> Header Table, Max size = 15
>>>>>> 
>>>>>> 1  A = B
>>>>>> 2  C = D
>>>>>> 3  E = F
>>>>>> 4  G = H
>>>>>> 5  I = J
>>>>>> 
>>>>>> Substitute #5 with FOOBARBAZ = 123456
>>>>>> 
>>>>>> Obviously, we end up popping all five entries, saying "stick the
>>>>>> new characters into what was formerly entry #5" does not make any
>>>>>> sense because the thing that was "formerly entry #5" no longer exists.
>>>>>> 
>>>>>> Now a variation on the same problem:
>>>>>> 
>>>>>> Header Table, Max size = 20
>>>>>> 
>>>>>> 1  A = B
>>>>>> 2  C = D
>>>>>> 3  E = F
>>>>>> 4  G = H
>>>>>> 5  I = J
>>>>>> 6  K = L
>>>>>> 7  M = N
>>>>>> 
>>>>>> Substitute #3 with FOOBARBAZ = 123456
>>>>>> 
>>>>>> We begin popping things off to make room before doing the
>>>>>> substitution... 4 entries are removed, including the item being
>>>>>> replaced... leaving
>>>>>> 
>>>>>> 1  I = J
>>>>>> 2  K = L
>>>>>> 3  M = N
>>>>>> 
>>>>>> What exactly do we replace? Are we replacing "M = N" (the current
>> #3)?
>>>>>> If so, how does that sync up with the "thing that was formerly
>>>>>> entry #3" idea?
>>>>>> 
>>>>>> I think the only reliable approach is to substitute AFTER freeing
>>>>>> up space, substitute into whatever is in the index position after
>>>>>> freeing up space, and if nothing is in that space, return an
>>>>>> error. This means that the sender has to be careful to avoid
>>>>>> getting into this state in the first place, which means very
>>>>>> careful control over when and how substitution is being used.
>>>>>> Given the current eviction strategy, that would be the most
>>>>>> reliable approach I think. So in the two examples above, the first
>>>>>> case returns an error and the second case results in "M = N" being
>> replaced.
>>>>> 
>>>>> 
>>> 
>>> 
> 

_________________________________________________________
Michael Sweet, Senior Printing System Engineer, PWG Chair
Received on Thursday, 4 July 2013 13:48:23 UTC