RE: Header Compression Clarifications from RUELLAN Herve on 2013-07-04 (ietf-http-wg@w3.org from July to September 2013)

From: RUELLAN Herve <Herve.Ruellan@crf.canon.fr>
Date: Thu, 4 Jul 2013 14:06:41 +0000
To: Michael Sweet <msweet@apple.com>
CC: James M Snell <jasnell@gmail.com>, Roberto Peon <grmocg@gmail.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Message-ID: <6C71876BDCCD01488E70A2399529D5E525EE4018@ADELE.crf.canon.fr>
I'm not sure I understand correctly your comment.

One of the goal of the eviction mechanism is to provide a solution to the possible problem of header table fragmentation over long-lasting connections: if the header table is full of very small entries, then substituting a long entry to one of these small entries will make the header table grow over its maximum allowed size. The eviction mechanism enable to allow for such a substitution while still keeping the header table under its maximum allowed size.

Hervé.

> -----Original Message-----
> From: Michael Sweet [mailto:msweet@apple.com]
> Sent: jeudi 4 juillet 2013 15:48
> To: RUELLAN Herve
> Cc: James M Snell; Roberto Peon; ietf-http-wg@w3.org
> Subject: Re: Header Compression Clarifications
> 
> What about simply not allowing indexing for name/value pairs that exceed
> SETTINGS_MAX_BUFFER_SIZE?  Both sides could still send "long" literal
> name/value pairs but they would not disrupt the header table.  When
> combined with #2 or #3 that should yield consistent and useful results.
> 
> 
> On Jul 4, 2013, at 9:09 AM, RUELLAN Herve <Herve.Ruellan@crf.canon.fr>
> wrote:
> 
> > Trying to catch on the discussion, I see three proposals for solving the
> problem of substitution and eviction.
> >
> > 1. Size adjustment BEFORE doing the substitution.
> > This has several edge case problems as James showed and would lead to
> complex implementations.
> >
> > 2. Size adjustment AFTER doing the substitution.
> > The problem is that the new entry may be dropped just after adding it.
> >
> > 3. Size adjustment BEFORE doing the substitution with pinning of the
> replaced entry (Roberto's proposal).
> >
> >
> > I think that 2 and 3 are in fact close together.
> > With 2, a bad encoder could just have its new entry dropped from the table
> just after adding it. However a good encoder could use an algorithm like the
> one propose in 3 to find the right entry to replace and prevent dropping the
> new entry.
> > On the decoder side, 2 is simpler.
> >
> > Taking James' example for what a "good" encoder should do.
> > With existing table (max size 15)
> >  0: FOO = 123
> >  1: BAR = 321
> >  2: BA = Z
> >
> > The encoder wants to substitute: TESTING = FOO at Index #0 (because
> entry #0 is old or whatever).
> > It detects that index #0 and #1 need to be dropped, and therefore adjust
> the substitution to replace Index #2.
> > It then can apply the substitution, and after that adjust the size.
> >
> >
> > So my preference is for the second solution. True, it can lead to under-
> optimal usage of the header table. But I'm not in favor of making all
> implementations more complex to help optimize bad implementations.
> > We should however warn implementers of this problem.
> >
> > Hervé.
> >
> >
> >> -----Original Message-----
> >> From: James M Snell [mailto:jasnell@gmail.com]
> >> Sent: mercredi 3 juillet 2013 02:36
> >> To: Roberto Peon
> >> Cc: ietf-http-wg@w3.org
> >> Subject: Re: Header Compression Clarifications
> >>
> >> Yes, I was simplifying :-) I think that rule should work..
> >> particularly in that it allows me to avoid having to check for as
> >> many of these weird edge cases.
> >>
> >> On Tue, Jul 2, 2013 at 5:27 PM, Roberto Peon <grmocg@gmail.com>
> wrote:
> >>> Correct (assuming the overhead per item was assumed to be zero,
> >>> which isn't the case, but is good in example-land :) )
> >>>
> >>> -=R
> >>>
> >>>
> >>> On Tue, Jul 2, 2013 at 5:18 PM, James M Snell <jasnell@gmail.com>
> wrote:
> >>>>
> >>>> So to make sure I have it right... Given the two examples I gave...
> >>>>
> >>>>  Header Table, Max size = 15
> >>>>
> >>>>  1  A = B
> >>>>  2  C = D
> >>>>  3  E = F
> >>>>  4  G = H
> >>>>  5  I = J
> >>>>
> >>>>  Substitute #5 with FOOBARBAZ = 123456
> >>>>
> >>>> The result would be a Header table with one item "FOOBARBAZ =
> 123456"
> >>>>
> >>>> And...
> >>>>
> >>>>  Header Table, Max size = 20
> >>>>
> >>>>  1  A = B
> >>>>  2  C = D
> >>>>  3  E = F
> >>>>  4  G = H
> >>>>  5  I = J
> >>>>  6  K = L
> >>>>  7  M = N
> >>>>
> >>>>  Substitute #3 with FOOBARBAZ = 123456
> >>>>
> >>>> The result would be a Header table with three items...
> >>>>
> >>>>  FOOBARBAZ = 123456
> >>>>  K = L
> >>>>  M = N
> >>>>
> >>>> Is that correct?
> >>>>
> >>>> On Tue, Jul 2, 2013 at 5:07 PM, Roberto Peon <grmocg@gmail.com>
> >> wrote:
> >>>>> The biggest reason that I don't like this is that it requires the
> >>>>> encoder keep more state.
> >>>>> I prefer to make this simple by having an easy-to-follow rule for
> >>>>> when it the slot it would have replaced would have been evicted
> >>>>> (once all predecessors to that slot have been evicted, then
> >>>>> elements following the element-to-be-replaced are removed, leaving
> >>>>> the new element at the head of the list).
> >>>>>
> >>>>> The pseudo code for this is:
> >>>>>
> >>>>> if not replacement_idx or new_element_size > max_table_size:
> >>>>>  PROTOCOL_ERROR()
> >>>>> if max_table_size ==new_element_size:
> >>>>>  table.clear()
> >>>>>  table[0] = new_element
> >>>>>  return
> >>>>>
> >>>>> # above is boilerplate true for any algorithm
> >>>>>
> >>>>> table[replacement_idx].clear()
> >>>>> table[replacement_idx].pin()
> >>>>> first_non_pinned = 0
> >>>>> while new_element_size + table_byte_size() > max_table_size:
> >>>>>    if table[first_non_pinned].pinned():
> >>>>>      ++first_non_pinned
> >>>>>       continue
> >>>>>      table[first_non_pinned].pop()
> >>>>>
> >>>>> This adds some small complexity here, but it makes encoding
> >>>>> significantly easier (you can have a naive encoder which leaps
> >>>>> without looking, which is far less complicated than having to look
> >>>>> before leap, and may still prove reasonable in terms of compressor
> >>>>> efficiency).
> >>>>>
> >>>>> I admit that I'm attracted to your idea. I just am afraid of what
> >>>>> it makes the encoder look like :) -=R
> >>>>>
> >>>>>
> >>>>> On Tue, Jul 2, 2013 at 4:37 PM, James M Snell <jasnell@gmail.com>
> >> wrote:
> >>>>>>
> >>>>>> On Tue, Jul 2, 2013 at 4:00 PM, Roberto Peon <grmocg@gmail.com>
> >> wrote:
> >>>>>> [snip]
> >>>>>>>
> >>>>>>> So, an example:
> >>>>>>> Imagine that you're replacing entry #10 with something 10
> >>>>>>> characters long.
> >>>>>>> The previous entry in that slot was 5 characters long, and the
> >>>>>>> table was already at max size.
> >>>>>>> This implies that you need to get rid of 5 characters before
> >>>>>>> replacing.
> >>>>>>> Assuming that items 1 and 2 are the oldest items and item 1 is 3
> >>>>>>> chars, and item 2 is 3 chars, you need to pop two.
> >>>>>>>
> >>>>>>> You now stick the 10 characters into what was formerly entry #10.
> >>>>>>> [snip]
> >>>>>>
> >>>>>> That's problematic too. Let's go back to my example:
> >>>>>>
> >>>>>> Header Table, Max size = 15
> >>>>>>
> >>>>>> 1  A = B
> >>>>>> 2  C = D
> >>>>>> 3  E = F
> >>>>>> 4  G = H
> >>>>>> 5  I = J
> >>>>>>
> >>>>>> Substitute #5 with FOOBARBAZ = 123456
> >>>>>>
> >>>>>> Obviously, we end up popping all five entries, saying "stick the
> >>>>>> new characters into what was formerly entry #5" does not make any
> >>>>>> sense because the thing that was "formerly entry #5" no longer
> exists.
> >>>>>>
> >>>>>> Now a variation on the same problem:
> >>>>>>
> >>>>>> Header Table, Max size = 20
> >>>>>>
> >>>>>> 1  A = B
> >>>>>> 2  C = D
> >>>>>> 3  E = F
> >>>>>> 4  G = H
> >>>>>> 5  I = J
> >>>>>> 6  K = L
> >>>>>> 7  M = N
> >>>>>>
> >>>>>> Substitute #3 with FOOBARBAZ = 123456
> >>>>>>
> >>>>>> We begin popping things off to make room before doing the
> >>>>>> substitution... 4 entries are removed, including the item being
> >>>>>> replaced... leaving
> >>>>>>
> >>>>>> 1  I = J
> >>>>>> 2  K = L
> >>>>>> 3  M = N
> >>>>>>
> >>>>>> What exactly do we replace? Are we replacing "M = N" (the current
> >> #3)?
> >>>>>> If so, how does that sync up with the "thing that was formerly
> >>>>>> entry #3" idea?
> >>>>>>
> >>>>>> I think the only reliable approach is to substitute AFTER freeing
> >>>>>> up space, substitute into whatever is in the index position after
> >>>>>> freeing up space, and if nothing is in that space, return an
> >>>>>> error. This means that the sender has to be careful to avoid
> >>>>>> getting into this state in the first place, which means very
> >>>>>> careful control over when and how substitution is being used.
> >>>>>> Given the current eviction strategy, that would be the most
> >>>>>> reliable approach I think. So in the two examples above, the
> >>>>>> first case returns an error and the second case results in "M =
> >>>>>> N" being
> >> replaced.
> >>>>>
> >>>>>
> >>>
> >>>
> >
> 
> _________________________________________________________
> Michael Sweet, Senior Printing System Engineer, PWG Chair
Received on Thursday, 4 July 2013 14:07:21 UTC