RE: Header Compression Clarifications

Trying to catch on the discussion, I see three proposals for solving the problem of substitution and eviction.

1. Size adjustment BEFORE doing the substitution.
This has several edge case problems as James showed and would lead to complex implementations.

2. Size adjustment AFTER doing the substitution.
The problem is that the new entry may be dropped just after adding it.

3. Size adjustment BEFORE doing the substitution with pinning of the replaced entry (Roberto's proposal).


I think that 2 and 3 are in fact close together.
With 2, a bad encoder could just have its new entry dropped from the table just after adding it. However a good encoder could use an algorithm like the one propose in 3 to find the right entry to replace and prevent dropping the new entry.
On the decoder side, 2 is simpler.

Taking James' example for what a "good" encoder should do.
With existing table (max size 15)
  0: FOO = 123
  1: BAR = 321
  2: BA = Z

The encoder wants to substitute: TESTING = FOO at Index #0 (because entry #0 is old or whatever).
It detects that index #0 and #1 need to be dropped, and therefore adjust the substitution to replace Index #2.
It then can apply the substitution, and after that adjust the size.


So my preference is for the second solution. True, it can lead to under-optimal usage of the header table. But I'm not in favor of making all implementations more complex to help optimize bad implementations.
We should however warn implementers of this problem.

Hervé.


> -----Original Message-----
> From: James M Snell [mailto:jasnell@gmail.com]
> Sent: mercredi 3 juillet 2013 02:36
> To: Roberto Peon
> Cc: ietf-http-wg@w3.org
> Subject: Re: Header Compression Clarifications
> 
> Yes, I was simplifying :-) I think that rule should work..
> particularly in that it allows me to avoid having to check for as many of these
> weird edge cases.
> 
> On Tue, Jul 2, 2013 at 5:27 PM, Roberto Peon <grmocg@gmail.com> wrote:
> > Correct (assuming the overhead per item was assumed to be zero, which
> > isn't the case, but is good in example-land :) )
> >
> > -=R
> >
> >
> > On Tue, Jul 2, 2013 at 5:18 PM, James M Snell <jasnell@gmail.com> wrote:
> >>
> >> So to make sure I have it right... Given the two examples I gave...
> >>
> >>   Header Table, Max size = 15
> >>
> >>   1  A = B
> >>   2  C = D
> >>   3  E = F
> >>   4  G = H
> >>   5  I = J
> >>
> >>   Substitute #5 with FOOBARBAZ = 123456
> >>
> >> The result would be a Header table with one item "FOOBARBAZ = 123456"
> >>
> >> And...
> >>
> >>   Header Table, Max size = 20
> >>
> >>   1  A = B
> >>   2  C = D
> >>   3  E = F
> >>   4  G = H
> >>   5  I = J
> >>   6  K = L
> >>   7  M = N
> >>
> >>   Substitute #3 with FOOBARBAZ = 123456
> >>
> >> The result would be a Header table with three items...
> >>
> >>   FOOBARBAZ = 123456
> >>   K = L
> >>   M = N
> >>
> >> Is that correct?
> >>
> >> On Tue, Jul 2, 2013 at 5:07 PM, Roberto Peon <grmocg@gmail.com>
> wrote:
> >> > The biggest reason that I don't like this is that it requires the
> >> > encoder keep more state.
> >> > I prefer to make this simple by having an easy-to-follow rule for
> >> > when it the slot it would have replaced would have been evicted
> >> > (once all predecessors to that slot have been evicted, then
> >> > elements following the element-to-be-replaced are removed, leaving
> >> > the new element at the head of the list).
> >> >
> >> > The pseudo code for this is:
> >> >
> >> > if not replacement_idx or new_element_size > max_table_size:
> >> >   PROTOCOL_ERROR()
> >> > if max_table_size ==new_element_size:
> >> >   table.clear()
> >> >   table[0] = new_element
> >> >   return
> >> >
> >> > # above is boilerplate true for any algorithm
> >> >
> >> > table[replacement_idx].clear()
> >> > table[replacement_idx].pin()
> >> > first_non_pinned = 0
> >> > while new_element_size + table_byte_size() > max_table_size:
> >> >     if table[first_non_pinned].pinned():
> >> >       ++first_non_pinned
> >> >        continue
> >> >       table[first_non_pinned].pop()
> >> >
> >> > This adds some small complexity here, but it makes encoding
> >> > significantly easier (you can have a naive encoder which leaps
> >> > without looking, which is far less complicated than having to look
> >> > before leap, and may still prove reasonable in terms of compressor
> >> > efficiency).
> >> >
> >> > I admit that I'm attracted to your idea. I just am afraid of what
> >> > it makes the encoder look like :) -=R
> >> >
> >> >
> >> > On Tue, Jul 2, 2013 at 4:37 PM, James M Snell <jasnell@gmail.com>
> wrote:
> >> >>
> >> >> On Tue, Jul 2, 2013 at 4:00 PM, Roberto Peon <grmocg@gmail.com>
> wrote:
> >> >> [snip]
> >> >> >
> >> >> > So, an example:
> >> >> > Imagine that you're replacing entry #10 with something 10
> >> >> > characters long.
> >> >> > The previous entry in that slot was 5 characters long, and the
> >> >> > table was already at max size.
> >> >> > This implies that you need to get rid of 5 characters before
> >> >> > replacing.
> >> >> > Assuming that items 1 and 2 are the oldest items and item 1 is 3
> >> >> > chars, and item 2 is 3 chars, you need to pop two.
> >> >> >
> >> >> > You now stick the 10 characters into what was formerly entry #10.
> >> >> >[snip]
> >> >>
> >> >> That's problematic too. Let's go back to my example:
> >> >>
> >> >> Header Table, Max size = 15
> >> >>
> >> >> 1  A = B
> >> >> 2  C = D
> >> >> 3  E = F
> >> >> 4  G = H
> >> >> 5  I = J
> >> >>
> >> >> Substitute #5 with FOOBARBAZ = 123456
> >> >>
> >> >> Obviously, we end up popping all five entries, saying "stick the
> >> >> new characters into what was formerly entry #5" does not make any
> >> >> sense because the thing that was "formerly entry #5" no longer exists.
> >> >>
> >> >> Now a variation on the same problem:
> >> >>
> >> >> Header Table, Max size = 20
> >> >>
> >> >> 1  A = B
> >> >> 2  C = D
> >> >> 3  E = F
> >> >> 4  G = H
> >> >> 5  I = J
> >> >> 6  K = L
> >> >> 7  M = N
> >> >>
> >> >> Substitute #3 with FOOBARBAZ = 123456
> >> >>
> >> >> We begin popping things off to make room before doing the
> >> >> substitution... 4 entries are removed, including the item being
> >> >> replaced... leaving
> >> >>
> >> >> 1  I = J
> >> >> 2  K = L
> >> >> 3  M = N
> >> >>
> >> >> What exactly do we replace? Are we replacing "M = N" (the current
> #3)?
> >> >> If so, how does that sync up with the "thing that was formerly
> >> >> entry #3" idea?
> >> >>
> >> >> I think the only reliable approach is to substitute AFTER freeing
> >> >> up space, substitute into whatever is in the index position after
> >> >> freeing up space, and if nothing is in that space, return an
> >> >> error. This means that the sender has to be careful to avoid
> >> >> getting into this state in the first place, which means very
> >> >> careful control over when and how substitution is being used.
> >> >> Given the current eviction strategy, that would be the most
> >> >> reliable approach I think. So in the two examples above, the first
> >> >> case returns an error and the second case results in "M = N" being
> replaced.
> >> >
> >> >
> >
> >

Received on Thursday, 4 July 2013 13:10:11 UTC