Re: Header Compression Clarifications from Michael Sweet on 2013-07-04 (ietf-http-wg@w3.org from July to September 2013)

From: Michael Sweet <msweet@apple.com>
Date: Thu, 04 Jul 2013 11:08:57 -0400
To: James M Snell <jasnell@gmail.com>
Cc: RUELLAN Herve <Herve.Ruellan@crf.canon.fr>, Roberto Peon <grmocg@gmail.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Message-id: <E295F403-8EC7-4946-A61B-970002DF8EB2@apple.com>
I agree, but we don't talk about errors in the current draft nor do we talk about what the client or server does to recover...  As a first step it would be useful to say what you do for such headers...


On Jul 4, 2013, at 10:56 AM, James M Snell <jasnell@gmail.com> wrote:
> Adding a header that is too large for the table to hold individually after eviction ought to be a compression error.
> 
> On Jul 4, 2013 7:47 AM, "Michael Sweet" <msweet@apple.com> wrote:
> Hervé,
> 
> On Jul 4, 2013, at 10:06 AM, RUELLAN Herve <Herve.Ruellan@crf.canon.fr> wrote:
> > I'm not sure I understand correctly your comment.
> >
> > One of the goal of the eviction mechanism is to provide a solution to the possible problem of header table fragmentation over long-lasting connections: if the header table is full of very small entries, then substituting a long entry to one of these small entries will make the header table grow over its maximum allowed size. The eviction mechanism enable to allow for such a substitution while still keeping the header table under its maximum allowed size.
> 
> Right, I am just addressing the "I want to add a name/value pair that is 100 bytes long but the header table size is 50 bytes." - what happens if you try substitution/incremental indexing with a name/value pair that is larger than the header table?
> 
> We *could* just say that doing so empties the header table and doesn't add the pair to it (because it won't fit), but I don't think that is particularly useful.
> 
> Instead I am suggesting that such name/value pairs must always use the Literal Header without Indexing representation - that allows for header name reuse (e.g. cookie) and otherwise preserves the header table.  Otherwise I see certain headers (e.g. cookie) effectively killing any benefits of the header compression scheme.
> 
> 
> >
> > Hervé.
> >
> >> -----Original Message-----
> >> From: Michael Sweet [mailto:msweet@apple.com]
> >> Sent: jeudi 4 juillet 2013 15:48
> >> To: RUELLAN Herve
> >> Cc: James M Snell; Roberto Peon; ietf-http-wg@w3.org
> >> Subject: Re: Header Compression Clarifications
> >>
> >> What about simply not allowing indexing for name/value pairs that exceed
> >> SETTINGS_MAX_BUFFER_SIZE?  Both sides could still send "long" literal
> >> name/value pairs but they would not disrupt the header table.  When
> >> combined with #2 or #3 that should yield consistent and useful results.
> >>
> >>
> >> On Jul 4, 2013, at 9:09 AM, RUELLAN Herve <Herve.Ruellan@crf.canon.fr>
> >> wrote:
> >>
> >>> Trying to catch on the discussion, I see three proposals for solving the
> >> problem of substitution and eviction.
> >>>
> >>> 1. Size adjustment BEFORE doing the substitution.
> >>> This has several edge case problems as James showed and would lead to
> >> complex implementations.
> >>>
> >>> 2. Size adjustment AFTER doing the substitution.
> >>> The problem is that the new entry may be dropped just after adding it.
> >>>
> >>> 3. Size adjustment BEFORE doing the substitution with pinning of the
> >> replaced entry (Roberto's proposal).
> >>>
> >>>
> >>> I think that 2 and 3 are in fact close together.
> >>> With 2, a bad encoder could just have its new entry dropped from the table
> >> just after adding it. However a good encoder could use an algorithm like the
> >> one propose in 3 to find the right entry to replace and prevent dropping the
> >> new entry.
> >>> On the decoder side, 2 is simpler.
> >>>
> >>> Taking James' example for what a "good" encoder should do.
> >>> With existing table (max size 15)
> >>> 0: FOO = 123
> >>> 1: BAR = 321
> >>> 2: BA = Z
> >>>
> >>> The encoder wants to substitute: TESTING = FOO at Index #0 (because
> >> entry #0 is old or whatever).
> >>> It detects that index #0 and #1 need to be dropped, and therefore adjust
> >> the substitution to replace Index #2.
> >>> It then can apply the substitution, and after that adjust the size.
> >>>
> >>>
> >>> So my preference is for the second solution. True, it can lead to under-
> >> optimal usage of the header table. But I'm not in favor of making all
> >> implementations more complex to help optimize bad implementations.
> >>> We should however warn implementers of this problem.
> >>>
> >>> Hervé.
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: James M Snell [mailto:jasnell@gmail.com]
> >>>> Sent: mercredi 3 juillet 2013 02:36
> >>>> To: Roberto Peon
> >>>> Cc: ietf-http-wg@w3.org
> >>>> Subject: Re: Header Compression Clarifications
> >>>>
> >>>> Yes, I was simplifying :-) I think that rule should work..
> >>>> particularly in that it allows me to avoid having to check for as
> >>>> many of these weird edge cases.
> >>>>
> >>>> On Tue, Jul 2, 2013 at 5:27 PM, Roberto Peon <grmocg@gmail.com>
> >> wrote:
> >>>>> Correct (assuming the overhead per item was assumed to be zero,
> >>>>> which isn't the case, but is good in example-land :) )
> >>>>>
> >>>>> -=R
> >>>>>
> >>>>>
> >>>>> On Tue, Jul 2, 2013 at 5:18 PM, James M Snell <jasnell@gmail.com>
> >> wrote:
> >>>>>>
> >>>>>> So to make sure I have it right... Given the two examples I gave...
> >>>>>>
> >>>>>> Header Table, Max size = 15
> >>>>>>
> >>>>>> 1  A = B
> >>>>>> 2  C = D
> >>>>>> 3  E = F
> >>>>>> 4  G = H
> >>>>>> 5  I = J
> >>>>>>
> >>>>>> Substitute #5 with FOOBARBAZ = 123456
> >>>>>>
> >>>>>> The result would be a Header table with one item "FOOBARBAZ =
> >> 123456"
> >>>>>>
> >>>>>> And...
> >>>>>>
> >>>>>> Header Table, Max size = 20
> >>>>>>
> >>>>>> 1  A = B
> >>>>>> 2  C = D
> >>>>>> 3  E = F
> >>>>>> 4  G = H
> >>>>>> 5  I = J
> >>>>>> 6  K = L
> >>>>>> 7  M = N
> >>>>>>
> >>>>>> Substitute #3 with FOOBARBAZ = 123456
> >>>>>>
> >>>>>> The result would be a Header table with three items...
> >>>>>>
> >>>>>> FOOBARBAZ = 123456
> >>>>>> K = L
> >>>>>> M = N
> >>>>>>
> >>>>>> Is that correct?
> >>>>>>
> >>>>>> On Tue, Jul 2, 2013 at 5:07 PM, Roberto Peon <grmocg@gmail.com>
> >>>> wrote:
> >>>>>>> The biggest reason that I don't like this is that it requires the
> >>>>>>> encoder keep more state.
> >>>>>>> I prefer to make this simple by having an easy-to-follow rule for
> >>>>>>> when it the slot it would have replaced would have been evicted
> >>>>>>> (once all predecessors to that slot have been evicted, then
> >>>>>>> elements following the element-to-be-replaced are removed, leaving
> >>>>>>> the new element at the head of the list).
> >>>>>>>
> >>>>>>> The pseudo code for this is:
> >>>>>>>
> >>>>>>> if not replacement_idx or new_element_size > max_table_size:
> >>>>>>> PROTOCOL_ERROR()
> >>>>>>> if max_table_size ==new_element_size:
> >>>>>>> table.clear()
> >>>>>>> table[0] = new_element
> >>>>>>> return
> >>>>>>>
> >>>>>>> # above is boilerplate true for any algorithm
> >>>>>>>
> >>>>>>> table[replacement_idx].clear()
> >>>>>>> table[replacement_idx].pin()
> >>>>>>> first_non_pinned = 0
> >>>>>>> while new_element_size + table_byte_size() > max_table_size:
> >>>>>>>   if table[first_non_pinned].pinned():
> >>>>>>>     ++first_non_pinned
> >>>>>>>      continue
> >>>>>>>     table[first_non_pinned].pop()
> >>>>>>>
> >>>>>>> This adds some small complexity here, but it makes encoding
> >>>>>>> significantly easier (you can have a naive encoder which leaps
> >>>>>>> without looking, which is far less complicated than having to look
> >>>>>>> before leap, and may still prove reasonable in terms of compressor
> >>>>>>> efficiency).
> >>>>>>>
> >>>>>>> I admit that I'm attracted to your idea. I just am afraid of what
> >>>>>>> it makes the encoder look like :) -=R
> >>>>>>>
> >>>>>>>
> >>>>>>> On Tue, Jul 2, 2013 at 4:37 PM, James M Snell <jasnell@gmail.com>
> >>>> wrote:
> >>>>>>>>
> >>>>>>>> On Tue, Jul 2, 2013 at 4:00 PM, Roberto Peon <grmocg@gmail.com>
> >>>> wrote:
> >>>>>>>> [snip]
> >>>>>>>>>
> >>>>>>>>> So, an example:
> >>>>>>>>> Imagine that you're replacing entry #10 with something 10
> >>>>>>>>> characters long.
> >>>>>>>>> The previous entry in that slot was 5 characters long, and the
> >>>>>>>>> table was already at max size.
> >>>>>>>>> This implies that you need to get rid of 5 characters before
> >>>>>>>>> replacing.
> >>>>>>>>> Assuming that items 1 and 2 are the oldest items and item 1 is 3
> >>>>>>>>> chars, and item 2 is 3 chars, you need to pop two.
> >>>>>>>>>
> >>>>>>>>> You now stick the 10 characters into what was formerly entry #10.
> >>>>>>>>> [snip]
> >>>>>>>>
> >>>>>>>> That's problematic too. Let's go back to my example:
> >>>>>>>>
> >>>>>>>> Header Table, Max size = 15
> >>>>>>>>
> >>>>>>>> 1  A = B
> >>>>>>>> 2  C = D
> >>>>>>>> 3  E = F
> >>>>>>>> 4  G = H
> >>>>>>>> 5  I = J
> >>>>>>>>
> >>>>>>>> Substitute #5 with FOOBARBAZ = 123456
> >>>>>>>>
> >>>>>>>> Obviously, we end up popping all five entries, saying "stick the
> >>>>>>>> new characters into what was formerly entry #5" does not make any
> >>>>>>>> sense because the thing that was "formerly entry #5" no longer
> >> exists.
> >>>>>>>>
> >>>>>>>> Now a variation on the same problem:
> >>>>>>>>
> >>>>>>>> Header Table, Max size = 20
> >>>>>>>>
> >>>>>>>> 1  A = B
> >>>>>>>> 2  C = D
> >>>>>>>> 3  E = F
> >>>>>>>> 4  G = H
> >>>>>>>> 5  I = J
> >>>>>>>> 6  K = L
> >>>>>>>> 7  M = N
> >>>>>>>>
> >>>>>>>> Substitute #3 with FOOBARBAZ = 123456
> >>>>>>>>
> >>>>>>>> We begin popping things off to make room before doing the
> >>>>>>>> substitution... 4 entries are removed, including the item being
> >>>>>>>> replaced... leaving
> >>>>>>>>
> >>>>>>>> 1  I = J
> >>>>>>>> 2  K = L
> >>>>>>>> 3  M = N
> >>>>>>>>
> >>>>>>>> What exactly do we replace? Are we replacing "M = N" (the current
> >>>> #3)?
> >>>>>>>> If so, how does that sync up with the "thing that was formerly
> >>>>>>>> entry #3" idea?
> >>>>>>>>
> >>>>>>>> I think the only reliable approach is to substitute AFTER freeing
> >>>>>>>> up space, substitute into whatever is in the index position after
> >>>>>>>> freeing up space, and if nothing is in that space, return an
> >>>>>>>> error. This means that the sender has to be careful to avoid
> >>>>>>>> getting into this state in the first place, which means very
> >>>>>>>> careful control over when and how substitution is being used.
> >>>>>>>> Given the current eviction strategy, that would be the most
> >>>>>>>> reliable approach I think. So in the two examples above, the
> >>>>>>>> first case returns an error and the second case results in "M =
> >>>>>>>> N" being
> >>>> replaced.
> >>>>>>>
> >>>>>>>
> >>>>>
> >>>>>
> >>>
> >>
> >> _________________________________________________________
> >> Michael Sweet, Senior Printing System Engineer, PWG Chair
> >
> 
> _________________________________________________________
> Michael Sweet, Senior Printing System Engineer, PWG Chair
> 

_________________________________________________________
Michael Sweet, Senior Printing System Engineer, PWG Chair
Received on Thursday, 4 July 2013 15:09:46 UTC