Re: Header Compression Overview

This does not adapt well to key-values which change quickly or are small.
-=R


On Mon, Jul 1, 2013 at 11:21 PM, James M Snell <jasnell@gmail.com> wrote:

> After implementing several variations,  I want something even simpler...
> A fixed index range,  assigned incrementally with rollover at the end and
> no substitution. It becomes very simple that way.
> On Jul 1, 2013 8:29 PM, "Mike Bishop" <Michael.Bishop@microsoft.com>
> wrote:
>
>> Yes -- we just need to clarify the eviction behavior.  I actually prefer
>> the evict-from-front model, because it means there's no need for metadata
>> to include a timestamp on each entry in the table.  You know which item
>> you're going to evict, because it's clear.
>>
>> -----Original Message-----
>> From: James M Snell [mailto:jasnell@gmail.com]
>> Sent: Monday, July 1, 2013 5:41 PM
>> To: Mike Bishop
>> Cc: ietf-http-wg@w3.org
>> Subject: Re: Header Compression Overview
>>
>> Ok, so I'm taking another look through the current header compression
>> draft and I'd swear that it said some things differently the last time I
>> read it ;-) ... (which is, of course, impossible... so I'm going to blame
>> it on the 110+ degree heat here today and the fact that I've read so many
>> different versions of header compression drafts over the last few months
>> that my specific recollection of any individual version is a bit gooey in
>> the middle). So.. with that said, allow me to make a few corrections to the
>> explanation... Let's see if this is a bit closer to reality:
>>
>> The sender and receiver each maintain their own synchronized view of the
>> compression context.
>>
>> The compression context consists of two components:
>>
>>   1. A dynamic table of name+value pairs
>>   2. The current "reference set" of name+value pairs
>>
>> When the connection is first established the reference set (#2) is empty
>> and the header table is prepopulated.
>>
>> The sender constructs a logical set of header name+value pairs, where
>> each name can have multiple values. Each distinct name+value combination is
>> treated independently of the others.
>>
>> For each name+value pair in the set, the sender needs to determine
>> whether or not to use an Indexed or Literal representation. The Indexed
>> representation is appropriate if the name+value pair already exists in
>> either the header table (#1). The Literal representation is appropriate if
>> the name+value pair either does not exist in #1, or if the sender does not
>> want the name+value pair stored in the compression context (more on this
>> part later)
>>
>> For example, let's assume that the header table initially consists of
>> three name+value pairs:
>>
>>   0 = :method = GET
>>   1 = :method = POST
>>   2 = :path = /
>>
>> Then, let's assume that I've created a new connection and the first set
>> of headers I send to the server contains the header fields:
>>
>>   :method = GET
>>   :path = /
>>   foo1 = bar
>>   foo2 = baz
>>
>> When I process this set of header fields, I first check each pair against
>> the header table and find that :method = GET uses the index
>> (0) while :path = / uses the index (2). Since I don't find foo1 or
>> foo2 in the static table, I determine that I have to use the Literal
>> representation for those two fields.
>>
>>   :method = GET  ==>  Indexed as (0)
>>   :path = / ==> Indexed as (2)
>>   foo1 = bar ==> Literal
>>   foo2 = bar ==> Literal
>>
>> Now, still on the sending side, once I determine the kind of
>> representation, I need to determine what changes need to be made to the
>> reference set (#2). The reference set is what the sender and receiver both
>> use to determine what set of header fields are currently active. Since this
>> is a brand new connection, the reference set is empty, so with this first
>> set of headers, the changes to the reference set are straightforward... I'm
>> adding everything...
>>
>> To add an Indexed representation of a header, all I do is reference it's
>> index.. that's easy enough to do...
>>
>> To add a Literal representation of a header, there are two options:
>> With Indexing, or Without Indexing. With Indexing means that when the
>> header field is added to the Reference set (#2), the name+value pair is
>> also added to the header table (#1). Without indexing means that when the
>> header field is added to the Reference set (#2) the
>> name+value pair is NOT added to the dynamic table. Which of these to
>> use to entirely up to the sender. So I'm going to decide that foo1 is
>> going to be With Indexing, while foo2 is going to be Without Indexing.
>>
>> (there is another option here between incremental vs. substitution
>> indexing but I'll get to that in a few minutes)
>>
>> At this point, I have a set of instructions of how to modify the
>> reference set:
>>
>>   1. Add Index #0
>>   2. Add Index #2
>>   3. Add Literal "foo1 = bar" with Indexing
>>   4. Add Literal "foo2 = baz" without Indexing
>>
>> At this point, I can create the serialized header block that will be
>> transmitted within the HEADERS frame. This block ONLY contains the
>> serialized set of instructions for modifying the reference set.
>>
>> For instructions 1 and 2 (adding index #0 and index #2), the
>> serialization is very simple and requires exactly two bytes (0x81 0x83). (I
>> have to bump up the index values by 1 when serializing for some reason)
>>
>> For instruction #3, it's slightly more complicated, but not much... we
>> have a single prefix byte (0x40) followed by the length prefixed header
>> name (0x04 0x67 0x6F 0x6F 0x31) and the length prefixed header value (0x03
>> 0x62 0x61 0x72).
>>
>> For instruction #4, the encoding is only slightly different. The single
>> prefix byte is 0x60 followed by the length prefixed header name
>> (0x04 0x67 0x6F 0x6F 0x32) and the length prefixed value (0x03 0x62
>> 0x61 0x7A).
>>
>> Putting it all together, the complete sequence of octets that I will send
>> within the HEADERS frame is:
>>
>>  0x81 0x83 0x40 0x04
>>  0x67 0x6F 0x6F 0x31
>>  0x03 0x62 0x61 0x72
>>  0x60 0x04 0x67 0x6F
>>  0x6F 0x32 0x03 0x62
>>  0x61 0x7A
>>
>> Once I generate this sequence, a new entry is added to the header table
>> (#1):
>>
>>   3 = foo1 = bar
>>
>> And the reference set (#2) has been updated to include the four headers I
>> sent:
>>
>>   :method = GET
>>   :path = /
>>   foo1 = bar
>>   foo2 = baz
>>
>> (this is a simplification actually but it's close enough for now)
>>
>> I send the serialized instructions to the other endpoint. It receives
>> those and reverses the process first by parsing the octet sequence to
>> determine the exact set of instructions it needs to follow to modify it's
>> own view of the reference set. Once it knows what the operations are, it
>> begins making the necessary changes to the reference set.
>> First, it sees two Index representations, so it looks those index values
>> up in the header table (#1). It finds Index #0 and Index #2 in the static
>> table so it adds those name+value pairs to it's reference set. Next, it
>> finds the Literal With Indexing instruction for "foo1 = bar". Since it is
>> "With Indexing", the server adds the name+value pair to the reference set
>> *AND* to the header table. Next it finds the Literal Without Indexing
>> instructions for "foo2 = baz". Since it is "Without Indexing", the server
>> only adds the name+value pair to the reference set and notes the fact that
>> the header was not indexed at all.
>>
>> At this point, the sender and receiver should have identical views of
>> both the header table and reference set.
>>
>> So far so good.
>>
>> Let's move on... back on the sending side. Now I want to send another
>> request to the server. Some of the header fields remain the same, some have
>> different values, some are no longer used at all.
>>
>>   :method = POST
>>   :path = /
>>   foo1 = new
>>   foo3 = new
>>
>> When serializing these headers, I'm going to follow the same process, but
>> since our header table and reference set are both different, the serialized
>> results are going to be a bit different.
>>
>> Going through each of the headers, I first determine that :method = POST
>> is already contained in the header table at Index #1. I also note that
>> :path = / is also in the header table using Index #2 (this hasn't changed).
>>
>> Also looking at the header table, I see that there is a "foo1" header
>> with a different value sitting at Index #3. I make note of that index
>> position so I can use it next.
>>
>> Lastly, I determine that the new header field "foo3 = new" is not in
>> header table at all. I'm going to have to use the Literal representation
>> for that one.
>>
>> Once I've determined the basic type of representation, I need to
>> determine how the current reference set needs to be changed. As a reminder,
>> here's what the current reference set looks like:
>>
>>   :method = GET
>>   :path = /
>>   foo1 = bar
>>   foo2 = baz
>>
>> Immediately, I see that :method = GET needs to be replaced with :method =
>> POST, since we know the static index positions for both of those, we can
>> simple turn one off by referencing it's index and turn the other on (also
>> by referencing it's index). That requires a simple two-byte sequence (0x81
>> 0x82).
>>
>> I see that :path = / is already in the reference set. So for that one, I
>> don't have to do anything! Awesome. Let's move on.
>>
>> We have an index position for the header field foo1 but not the full
>> name+value pair "foo1=bar". Remember when I briefly mentioned
>> Incremental Indexing vs. Substitution indexing? We need to make a
>> decision. When Incremental Indexing, I can add "foo1 = new" as an entirely
>> new entry in the dynamic table (#2) or I can replace the value of the
>> existing "foo1 = bar" entry sitting at index #3. The choice is entirely up
>> to the sender and can be based on a number of factors... the most important
>> of which is basic memory management. For now, let's keep things simple and
>> use Incremental Indexing. We want to minimize the amount of data we're
>> going to send, however, so instead of sending the header field name as a
>> length prefixed string, we're going to reference the existing index
>> position and send along a new value ( 0x44 0x03 0x6E 0x65 0x77 ). (again, I
>> have to bump up the value of index value by 1)
>>
>> Moving on, we determine that "foo2 = baz" needs to be removed from the
>> reference set. We see, however, that "foo2 = baz" was previously sent as a
>> Literal without any indexing. This means that it will automatically be
>> removed from the reference set the next time around without the sender
>> having to do anything! That's helpful.
>>
>> Now all we need to deal with is the new "foo2 = new" header field, we'll
>> do that using Literal With Indexing, like we did originally with the "foo1
>> = bar" header field in the first request..  0x40 0x04 0x67 0x6F 0x6F 0x33
>> 0x03 0x6E 0x65 0x77.
>>
>> The complete sequence of octets for the second request is:
>>
>>   0x81 0x82 0x44 0x03
>>   0x6E 0x65 0x77 0x40
>>   0x04 0x67 0x6F 0x6F
>>   0x33 0x03 0x6E 0x65
>>   0x77
>>
>> After I construct this sequence, my header table has five entries:
>>
>>   0 = :method = GET
>>   1 = :method = POST
>>   2 = :path = /
>>   3 = foo1 = bar
>>   4 = foo1 = new
>>
>> And my reference set looks like:
>>
>>   :method = POST
>>   :path = /
>>   foo1 = new
>>   foo3 = new
>>
>> I send the HEADERS frame to the server and it deconstructs the set of
>> instructions, applies those to it's own reference set, updates it's header
>> table, and goes on from there.
>>
>> The process can get a bit complicated at times, but that's the basic
>> operation. There are a few other things to consider:
>>
>> The size of the dynamic table can be bound by the recipient. The sender
>> is not permitted to exceed that size. If adding a new item to the dynamic
>> table will cause the table to grow beyond this limit, existing name+value
>> pairs need to be removed, starting with the
>> name+value pairs that were least-recently written to the table.
>>
>> (((Ed. Note: We need clarity on this next part: )))
>>
>> That sounds straightforward, however, given the distinction between
>> Incremental and Substitution indexing, the Least Recently Written rule can
>> become a bit complicated in the implementation. Using Substitution indexing
>> resets the Write time for that particular index, meaning that eviction does
>> not necessary occur in Index order.
>>
>> For example, let's suppose that I send one HEADERS frame that
>> incrementally adds three items to the dynamic table. Let's call those #1,
>> #2 and #3. Later, I send a new HEADERS that substitutes a new value for
>> Index #2. Last, I send a new HEADERS frame that adds two new headers for #4
>> and #5. The only problem is, adding #4 and #5 to the dynamic table will
>> cause it to exceed it's size limit, so I need to evict some of the existing
>> entries. Because of the substitution of #2, Items #1 and #3 are the first
>> to be evicted (in that order). So the dynamic table contents become #2, #4,
>> and #5).
>>
>> (((Ed. Note: Or is it.... that Substitution DOES NOT change the
>> Least-Recently-Written eviction order? In which case, a more recently
>> substituted value can be dropped before a less-least-recently written
>> Incremental value? That doesn't feel right to me...)))
>>
>> Eviction needs to be handled carefully with regards to the current
>> reference set. Suppose, for instance, that in one request, I add "foo1 =
>> bar" to the header table and to the current reference set. In the *same
>> request*, I add "foo2 = baz" to the header table and to the current
>> reference set, however, adding this second item causes the header table to
>> exceed it's defined size, forcing the previously written "foo1 = bar" to be
>> evicted...
>>
>> On the receiving end, I'll see the instruction to add "foo1 = bar"
>> first, which will add it to the reference set and the header table, then
>> I'll see the instruction to add "foo2 = baz", which adds it to the
>> reference set and the header table, but evicts "foo1 = bar" from the
>> dynamic table. Despite this eviction, both "foo1 = bar" and "foo2 = baz"
>> are both STILL in the reference set.
>>
>> The next time around, "foo1 = bar" ends up being treated as a non-indexed
>> literal and is dropped automatically from the reference set if it is not
>> explicitly mentioned in the new set of headers.
>>
>> Another consideration that must be carefully weighed is the choice
>> between Incremental and Substitution indexing. If an implementation always
>> chooses to use Incremental Indexing, Eviction will become the only reliable
>> means of memory management. Long lived connections will build up full
>> compression contexts that tie up a potentially significant amount of
>> memory. This could be dangerous, especially since there is no explicit
>> mechanism for deleting items from the header table.
>>
>> Substitution Indexing provides only a partial solution to this problem by
>> making it possible to reuse existing index positions. The problem with
>> using Substitution indexing, however, is that it's not clear yet how it
>> impact the LRU eviction semantics (see the Ed. Note above)
>>
>> ...
>>
>> Ok, I *think* that's a more accurate description. Do folks agree?
>>
>> - James
>>
>>
>>

Received on Tuesday, 2 July 2013 06:27:08 UTC