- From: James M Snell <jasnell@gmail.com>
- Date: Mon, 1 Jul 2013 23:21:30 -0700
- To: Mike Bishop <Michael.Bishop@microsoft.com>
- Cc: ietf-http-wg@w3.org
- Message-ID: <CABP7RbfauE_w8ESUKS8xFYkH01Ax-2T85tuUSh35PkzqCvH_PQ@mail.gmail.com>
After implementing several variations, I want something even simpler... A fixed index range, assigned incrementally with rollover at the end and no substitution. It becomes very simple that way. On Jul 1, 2013 8:29 PM, "Mike Bishop" <Michael.Bishop@microsoft.com> wrote: > Yes -- we just need to clarify the eviction behavior. I actually prefer > the evict-from-front model, because it means there's no need for metadata > to include a timestamp on each entry in the table. You know which item > you're going to evict, because it's clear. > > -----Original Message----- > From: James M Snell [mailto:jasnell@gmail.com] > Sent: Monday, July 1, 2013 5:41 PM > To: Mike Bishop > Cc: ietf-http-wg@w3.org > Subject: Re: Header Compression Overview > > Ok, so I'm taking another look through the current header compression > draft and I'd swear that it said some things differently the last time I > read it ;-) ... (which is, of course, impossible... so I'm going to blame > it on the 110+ degree heat here today and the fact that I've read so many > different versions of header compression drafts over the last few months > that my specific recollection of any individual version is a bit gooey in > the middle). So.. with that said, allow me to make a few corrections to the > explanation... Let's see if this is a bit closer to reality: > > The sender and receiver each maintain their own synchronized view of the > compression context. > > The compression context consists of two components: > > 1. A dynamic table of name+value pairs > 2. The current "reference set" of name+value pairs > > When the connection is first established the reference set (#2) is empty > and the header table is prepopulated. > > The sender constructs a logical set of header name+value pairs, where each > name can have multiple values. Each distinct name+value combination is > treated independently of the others. > > For each name+value pair in the set, the sender needs to determine whether > or not to use an Indexed or Literal representation. The Indexed > representation is appropriate if the name+value pair already exists in > either the header table (#1). The Literal representation is appropriate if > the name+value pair either does not exist in #1, or if the sender does not > want the name+value pair stored in the compression context (more on this > part later) > > For example, let's assume that the header table initially consists of > three name+value pairs: > > 0 = :method = GET > 1 = :method = POST > 2 = :path = / > > Then, let's assume that I've created a new connection and the first set of > headers I send to the server contains the header fields: > > :method = GET > :path = / > foo1 = bar > foo2 = baz > > When I process this set of header fields, I first check each pair against > the header table and find that :method = GET uses the index > (0) while :path = / uses the index (2). Since I don't find foo1 or > foo2 in the static table, I determine that I have to use the Literal > representation for those two fields. > > :method = GET ==> Indexed as (0) > :path = / ==> Indexed as (2) > foo1 = bar ==> Literal > foo2 = bar ==> Literal > > Now, still on the sending side, once I determine the kind of > representation, I need to determine what changes need to be made to the > reference set (#2). The reference set is what the sender and receiver both > use to determine what set of header fields are currently active. Since this > is a brand new connection, the reference set is empty, so with this first > set of headers, the changes to the reference set are straightforward... I'm > adding everything... > > To add an Indexed representation of a header, all I do is reference it's > index.. that's easy enough to do... > > To add a Literal representation of a header, there are two options: > With Indexing, or Without Indexing. With Indexing means that when the > header field is added to the Reference set (#2), the name+value pair is > also added to the header table (#1). Without indexing means that when the > header field is added to the Reference set (#2) the > name+value pair is NOT added to the dynamic table. Which of these to > use to entirely up to the sender. So I'm going to decide that foo1 is > going to be With Indexing, while foo2 is going to be Without Indexing. > > (there is another option here between incremental vs. substitution > indexing but I'll get to that in a few minutes) > > At this point, I have a set of instructions of how to modify the reference > set: > > 1. Add Index #0 > 2. Add Index #2 > 3. Add Literal "foo1 = bar" with Indexing > 4. Add Literal "foo2 = baz" without Indexing > > At this point, I can create the serialized header block that will be > transmitted within the HEADERS frame. This block ONLY contains the > serialized set of instructions for modifying the reference set. > > For instructions 1 and 2 (adding index #0 and index #2), the serialization > is very simple and requires exactly two bytes (0x81 0x83). (I have to bump > up the index values by 1 when serializing for some reason) > > For instruction #3, it's slightly more complicated, but not much... we > have a single prefix byte (0x40) followed by the length prefixed header > name (0x04 0x67 0x6F 0x6F 0x31) and the length prefixed header value (0x03 > 0x62 0x61 0x72). > > For instruction #4, the encoding is only slightly different. The single > prefix byte is 0x60 followed by the length prefixed header name > (0x04 0x67 0x6F 0x6F 0x32) and the length prefixed value (0x03 0x62 > 0x61 0x7A). > > Putting it all together, the complete sequence of octets that I will send > within the HEADERS frame is: > > 0x81 0x83 0x40 0x04 > 0x67 0x6F 0x6F 0x31 > 0x03 0x62 0x61 0x72 > 0x60 0x04 0x67 0x6F > 0x6F 0x32 0x03 0x62 > 0x61 0x7A > > Once I generate this sequence, a new entry is added to the header table > (#1): > > 3 = foo1 = bar > > And the reference set (#2) has been updated to include the four headers I > sent: > > :method = GET > :path = / > foo1 = bar > foo2 = baz > > (this is a simplification actually but it's close enough for now) > > I send the serialized instructions to the other endpoint. It receives > those and reverses the process first by parsing the octet sequence to > determine the exact set of instructions it needs to follow to modify it's > own view of the reference set. Once it knows what the operations are, it > begins making the necessary changes to the reference set. > First, it sees two Index representations, so it looks those index values > up in the header table (#1). It finds Index #0 and Index #2 in the static > table so it adds those name+value pairs to it's reference set. Next, it > finds the Literal With Indexing instruction for "foo1 = bar". Since it is > "With Indexing", the server adds the name+value pair to the reference set > *AND* to the header table. Next it finds the Literal Without Indexing > instructions for "foo2 = baz". Since it is "Without Indexing", the server > only adds the name+value pair to the reference set and notes the fact that > the header was not indexed at all. > > At this point, the sender and receiver should have identical views of both > the header table and reference set. > > So far so good. > > Let's move on... back on the sending side. Now I want to send another > request to the server. Some of the header fields remain the same, some have > different values, some are no longer used at all. > > :method = POST > :path = / > foo1 = new > foo3 = new > > When serializing these headers, I'm going to follow the same process, but > since our header table and reference set are both different, the serialized > results are going to be a bit different. > > Going through each of the headers, I first determine that :method = POST > is already contained in the header table at Index #1. I also note that > :path = / is also in the header table using Index #2 (this hasn't changed). > > Also looking at the header table, I see that there is a "foo1" header with > a different value sitting at Index #3. I make note of that index position > so I can use it next. > > Lastly, I determine that the new header field "foo3 = new" is not in > header table at all. I'm going to have to use the Literal representation > for that one. > > Once I've determined the basic type of representation, I need to determine > how the current reference set needs to be changed. As a reminder, here's > what the current reference set looks like: > > :method = GET > :path = / > foo1 = bar > foo2 = baz > > Immediately, I see that :method = GET needs to be replaced with :method = > POST, since we know the static index positions for both of those, we can > simple turn one off by referencing it's index and turn the other on (also > by referencing it's index). That requires a simple two-byte sequence (0x81 > 0x82). > > I see that :path = / is already in the reference set. So for that one, I > don't have to do anything! Awesome. Let's move on. > > We have an index position for the header field foo1 but not the full > name+value pair "foo1=bar". Remember when I briefly mentioned > Incremental Indexing vs. Substitution indexing? We need to make a > decision. When Incremental Indexing, I can add "foo1 = new" as an entirely > new entry in the dynamic table (#2) or I can replace the value of the > existing "foo1 = bar" entry sitting at index #3. The choice is entirely up > to the sender and can be based on a number of factors... the most important > of which is basic memory management. For now, let's keep things simple and > use Incremental Indexing. We want to minimize the amount of data we're > going to send, however, so instead of sending the header field name as a > length prefixed string, we're going to reference the existing index > position and send along a new value ( 0x44 0x03 0x6E 0x65 0x77 ). (again, I > have to bump up the value of index value by 1) > > Moving on, we determine that "foo2 = baz" needs to be removed from the > reference set. We see, however, that "foo2 = baz" was previously sent as a > Literal without any indexing. This means that it will automatically be > removed from the reference set the next time around without the sender > having to do anything! That's helpful. > > Now all we need to deal with is the new "foo2 = new" header field, we'll > do that using Literal With Indexing, like we did originally with the "foo1 > = bar" header field in the first request.. 0x40 0x04 0x67 0x6F 0x6F 0x33 > 0x03 0x6E 0x65 0x77. > > The complete sequence of octets for the second request is: > > 0x81 0x82 0x44 0x03 > 0x6E 0x65 0x77 0x40 > 0x04 0x67 0x6F 0x6F > 0x33 0x03 0x6E 0x65 > 0x77 > > After I construct this sequence, my header table has five entries: > > 0 = :method = GET > 1 = :method = POST > 2 = :path = / > 3 = foo1 = bar > 4 = foo1 = new > > And my reference set looks like: > > :method = POST > :path = / > foo1 = new > foo3 = new > > I send the HEADERS frame to the server and it deconstructs the set of > instructions, applies those to it's own reference set, updates it's header > table, and goes on from there. > > The process can get a bit complicated at times, but that's the basic > operation. There are a few other things to consider: > > The size of the dynamic table can be bound by the recipient. The sender is > not permitted to exceed that size. If adding a new item to the dynamic > table will cause the table to grow beyond this limit, existing name+value > pairs need to be removed, starting with the > name+value pairs that were least-recently written to the table. > > (((Ed. Note: We need clarity on this next part: ))) > > That sounds straightforward, however, given the distinction between > Incremental and Substitution indexing, the Least Recently Written rule can > become a bit complicated in the implementation. Using Substitution indexing > resets the Write time for that particular index, meaning that eviction does > not necessary occur in Index order. > > For example, let's suppose that I send one HEADERS frame that > incrementally adds three items to the dynamic table. Let's call those #1, > #2 and #3. Later, I send a new HEADERS that substitutes a new value for > Index #2. Last, I send a new HEADERS frame that adds two new headers for #4 > and #5. The only problem is, adding #4 and #5 to the dynamic table will > cause it to exceed it's size limit, so I need to evict some of the existing > entries. Because of the substitution of #2, Items #1 and #3 are the first > to be evicted (in that order). So the dynamic table contents become #2, #4, > and #5). > > (((Ed. Note: Or is it.... that Substitution DOES NOT change the > Least-Recently-Written eviction order? In which case, a more recently > substituted value can be dropped before a less-least-recently written > Incremental value? That doesn't feel right to me...))) > > Eviction needs to be handled carefully with regards to the current > reference set. Suppose, for instance, that in one request, I add "foo1 = > bar" to the header table and to the current reference set. In the *same > request*, I add "foo2 = baz" to the header table and to the current > reference set, however, adding this second item causes the header table to > exceed it's defined size, forcing the previously written "foo1 = bar" to be > evicted... > > On the receiving end, I'll see the instruction to add "foo1 = bar" > first, which will add it to the reference set and the header table, then > I'll see the instruction to add "foo2 = baz", which adds it to the > reference set and the header table, but evicts "foo1 = bar" from the > dynamic table. Despite this eviction, both "foo1 = bar" and "foo2 = baz" > are both STILL in the reference set. > > The next time around, "foo1 = bar" ends up being treated as a non-indexed > literal and is dropped automatically from the reference set if it is not > explicitly mentioned in the new set of headers. > > Another consideration that must be carefully weighed is the choice between > Incremental and Substitution indexing. If an implementation always chooses > to use Incremental Indexing, Eviction will become the only reliable means > of memory management. Long lived connections will build up full compression > contexts that tie up a potentially significant amount of memory. This could > be dangerous, especially since there is no explicit mechanism for deleting > items from the header table. > > Substitution Indexing provides only a partial solution to this problem by > making it possible to reuse existing index positions. The problem with > using Substitution indexing, however, is that it's not clear yet how it > impact the LRU eviction semantics (see the Ed. Note above) > > ... > > Ok, I *think* that's a more accurate description. Do folks agree? > > - James > > >
Received on Tuesday, 2 July 2013 06:21:58 UTC