- From: James M Snell <jasnell@gmail.com>
- Date: Mon, 1 Jul 2013 11:32:30 -0700
- To: "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Based on some of the questions I've seen posted so far, there seems to be some confusion still on the header compression mechanism... Let me see if I can help clear up some of the confusion. The sender and receiver each maintain their own synchronized view of the compression context. The compression context consists of three components: 1. A static table of common name+value pairs. 2. A dynamic table of name+value pairs 3. The current "reference set" of name+value pairs When the connection is first established, both the dynamic table (#2) and the reference set (#3) are empty. The sender constructs a logical set of header name+value pairs, where each name can have multiple values. Each distinct name+value combination is treated independently of the others. For each name+value pair in the set, the sender needs to determine whether or not to use an Indexed or Literal representation. The Indexed representation is appropriate if the name+value pair already exists in either the static (#1) or dynamic (#2) tables. The Literal representation is appropriate if the name+value pair either does not exist in #1 or #2, or if the sender does not want the name+value pair stored in the compression context (more on this part later) For example, let's assume that the static table consists only of two name+value pairs: 0 = :method = GET 1 = :method = POST 2 = :path = / Then, let's assume that I've created a new connection and the first HEADERS frame I send to the server contains the header fields: :method = GET :path = / foo1 = bar foo2 = baz When I process this set of header fields, I first check each pair against the static table and find that :method = GET uses the index (0) while :path = / uses the index (2). Since I don't find foo1 or foo2 in the static table, I check the dynamic table. Since the connection is brand new and the dynamic table is empty, I determine that I have to use the Literal representation for those two fields. :method = GET ==> Indexed as (0) :path = / ==> Indexed as (2) foo1 = bar ==> Literal foo2 = bar ==> Literal Now, still on the sending side, once I determine the kind of representation, I need to determine what changes need to be made to the reference set (#3). The reference set is what the sender and receiver both use to determine what set of header fields are currently active. Since this is a brand new connection, the reference set is empty, so with this first set of headers, the changes to the reference set are straightforward... I'm adding everything... To add an Indexed representation of a header, all I do is reference it's index.. that's easy enough to do... To add a Literal representation of a header, there are two options: With Indexing, or Without Indexing. With Indexing means that when the header field is added to the Reference set (#3), the name+value pair is also added to the dynamic table (#2). Without indexing means that when the header field is added to the Reference set (#3) the name+value pair is NOT added to the dynamic table. Which of these to use to entirely up to the sender. So I'm going to decide that foo1 is going to be With Indexing, while foo2 is going to be Without Indexing. (there is another option here between incremental vs. substitution indexing but I'll get to that in a few minutes) At this point, I have a set of instructions of how to modify the reference set: 1. Add Index #0 2. Add Index #2 3. Add Literal "foo1 = bar" with Indexing 4. Add Literal "foo2 = baz" without Indexing At this point, I can create the serialized header block that will be transmitted within the HEADERS frame. This block ONLY contains the serialized set of instructions for modifying the reference set. For instructions 1 and 2 (adding index #0 and index #1), the serialization is very simple and requires exactly two bytes (0x80 0x82). For instruction #3, it's slightly more complicated, but not much... we have a single prefix byte (0x40) followed by the length prefixed header name (0x04 0x67 0x6F 0x6F 0x31) and the length prefixed header value (0x03 0x62 0x61 0x72). For instruction #4, the encoding is only slightly different. The single prefix byte is 0x60 followed by the length prefixed header name (0x04 0x67 0x6F 0x6F 0x32) and the length prefixed value (0x03 0x62 0x61 0x7A). Putting it all together, the complete sequence of octets that I will send within the HEADERS frame is: 0x80 0x82 0x40 0x04 0x67 0x6F 0x6F 0x31 0x03 0x62 0x61 0x72 0x60 0x04 0x67 0x6F 0x6F 0x32 0x03 0x62 0x61 0x7A Once I generate this sequence, a new entry is added to the dynamic table (#2): 3 = foo1 = bar And the reference set (#3) has been updated to include the four headers I sent: :method = GET :path = / foo1 = bar foo2 = baz I send the serialized instructions to the other endpoint. It receives those and reverses the process first by parsing the octet sequence to determine the exact set of instructions it needs to follow to modify it's own view of the reference set. Once it knows what the operations are, it begins making the necessary changes to the reference set. First, it sees two Index representations, so it looks those index values up in the Static (#1) and Dynamic tables (#2). It finds Index #0 and Index #1 in the static table so it adds those name+value pairs to it's reference set (#3). Next, it finds the Literal With Indexing instruction for "foo1 = bar". Since it is "With Indexing", the server adds the name+value pair to the reference set (#3) *AND* to the dynamic table (#2). Next it finds the Literal Without Indexing instructions for "foo2 = baz". Since it is "Without Indexing", the server only adds the name+value pair to the reference set and notes the fact that the header was not indexed at all. At this point, the sender and receiver should have identical views of both the dynamic table (#2) and reference set (#3). So far so good. Let's move on... back on the sending side. Now I want to send another request to the server. Some of the header fields remain the same, some have different values, some are no longer used at all. :method = POST :path = / foo1 = new foo3 = new When serializing these headers, I'm going to follow the same process, but since our dynamic table (#2) and reference set (#3) are both no longer empty, the results are going to be a bit different. Going through each of the headers, I first determine that :method = POST is already contained in the static set using Index #1. I also note that :path = / is also in the static set using Index #2 (this hasn't changed). Looking at the dynamic table (#2), I see that there is a "foo1" header with a different value sitting at Index #3. I make note of that index position so I can use it next. Lastly, I determine that the new header field "foo3 = new" is not in either the static or dynamic set. I'm going to have to use the Literal representation for that one. Once I've determined the basic type of representation, I need to determine how the current reference set needs to be changed. As a reminder, here's what the current reference set looks like: :method = GET :path = / foo1 = bar foo2 = baz Immediately, I see that :method = GET needs to be replaced with :method = POST, since we know the static index positions for both of those, we can simple turn one off by referencing it's index and turn the other on (also by referencing it's index). That requires a simple two-byte sequence (0x80 0x81). I see that :path = / is already in the reference set. So for that one, I don't have to do anything! Awesome. Let's move on. We have an index position for the header field foo1 but not the full name+value pair "foo1=bar". Remember when I briefly mentioned Incremental Indexing vs. Substitution indexing? We need to make a decision. When Incremental Indexing, I can add "foo1 = new" as an entirely new entry in the dynamic table (#2) or I can replace the value of the existing "foo1 = bar" entry sitting at index #3. The choice is entirely up to the sender and can be based on a number of factors... the most important of which is basic memory management. For now, let's keep things simple and use Incremental Indexing. We want to minimize the amount of data we're going to send, however, so instead of sending the header field name as a length prefixed string, we're going to reference the existing index position and send along a new value ( 0x43 0x03 0x6E 0x65 0x77 ). Moving on, we determine that "foo2 = baz" needs to be removed from the reference set. We see, however, that "foo2 = baz" was previously sent as a Literal without any indexing. This means that it will automatically be removed from the reference set the next time around without the sender having to do anything! That's helpful. Now all we need to deal with is the new "foo2 = new" header field, we'll do that using Literal With Indexing, like we did originally with the "foo1 = bar" header field in the first request.. 0x40 0x04 0x67 0x6F 0x6F 0x33 0x03 0x6E 0x65 0x77. The complete sequence of octets for the second request is: 0x80 0x81 0x43 0x03 0x6E 0x65 0x77 0x40 0x04 0x67 0x6F 0x6F 0x33 0x03 0x6E 0x65 0x77 After I construct this sequence, my dynamic table has two entries: foo1 = bar foo1 = new And my reference set looks like: :method = POST :path = / foo1 = new foo3 = new I send the HEADERS frame to the server and it deconstructs the set of instructions, applies those to it's own reference set, updates it's dynamic table, and goes on from there. The process can get a bit complicated at times, but that's the basic operation. There are a few other things to consider: The size of the dynamic table can be bound by the recipient. The sender is not permitted to exceed that size. If adding a new item to the dynamic table will cause the table to grow beyond this limit, existing name+value pairs need to be removed, starting with the name+value pairs that were least-recently written to the table. That sounds straightforward, however, given the distinction between Incremental and Substitution indexing, the Least Recently Written rule can become a bit complicated in the implementation. Using Substitution indexing resets the Write time for that particular index, meaning that eviction does not necessary occur in Index order. For example, let's suppose that I send one HEADERS frame that incrementally adds three items to the dynamic table. Let's call those #1, #2 and #3. Later, I send a new HEADERS that substitutes a new value for Index #2. Last, I send a new HEADERS frame that adds two new headers for #4 and #5. The only problem is, adding #4 and #5 to the dynamic table will cause it to exceed it's size limit, so I need to evict some of the existing entries. Because of the substitution of #2, Items #1 and #3 are the first to be evicted (in that order). So the dynamic table contents become #2, #4, and #5). Eviction needs to be handled carefully with regards to the current reference set. Suppose, for instance, that in one request, I add "foo1 = bar" to the dynamic table and to the current reference set. In the *same request*, I add "foo2 = baz" to the dynamic table and to the current reference set, however, adding this second item causes the dynamic table to exceed it's defined size, forcing the previously written "foo1 = bar" to be evicted... On the receiving end, I'll see the instruction to add "foo1 = bar" first, which will add it to the reference set and the dynamic table, then I'll see the instruction to add "foo2 = baz", which adds it to the reference set and the dynamic table, but evicts "foo1 = bar" from the dynamic table. Despite this eviction, both "foo1 = bar" and "foo2 = baz" are both STILL in the reference set. The next time around, "foo1 = bar" ends up being treated as a non-indexed literal and is dropped automatically from the reference set if it is not explicitly mentioned in the new set of headers. Another consideration that must be carefully weighed is the choice between Incremental and Substitution indexing. If an implementation always chooses to use Incremental Indexing, Eviction will become the only reliable means of memory management. Long lived connections will build up full compression contexts that tie up a potentially significant amount of memory. This could be dangerous, especially since there is no explicit mechanism for deleting items from the dynamic table. Anther challenge with Incremental Indexing is that index values grow monotonically without any upper limit. Index positions are represented in the serialization as variable width integers without any maximum number of bytes. This means that long lived connections could end up with quite large index positions that consume a large number of bytes. Substitution Indexing provides only a partial solution to this problem by making it possible to reuse existing index positions so long as the header field name remain the same. There still exists the possibility of tying up memory, however, if there are a large number of different header names. Through clever implementation, these memory management issues can be addressed and mitigated, but implementers need to keep aware of the issues. Anyway, I hope that gives a helpful overview of the header compression process and answers any of the lingering questions :-) Roberto and Herve: If I missed any points, please speak up! :-)
Received on Monday, 1 July 2013 18:33:18 UTC