- From: Roberto Peon <grmocg@gmail.com>
- Date: Mon, 1 Jul 2013 23:26:29 -0700
- To: James M Snell <jasnell@gmail.com>
- Cc: Mike Bishop <Michael.Bishop@microsoft.com>, HTTP Working Group <ietf-http-wg@w3.org>
- Message-ID: <CAP+FsNc45H9C9cayBn7B1pY6NGwXMEAcXp_7pvw08EpMFgdfCQ@mail.gmail.com>
This does not adapt well to key-values which change quickly or are small. -=R On Mon, Jul 1, 2013 at 11:21 PM, James M Snell <jasnell@gmail.com> wrote: > After implementing several variations, I want something even simpler... > A fixed index range, assigned incrementally with rollover at the end and > no substitution. It becomes very simple that way. > On Jul 1, 2013 8:29 PM, "Mike Bishop" <Michael.Bishop@microsoft.com> > wrote: > >> Yes -- we just need to clarify the eviction behavior. I actually prefer >> the evict-from-front model, because it means there's no need for metadata >> to include a timestamp on each entry in the table. You know which item >> you're going to evict, because it's clear. >> >> -----Original Message----- >> From: James M Snell [mailto:jasnell@gmail.com] >> Sent: Monday, July 1, 2013 5:41 PM >> To: Mike Bishop >> Cc: ietf-http-wg@w3.org >> Subject: Re: Header Compression Overview >> >> Ok, so I'm taking another look through the current header compression >> draft and I'd swear that it said some things differently the last time I >> read it ;-) ... (which is, of course, impossible... so I'm going to blame >> it on the 110+ degree heat here today and the fact that I've read so many >> different versions of header compression drafts over the last few months >> that my specific recollection of any individual version is a bit gooey in >> the middle). So.. with that said, allow me to make a few corrections to the >> explanation... Let's see if this is a bit closer to reality: >> >> The sender and receiver each maintain their own synchronized view of the >> compression context. >> >> The compression context consists of two components: >> >> 1. A dynamic table of name+value pairs >> 2. The current "reference set" of name+value pairs >> >> When the connection is first established the reference set (#2) is empty >> and the header table is prepopulated. >> >> The sender constructs a logical set of header name+value pairs, where >> each name can have multiple values. Each distinct name+value combination is >> treated independently of the others. >> >> For each name+value pair in the set, the sender needs to determine >> whether or not to use an Indexed or Literal representation. The Indexed >> representation is appropriate if the name+value pair already exists in >> either the header table (#1). The Literal representation is appropriate if >> the name+value pair either does not exist in #1, or if the sender does not >> want the name+value pair stored in the compression context (more on this >> part later) >> >> For example, let's assume that the header table initially consists of >> three name+value pairs: >> >> 0 = :method = GET >> 1 = :method = POST >> 2 = :path = / >> >> Then, let's assume that I've created a new connection and the first set >> of headers I send to the server contains the header fields: >> >> :method = GET >> :path = / >> foo1 = bar >> foo2 = baz >> >> When I process this set of header fields, I first check each pair against >> the header table and find that :method = GET uses the index >> (0) while :path = / uses the index (2). Since I don't find foo1 or >> foo2 in the static table, I determine that I have to use the Literal >> representation for those two fields. >> >> :method = GET ==> Indexed as (0) >> :path = / ==> Indexed as (2) >> foo1 = bar ==> Literal >> foo2 = bar ==> Literal >> >> Now, still on the sending side, once I determine the kind of >> representation, I need to determine what changes need to be made to the >> reference set (#2). The reference set is what the sender and receiver both >> use to determine what set of header fields are currently active. Since this >> is a brand new connection, the reference set is empty, so with this first >> set of headers, the changes to the reference set are straightforward... I'm >> adding everything... >> >> To add an Indexed representation of a header, all I do is reference it's >> index.. that's easy enough to do... >> >> To add a Literal representation of a header, there are two options: >> With Indexing, or Without Indexing. With Indexing means that when the >> header field is added to the Reference set (#2), the name+value pair is >> also added to the header table (#1). Without indexing means that when the >> header field is added to the Reference set (#2) the >> name+value pair is NOT added to the dynamic table. Which of these to >> use to entirely up to the sender. So I'm going to decide that foo1 is >> going to be With Indexing, while foo2 is going to be Without Indexing. >> >> (there is another option here between incremental vs. substitution >> indexing but I'll get to that in a few minutes) >> >> At this point, I have a set of instructions of how to modify the >> reference set: >> >> 1. Add Index #0 >> 2. Add Index #2 >> 3. Add Literal "foo1 = bar" with Indexing >> 4. Add Literal "foo2 = baz" without Indexing >> >> At this point, I can create the serialized header block that will be >> transmitted within the HEADERS frame. This block ONLY contains the >> serialized set of instructions for modifying the reference set. >> >> For instructions 1 and 2 (adding index #0 and index #2), the >> serialization is very simple and requires exactly two bytes (0x81 0x83). (I >> have to bump up the index values by 1 when serializing for some reason) >> >> For instruction #3, it's slightly more complicated, but not much... we >> have a single prefix byte (0x40) followed by the length prefixed header >> name (0x04 0x67 0x6F 0x6F 0x31) and the length prefixed header value (0x03 >> 0x62 0x61 0x72). >> >> For instruction #4, the encoding is only slightly different. The single >> prefix byte is 0x60 followed by the length prefixed header name >> (0x04 0x67 0x6F 0x6F 0x32) and the length prefixed value (0x03 0x62 >> 0x61 0x7A). >> >> Putting it all together, the complete sequence of octets that I will send >> within the HEADERS frame is: >> >> 0x81 0x83 0x40 0x04 >> 0x67 0x6F 0x6F 0x31 >> 0x03 0x62 0x61 0x72 >> 0x60 0x04 0x67 0x6F >> 0x6F 0x32 0x03 0x62 >> 0x61 0x7A >> >> Once I generate this sequence, a new entry is added to the header table >> (#1): >> >> 3 = foo1 = bar >> >> And the reference set (#2) has been updated to include the four headers I >> sent: >> >> :method = GET >> :path = / >> foo1 = bar >> foo2 = baz >> >> (this is a simplification actually but it's close enough for now) >> >> I send the serialized instructions to the other endpoint. It receives >> those and reverses the process first by parsing the octet sequence to >> determine the exact set of instructions it needs to follow to modify it's >> own view of the reference set. Once it knows what the operations are, it >> begins making the necessary changes to the reference set. >> First, it sees two Index representations, so it looks those index values >> up in the header table (#1). It finds Index #0 and Index #2 in the static >> table so it adds those name+value pairs to it's reference set. Next, it >> finds the Literal With Indexing instruction for "foo1 = bar". Since it is >> "With Indexing", the server adds the name+value pair to the reference set >> *AND* to the header table. Next it finds the Literal Without Indexing >> instructions for "foo2 = baz". Since it is "Without Indexing", the server >> only adds the name+value pair to the reference set and notes the fact that >> the header was not indexed at all. >> >> At this point, the sender and receiver should have identical views of >> both the header table and reference set. >> >> So far so good. >> >> Let's move on... back on the sending side. Now I want to send another >> request to the server. Some of the header fields remain the same, some have >> different values, some are no longer used at all. >> >> :method = POST >> :path = / >> foo1 = new >> foo3 = new >> >> When serializing these headers, I'm going to follow the same process, but >> since our header table and reference set are both different, the serialized >> results are going to be a bit different. >> >> Going through each of the headers, I first determine that :method = POST >> is already contained in the header table at Index #1. I also note that >> :path = / is also in the header table using Index #2 (this hasn't changed). >> >> Also looking at the header table, I see that there is a "foo1" header >> with a different value sitting at Index #3. I make note of that index >> position so I can use it next. >> >> Lastly, I determine that the new header field "foo3 = new" is not in >> header table at all. I'm going to have to use the Literal representation >> for that one. >> >> Once I've determined the basic type of representation, I need to >> determine how the current reference set needs to be changed. As a reminder, >> here's what the current reference set looks like: >> >> :method = GET >> :path = / >> foo1 = bar >> foo2 = baz >> >> Immediately, I see that :method = GET needs to be replaced with :method = >> POST, since we know the static index positions for both of those, we can >> simple turn one off by referencing it's index and turn the other on (also >> by referencing it's index). That requires a simple two-byte sequence (0x81 >> 0x82). >> >> I see that :path = / is already in the reference set. So for that one, I >> don't have to do anything! Awesome. Let's move on. >> >> We have an index position for the header field foo1 but not the full >> name+value pair "foo1=bar". Remember when I briefly mentioned >> Incremental Indexing vs. Substitution indexing? We need to make a >> decision. When Incremental Indexing, I can add "foo1 = new" as an entirely >> new entry in the dynamic table (#2) or I can replace the value of the >> existing "foo1 = bar" entry sitting at index #3. The choice is entirely up >> to the sender and can be based on a number of factors... the most important >> of which is basic memory management. For now, let's keep things simple and >> use Incremental Indexing. We want to minimize the amount of data we're >> going to send, however, so instead of sending the header field name as a >> length prefixed string, we're going to reference the existing index >> position and send along a new value ( 0x44 0x03 0x6E 0x65 0x77 ). (again, I >> have to bump up the value of index value by 1) >> >> Moving on, we determine that "foo2 = baz" needs to be removed from the >> reference set. We see, however, that "foo2 = baz" was previously sent as a >> Literal without any indexing. This means that it will automatically be >> removed from the reference set the next time around without the sender >> having to do anything! That's helpful. >> >> Now all we need to deal with is the new "foo2 = new" header field, we'll >> do that using Literal With Indexing, like we did originally with the "foo1 >> = bar" header field in the first request.. 0x40 0x04 0x67 0x6F 0x6F 0x33 >> 0x03 0x6E 0x65 0x77. >> >> The complete sequence of octets for the second request is: >> >> 0x81 0x82 0x44 0x03 >> 0x6E 0x65 0x77 0x40 >> 0x04 0x67 0x6F 0x6F >> 0x33 0x03 0x6E 0x65 >> 0x77 >> >> After I construct this sequence, my header table has five entries: >> >> 0 = :method = GET >> 1 = :method = POST >> 2 = :path = / >> 3 = foo1 = bar >> 4 = foo1 = new >> >> And my reference set looks like: >> >> :method = POST >> :path = / >> foo1 = new >> foo3 = new >> >> I send the HEADERS frame to the server and it deconstructs the set of >> instructions, applies those to it's own reference set, updates it's header >> table, and goes on from there. >> >> The process can get a bit complicated at times, but that's the basic >> operation. There are a few other things to consider: >> >> The size of the dynamic table can be bound by the recipient. The sender >> is not permitted to exceed that size. If adding a new item to the dynamic >> table will cause the table to grow beyond this limit, existing name+value >> pairs need to be removed, starting with the >> name+value pairs that were least-recently written to the table. >> >> (((Ed. Note: We need clarity on this next part: ))) >> >> That sounds straightforward, however, given the distinction between >> Incremental and Substitution indexing, the Least Recently Written rule can >> become a bit complicated in the implementation. Using Substitution indexing >> resets the Write time for that particular index, meaning that eviction does >> not necessary occur in Index order. >> >> For example, let's suppose that I send one HEADERS frame that >> incrementally adds three items to the dynamic table. Let's call those #1, >> #2 and #3. Later, I send a new HEADERS that substitutes a new value for >> Index #2. Last, I send a new HEADERS frame that adds two new headers for #4 >> and #5. The only problem is, adding #4 and #5 to the dynamic table will >> cause it to exceed it's size limit, so I need to evict some of the existing >> entries. Because of the substitution of #2, Items #1 and #3 are the first >> to be evicted (in that order). So the dynamic table contents become #2, #4, >> and #5). >> >> (((Ed. Note: Or is it.... that Substitution DOES NOT change the >> Least-Recently-Written eviction order? In which case, a more recently >> substituted value can be dropped before a less-least-recently written >> Incremental value? That doesn't feel right to me...))) >> >> Eviction needs to be handled carefully with regards to the current >> reference set. Suppose, for instance, that in one request, I add "foo1 = >> bar" to the header table and to the current reference set. In the *same >> request*, I add "foo2 = baz" to the header table and to the current >> reference set, however, adding this second item causes the header table to >> exceed it's defined size, forcing the previously written "foo1 = bar" to be >> evicted... >> >> On the receiving end, I'll see the instruction to add "foo1 = bar" >> first, which will add it to the reference set and the header table, then >> I'll see the instruction to add "foo2 = baz", which adds it to the >> reference set and the header table, but evicts "foo1 = bar" from the >> dynamic table. Despite this eviction, both "foo1 = bar" and "foo2 = baz" >> are both STILL in the reference set. >> >> The next time around, "foo1 = bar" ends up being treated as a non-indexed >> literal and is dropped automatically from the reference set if it is not >> explicitly mentioned in the new set of headers. >> >> Another consideration that must be carefully weighed is the choice >> between Incremental and Substitution indexing. If an implementation always >> chooses to use Incremental Indexing, Eviction will become the only reliable >> means of memory management. Long lived connections will build up full >> compression contexts that tie up a potentially significant amount of >> memory. This could be dangerous, especially since there is no explicit >> mechanism for deleting items from the header table. >> >> Substitution Indexing provides only a partial solution to this problem by >> making it possible to reuse existing index positions. The problem with >> using Substitution indexing, however, is that it's not clear yet how it >> impact the LRU eviction semantics (see the Ed. Note above) >> >> ... >> >> Ok, I *think* that's a more accurate description. Do folks agree? >> >> - James >> >> >>
Received on Tuesday, 2 July 2013 06:27:08 UTC