- From: James M Snell <jasnell@gmail.com>
- Date: Mon, 1 Jul 2013 23:46:02 -0700
- To: Roberto Peon <grmocg@gmail.com>
- Cc: Mike Bishop <Michael.Bishop@microsoft.com>, ietf-http-wg@w3.org
- Message-ID: <CABP7RbdkPX7_Kp5pwH5vcf6k_KHMbvah79_A705DgH5F8g87dw@mail.gmail.com>
Understood, which is why I agreed the currently spec'd approach is a good starting point for the first implementation draft. :) I'd honestly be happier not having stateful header compression at all. But ah well... As long as whatever we have is clearly explained and extremely well tested, I can live with it. On Jul 1, 2013 11:26 PM, "Roberto Peon" <grmocg@gmail.com> wrote: > This does not adapt well to key-values which change quickly or are small. > -=R > > > On Mon, Jul 1, 2013 at 11:21 PM, James M Snell <jasnell@gmail.com> wrote: > >> After implementing several variations, I want something even simpler... >> A fixed index range, assigned incrementally with rollover at the end and >> no substitution. It becomes very simple that way. >> On Jul 1, 2013 8:29 PM, "Mike Bishop" <Michael.Bishop@microsoft.com> >> wrote: >> >>> Yes -- we just need to clarify the eviction behavior. I actually prefer >>> the evict-from-front model, because it means there's no need for metadata >>> to include a timestamp on each entry in the table. You know which item >>> you're going to evict, because it's clear. >>> >>> -----Original Message----- >>> From: James M Snell [mailto:jasnell@gmail.com] >>> Sent: Monday, July 1, 2013 5:41 PM >>> To: Mike Bishop >>> Cc: ietf-http-wg@w3.org >>> Subject: Re: Header Compression Overview >>> >>> Ok, so I'm taking another look through the current header compression >>> draft and I'd swear that it said some things differently the last time I >>> read it ;-) ... (which is, of course, impossible... so I'm going to blame >>> it on the 110+ degree heat here today and the fact that I've read so many >>> different versions of header compression drafts over the last few months >>> that my specific recollection of any individual version is a bit gooey in >>> the middle). So.. with that said, allow me to make a few corrections to the >>> explanation... Let's see if this is a bit closer to reality: >>> >>> The sender and receiver each maintain their own synchronized view of the >>> compression context. >>> >>> The compression context consists of two components: >>> >>> 1. A dynamic table of name+value pairs >>> 2. The current "reference set" of name+value pairs >>> >>> When the connection is first established the reference set (#2) is empty >>> and the header table is prepopulated. >>> >>> The sender constructs a logical set of header name+value pairs, where >>> each name can have multiple values. Each distinct name+value combination is >>> treated independently of the others. >>> >>> For each name+value pair in the set, the sender needs to determine >>> whether or not to use an Indexed or Literal representation. The Indexed >>> representation is appropriate if the name+value pair already exists in >>> either the header table (#1). The Literal representation is appropriate if >>> the name+value pair either does not exist in #1, or if the sender does not >>> want the name+value pair stored in the compression context (more on this >>> part later) >>> >>> For example, let's assume that the header table initially consists of >>> three name+value pairs: >>> >>> 0 = :method = GET >>> 1 = :method = POST >>> 2 = :path = / >>> >>> Then, let's assume that I've created a new connection and the first set >>> of headers I send to the server contains the header fields: >>> >>> :method = GET >>> :path = / >>> foo1 = bar >>> foo2 = baz >>> >>> When I process this set of header fields, I first check each pair >>> against the header table and find that :method = GET uses the index >>> (0) while :path = / uses the index (2). Since I don't find foo1 or >>> foo2 in the static table, I determine that I have to use the Literal >>> representation for those two fields. >>> >>> :method = GET ==> Indexed as (0) >>> :path = / ==> Indexed as (2) >>> foo1 = bar ==> Literal >>> foo2 = bar ==> Literal >>> >>> Now, still on the sending side, once I determine the kind of >>> representation, I need to determine what changes need to be made to the >>> reference set (#2). The reference set is what the sender and receiver both >>> use to determine what set of header fields are currently active. Since this >>> is a brand new connection, the reference set is empty, so with this first >>> set of headers, the changes to the reference set are straightforward... I'm >>> adding everything... >>> >>> To add an Indexed representation of a header, all I do is reference it's >>> index.. that's easy enough to do... >>> >>> To add a Literal representation of a header, there are two options: >>> With Indexing, or Without Indexing. With Indexing means that when the >>> header field is added to the Reference set (#2), the name+value pair is >>> also added to the header table (#1). Without indexing means that when the >>> header field is added to the Reference set (#2) the >>> name+value pair is NOT added to the dynamic table. Which of these to >>> use to entirely up to the sender. So I'm going to decide that foo1 is >>> going to be With Indexing, while foo2 is going to be Without Indexing. >>> >>> (there is another option here between incremental vs. substitution >>> indexing but I'll get to that in a few minutes) >>> >>> At this point, I have a set of instructions of how to modify the >>> reference set: >>> >>> 1. Add Index #0 >>> 2. Add Index #2 >>> 3. Add Literal "foo1 = bar" with Indexing >>> 4. Add Literal "foo2 = baz" without Indexing >>> >>> At this point, I can create the serialized header block that will be >>> transmitted within the HEADERS frame. This block ONLY contains the >>> serialized set of instructions for modifying the reference set. >>> >>> For instructions 1 and 2 (adding index #0 and index #2), the >>> serialization is very simple and requires exactly two bytes (0x81 0x83). (I >>> have to bump up the index values by 1 when serializing for some reason) >>> >>> For instruction #3, it's slightly more complicated, but not much... we >>> have a single prefix byte (0x40) followed by the length prefixed header >>> name (0x04 0x67 0x6F 0x6F 0x31) and the length prefixed header value (0x03 >>> 0x62 0x61 0x72). >>> >>> For instruction #4, the encoding is only slightly different. The single >>> prefix byte is 0x60 followed by the length prefixed header name >>> (0x04 0x67 0x6F 0x6F 0x32) and the length prefixed value (0x03 0x62 >>> 0x61 0x7A). >>> >>> Putting it all together, the complete sequence of octets that I will >>> send within the HEADERS frame is: >>> >>> 0x81 0x83 0x40 0x04 >>> 0x67 0x6F 0x6F 0x31 >>> 0x03 0x62 0x61 0x72 >>> 0x60 0x04 0x67 0x6F >>> 0x6F 0x32 0x03 0x62 >>> 0x61 0x7A >>> >>> Once I generate this sequence, a new entry is added to the header table >>> (#1): >>> >>> 3 = foo1 = bar >>> >>> And the reference set (#2) has been updated to include the four headers >>> I sent: >>> >>> :method = GET >>> :path = / >>> foo1 = bar >>> foo2 = baz >>> >>> (this is a simplification actually but it's close enough for now) >>> >>> I send the serialized instructions to the other endpoint. It receives >>> those and reverses the process first by parsing the octet sequence to >>> determine the exact set of instructions it needs to follow to modify it's >>> own view of the reference set. Once it knows what the operations are, it >>> begins making the necessary changes to the reference set. >>> First, it sees two Index representations, so it looks those index values >>> up in the header table (#1). It finds Index #0 and Index #2 in the static >>> table so it adds those name+value pairs to it's reference set. Next, it >>> finds the Literal With Indexing instruction for "foo1 = bar". Since it is >>> "With Indexing", the server adds the name+value pair to the reference set >>> *AND* to the header table. Next it finds the Literal Without Indexing >>> instructions for "foo2 = baz". Since it is "Without Indexing", the server >>> only adds the name+value pair to the reference set and notes the fact that >>> the header was not indexed at all. >>> >>> At this point, the sender and receiver should have identical views of >>> both the header table and reference set. >>> >>> So far so good. >>> >>> Let's move on... back on the sending side. Now I want to send another >>> request to the server. Some of the header fields remain the same, some have >>> different values, some are no longer used at all. >>> >>> :method = POST >>> :path = / >>> foo1 = new >>> foo3 = new >>> >>> When serializing these headers, I'm going to follow the same process, >>> but since our header table and reference set are both different, the >>> serialized results are going to be a bit different. >>> >>> Going through each of the headers, I first determine that :method = POST >>> is already contained in the header table at Index #1. I also note that >>> :path = / is also in the header table using Index #2 (this hasn't changed). >>> >>> Also looking at the header table, I see that there is a "foo1" header >>> with a different value sitting at Index #3. I make note of that index >>> position so I can use it next. >>> >>> Lastly, I determine that the new header field "foo3 = new" is not in >>> header table at all. I'm going to have to use the Literal representation >>> for that one. >>> >>> Once I've determined the basic type of representation, I need to >>> determine how the current reference set needs to be changed. As a reminder, >>> here's what the current reference set looks like: >>> >>> :method = GET >>> :path = / >>> foo1 = bar >>> foo2 = baz >>> >>> Immediately, I see that :method = GET needs to be replaced with :method >>> = POST, since we know the static index positions for both of those, we can >>> simple turn one off by referencing it's index and turn the other on (also >>> by referencing it's index). That requires a simple two-byte sequence (0x81 >>> 0x82). >>> >>> I see that :path = / is already in the reference set. So for that one, I >>> don't have to do anything! Awesome. Let's move on. >>> >>> We have an index position for the header field foo1 but not the full >>> name+value pair "foo1=bar". Remember when I briefly mentioned >>> Incremental Indexing vs. Substitution indexing? We need to make a >>> decision. When Incremental Indexing, I can add "foo1 = new" as an entirely >>> new entry in the dynamic table (#2) or I can replace the value of the >>> existing "foo1 = bar" entry sitting at index #3. The choice is entirely up >>> to the sender and can be based on a number of factors... the most important >>> of which is basic memory management. For now, let's keep things simple and >>> use Incremental Indexing. We want to minimize the amount of data we're >>> going to send, however, so instead of sending the header field name as a >>> length prefixed string, we're going to reference the existing index >>> position and send along a new value ( 0x44 0x03 0x6E 0x65 0x77 ). (again, I >>> have to bump up the value of index value by 1) >>> >>> Moving on, we determine that "foo2 = baz" needs to be removed from the >>> reference set. We see, however, that "foo2 = baz" was previously sent as a >>> Literal without any indexing. This means that it will automatically be >>> removed from the reference set the next time around without the sender >>> having to do anything! That's helpful. >>> >>> Now all we need to deal with is the new "foo2 = new" header field, we'll >>> do that using Literal With Indexing, like we did originally with the "foo1 >>> = bar" header field in the first request.. 0x40 0x04 0x67 0x6F 0x6F 0x33 >>> 0x03 0x6E 0x65 0x77. >>> >>> The complete sequence of octets for the second request is: >>> >>> 0x81 0x82 0x44 0x03 >>> 0x6E 0x65 0x77 0x40 >>> 0x04 0x67 0x6F 0x6F >>> 0x33 0x03 0x6E 0x65 >>> 0x77 >>> >>> After I construct this sequence, my header table has five entries: >>> >>> 0 = :method = GET >>> 1 = :method = POST >>> 2 = :path = / >>> 3 = foo1 = bar >>> 4 = foo1 = new >>> >>> And my reference set looks like: >>> >>> :method = POST >>> :path = / >>> foo1 = new >>> foo3 = new >>> >>> I send the HEADERS frame to the server and it deconstructs the set of >>> instructions, applies those to it's own reference set, updates it's header >>> table, and goes on from there. >>> >>> The process can get a bit complicated at times, but that's the basic >>> operation. There are a few other things to consider: >>> >>> The size of the dynamic table can be bound by the recipient. The sender >>> is not permitted to exceed that size. If adding a new item to the dynamic >>> table will cause the table to grow beyond this limit, existing name+value >>> pairs need to be removed, starting with the >>> name+value pairs that were least-recently written to the table. >>> >>> (((Ed. Note: We need clarity on this next part: ))) >>> >>> That sounds straightforward, however, given the distinction between >>> Incremental and Substitution indexing, the Least Recently Written rule can >>> become a bit complicated in the implementation. Using Substitution indexing >>> resets the Write time for that particular index, meaning that eviction does >>> not necessary occur in Index order. >>> >>> For example, let's suppose that I send one HEADERS frame that >>> incrementally adds three items to the dynamic table. Let's call those #1, >>> #2 and #3. Later, I send a new HEADERS that substitutes a new value for >>> Index #2. Last, I send a new HEADERS frame that adds two new headers for #4 >>> and #5. The only problem is, adding #4 and #5 to the dynamic table will >>> cause it to exceed it's size limit, so I need to evict some of the existing >>> entries. Because of the substitution of #2, Items #1 and #3 are the first >>> to be evicted (in that order). So the dynamic table contents become #2, #4, >>> and #5). >>> >>> (((Ed. Note: Or is it.... that Substitution DOES NOT change the >>> Least-Recently-Written eviction order? In which case, a more recently >>> substituted value can be dropped before a less-least-recently written >>> Incremental value? That doesn't feel right to me...))) >>> >>> Eviction needs to be handled carefully with regards to the current >>> reference set. Suppose, for instance, that in one request, I add "foo1 = >>> bar" to the header table and to the current reference set. In the *same >>> request*, I add "foo2 = baz" to the header table and to the current >>> reference set, however, adding this second item causes the header table to >>> exceed it's defined size, forcing the previously written "foo1 = bar" to be >>> evicted... >>> >>> On the receiving end, I'll see the instruction to add "foo1 = bar" >>> first, which will add it to the reference set and the header table, then >>> I'll see the instruction to add "foo2 = baz", which adds it to the >>> reference set and the header table, but evicts "foo1 = bar" from the >>> dynamic table. Despite this eviction, both "foo1 = bar" and "foo2 = baz" >>> are both STILL in the reference set. >>> >>> The next time around, "foo1 = bar" ends up being treated as a >>> non-indexed literal and is dropped automatically from the reference set if >>> it is not explicitly mentioned in the new set of headers. >>> >>> Another consideration that must be carefully weighed is the choice >>> between Incremental and Substitution indexing. If an implementation always >>> chooses to use Incremental Indexing, Eviction will become the only reliable >>> means of memory management. Long lived connections will build up full >>> compression contexts that tie up a potentially significant amount of >>> memory. This could be dangerous, especially since there is no explicit >>> mechanism for deleting items from the header table. >>> >>> Substitution Indexing provides only a partial solution to this problem >>> by making it possible to reuse existing index positions. The problem with >>> using Substitution indexing, however, is that it's not clear yet how it >>> impact the LRU eviction semantics (see the Ed. Note above) >>> >>> ... >>> >>> Ok, I *think* that's a more accurate description. Do folks agree? >>> >>> - James >>> >>> >>> >
Received on Tuesday, 2 July 2013 06:46:33 UTC