- From: Roberto Peon <grmocg@gmail.com>
- Date: Fri, 12 Oct 2012 00:49:47 -0700
- To: Willy Tarreau <w@1wt.eu>
- Cc: James M Snell <jasnell@gmail.com>, ietf-http-wg@w3.org
- Message-ID: <CAP+FsNcp7r2ad-X=2aNYq=_FB8cKgKEpTUYW7eG7YsyFnrwbJg@mail.gmail.com>
As a warning, I do need to explicitly put the license in there for the code. The license would be the Chromium license.

I've been checking my code into:
https://github.com/grmocg/SPDY-Specification/tree/gh-pages/example_code

The python code, while outdated, is end-to-end. It does delta-encoding, huffman coding, serialization, deserialization, decoding and re-inflating. It is super-duper slow and wasn't written to test CPU performance.

The c++ code, while incomplete, gives me a much better idea of the performance. Currently, the not-terribly-well-optimized code is happy to turn ~200 MB/s of raw headers into delta-encoded instructions per core on my workstation, given my dataset. In terms of HTTP requests/second, I think that is on the order of 30k/core. It will get slower once the serialization goes in (which includes huffman coding and thus more bit-twiddling). I honestly have no idea how much slower it may get, but am anxious to find out.

The missing pieces from the c++ code are the serialization, deserialization, and inflating. The parts which are done are the bitbucket, huffman (including canonical huffman), and delta-encoder. It is also missing the decoder (which is trivial); the serialization/deserialization components may not be trivial, depending on how efficient/inefficient the bitshifting operations end up being.

Compressor unknowns/tradeoffs (a.k.a. things I intend to play with):

* Byte or bit alignment in various variations.
* Huffman coding of non-text fields (e.g. the opcodes themselves).
* Removal of opcodes -- instead, simply indicate the number of each type, or use a token to indicate the end of a section of the expected type. This could save somewhere between 2-4 bits per operation, at least before huffman coding.
* Using different frequency tables (and thus huffman encodings) for cookie data (unless this brings significant benefit, I'd prefer not to have this).
* ID renumbering (does a referenced item get renumbered when it is used, so that consecutive items are likely to have been used together?).
* Little-endian or big-endian, and specifically how (given that most things today are little-endian...).
* Implementation of proxy pass-through -- one of the design goals here was to allow a proxy to skip doing most of the delta or huffman encoding, and instead just maintain a mapping between the IDs on the ingress and egress sides for the different connection contexts. I believe it is doable, but I haven't done it. At the very least, since the huffman coding is currently completely static, much can be carried over without having to be re-translated from the ingress side.
* Providing a mechanism for negotiating the transfer of a different huffman encoding. Given that a canonical huffman code is used (ostensibly to increase decode speed), it is also easy/cheap to send one (see the sketch after this list). I haven't determined how big a savings this could be on a per-page basis. This is something that could be done with the python code without too much effort.
* Modifying the instructions which create new IDs (i.e. clone and kvsto) so that they don't mention the ID they're creating -- this is unnecessary so long as a single ID is mentioned with the HEADERS frame; each new ID will simply increment from there.
* And some more esoteric stuff.
* Ensuring that the pre-populated dictionary items never get thrown off the LRU, since they must be maintained in memory anyway for any server/client which uses multiple connections -- it is likely that this wouldn't incur any additional memory cost, and it could increase compressor efficiency.
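To illustrate why a canonical huffman code is cheap to send: the codewords are fully determined by the per-symbol code lengths, so an endpoint proposing a new table only needs to ship one small integer per symbol and the other side can rebuild identical codewords. The snippet below is just my illustration of that reconstruction (invented function name, not the repository's c++ or python code):

    # Minimal sketch (not the spec's actual code): rebuilding a canonical
    # huffman code from nothing but per-symbol code lengths.

    def canonical_codes(code_lengths):
        """code_lengths: dict mapping symbol -> bit length (0 = unused).
        Returns dict mapping symbol -> (codeword_as_int, bit_length)."""
        # Canonical ordering: shorter codes first, ties broken by symbol value.
        symbols = sorted((length, sym)
                         for sym, length in code_lengths.items() if length > 0)
        codes = {}
        code = 0
        prev_len = 0
        for length, sym in symbols:
            code <<= (length - prev_len)   # pad with zeros when the length grows
            codes[sym] = (code, length)
            code += 1
            prev_len = length
        return codes

    # Example: both sides derive identical codewords from the same lengths.
    lengths = {'e': 2, 't': 2, 'a': 3, 'o': 3, 'z': 4, 'q': 4}
    for sym, (code, nbits) in sorted(canonical_codes(lengths).items(),
                                     key=lambda kv: kv[1]):
        print(sym, format(code, '0{}b'.format(nbits)))

Because the table is shared and static, the same property is what lets an intermediary compare or copy encoded bytes without decoding them first.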
The general scheme:

Delta-encoding of the headers, with the atoms being key-value pairs. Cookie crumbs are separated so that only the part of the cookie that changes must be sent. If none of the headers changed from one request to another (unlikely -- at least the path would change most of the time), then no textual information is sent. After the delta-encoding is done, a static-huffman encoding is applied to the text.

- Have an LRU of key-value entries.
  - The default size for this state is 20k, or the size of the current header, whichever is larger.
  - The server can change this size.
  - The first N values are pre-populated with common keys and common values,
    e.g. ("method", "get") is a pre-populated entry.
  - Each time a particular key-value is used/referred-to, it goes to the front of the LRU.
  - The oldest key-value pairs (at the back of the LRU) go away if the state size is exceeded.
- The LRU is updated with a small set of operations:
  - clone(key_idx, new_val): adds a new key-value pair to the LRU by copying the key at key-index 'key_idx' in the LRU and attaching a new value to it, e.g.:
      LRU: [0: ("method", "get")]
      clone(0, "put")
        -> LRU: [0: ("method", "get"), 1: ("method", "put")]
  - kvstore(new_key, new_val): adds a new key-value pair to the LRU by inserting it at the end, e.g.:
      LRU: [0: ("method", "get")]
      kvstore("path", "/index.html")
        -> LRU: [0: ("method", "get"), 1: ("path", "/index.html")]
- Have a constrained number of header-index-sets.
  - A header-index-set represents an HTTP header block.
  - Each index in the set corresponds to an entry in the LRU (by index).
  - Each index in the set thus represents a key-value pair, one of the many that may compose the entirety of the headers.
  - Header-index-sets are modified by the encoder by putting indices into, or removing them from, the sets. The instruction to do this is called 'toggle', e.g.:
      LRU: [0: ("method", "get"), 1: ("path", "/index.html"), 2: ("path", "/")]
      header-index-set [0: (0, 1)]
    which represents a request like this:
      get /index.html
    If the following operations are applied:
      toggle(0, 1)  # header-index-group 0, toggle LRU index 1 off
      toggle(0, 2)  # header-index-group 0, toggle LRU index 2 on
    the state would now be:
      LRU: [0: ("method", "get"), 1: ("path", "/index.html"), 2: ("path", "/")]
      header-index-set [0: (0, 2)]
    and now the request would be:
      get /
- Keys and values are compressed with static-huffman codes.
  - The huffman tables for requests and responses are different.
  - The huffman codes are canonical, to improve decoding speed and to potentially allow either side to publish a new huffman code in the future in a space-efficient manner.
  - The frequency counts for the construction of these codes were found by running a bunch of headers from different sites through the delta encoding (this is key) and computing the frequency of the letters of the values that *change* in the headers.
- Keys are refcounted and counted against the compression state only once until removed.
- Keys have their own ID space so they can be referenced separately (e.g. in the clone operation).
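For readers who prefer code to prose, here is a toy model of the state machine described above. It is only my illustration (invented class and method names, not the example_code from the repository), and it omits the LRU size budget and eviction, the move-to-front behavior, cookie-crumb splitting and the huffman stage:

    # Toy model of the delta-encoder state described above. Illustration
    # only: no eviction, no recency tracking, no huffman, no serialization.

    class DeltaState(object):
        def __init__(self, prepopulated):
            # The LRU of key-value pairs; the index is the entry's id here.
            # Real code would also track recency and a ~20k size budget.
            self.lru = list(prepopulated)
            # header-index-sets: group id -> set of LRU indices currently "on".
            self.groups = {}

        # --- instructions that mutate the LRU ---------------------------
        def clone(self, key_idx, new_val):
            """Copy the key at key_idx, attach a new value, append to the LRU."""
            key, _ = self.lru[key_idx]
            self.lru.append((key, new_val))
            return len(self.lru) - 1          # the id of the new entry

        def kvstore(self, new_key, new_val):
            """Append a brand-new key-value pair to the LRU."""
            self.lru.append((new_key, new_val))
            return len(self.lru) - 1

        # --- instruction that mutates a header-index-set ----------------
        def toggle(self, group_id, lru_idx):
            """Flip membership of an LRU entry in a header-index-set."""
            group = self.groups.setdefault(group_id, set())
            group.symmetric_difference_update({lru_idx})

        def headers(self, group_id):
            """Re-inflate a header block from a header-index-set."""
            return [self.lru[i] for i in sorted(self.groups.get(group_id, ()))]

    # Re-playing the example from the prose above:
    state = DeltaState(prepopulated=[("method", "get")])
    idx_index_html = state.kvstore("path", "/index.html")   # -> id 1
    idx_root = state.clone(idx_index_html, "/")              # -> id 2, reuses key "path"

    state.toggle(0, 0)                # method: get
    state.toggle(0, idx_index_html)   # path: /index.html
    print(state.headers(0))           # [('method', 'get'), ('path', '/index.html')]

    state.toggle(0, idx_index_html)   # toggle /index.html off
    state.toggle(0, idx_root)         # toggle / on
    print(state.headers(0))           # [('method', 'get'), ('path', '/')]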
There are a number of interesting things about this approach:

1. As far as anyone can tell so far, it isn't vulnerable to the attacks we're seeing right now on gzip+TLS.
2. You don't need to decode header values in order to interpret them.
   1. First of all, because only changes are sent, if you're smart you only have to figure out what to do with any particular header value once.
   2. Since the huffman encoding is static, you just compare the encoded values -- no need to decode them. This can end up faster, as the number of bytes to compare is very likely smaller.
   3. You can choose not to do huffman encoding if you wish. This is necessary since, as with any entropy coding, sometimes the encoded value is larger than the original.
3. It is easy to share encoder and decoder state on either side of a proxy, and it can be done efficiently.
   1. Any values that don't change need not be re-encoded -- they can be blitted from ingress to egress.
4. It should be a fair bit cheaper than gzip to encode.
5. The compression achieved thus far is competitive with gzip.
   1. Higher gzip window sizes will outperform the new compression scheme over current headers.
   2. Trying to be less optimal with headers results in better compression with gzip.
   3. There is more compression to be gained over what I've gotten to so far.

The code hopefully makes this far more clear for those who take a gander.
-=R

On Thu, Oct 11, 2012 at 11:46 PM, Willy Tarreau <w@1wt.eu> wrote:
> Hi Roberto,
>
> On Thu, Oct 11, 2012 at 02:18:18PM -0700, Roberto Peon wrote:
> > On the compression stuff-- I apologize for the delay in getting the
> > compression spec out. If all people want is a description of the
> > fundamentals, I can do that quickly. For now I've been concentrating on
> > the 'working code' informing the specification, and so have been working
> > to ensure that the final product isn't more complex than is really required.
>
> For having lived the same experience a few months ago, I can confirm that
> it is *very* hard to provide a spec or just an idea without working code,
> because you need to see your compressor work in order to validate the
> concept. Amos experienced the same too.
>
> In my opinion, it would be a lot simpler if we all worked on the same
> code base, each with his own fork, feeding back results and ideas.
> I didn't have enough time to help Amos when he took over the job, but
> I'm sure it could have gone further and faster with more help here from
> other implementers.
>
> > On that note, what is it that people want to see here w.r.t. the
> > compression stuff?-- Something high-level quickly, or a functional spec
> > proposal in 1-3 weeks?
>
> If you think you can already express the basic ideas and what was tested
> vs what still needs to be validated, it would be a first step. But as soon
> as you have some code anyone can experiment with without having to rebuild
> a full browser, I think it could progress quickly.
>
> Cheers,
> Willy
Received on Friday, 12 October 2012 07:50:16 UTC