- From: Roberto Peon <grmocg@gmail.com>
- Date: Fri, 12 Oct 2012 00:49:47 -0700
- To: Willy Tarreau <w@1wt.eu>
- Cc: James M Snell <jasnell@gmail.com>, ietf-http-wg@w3.org
- Message-ID: <CAP+FsNcp7r2ad-X=2aNYq=_FB8cKgKEpTUYW7eG7YsyFnrwbJg@mail.gmail.com>
As a warning, I do need to explicitly put the license in there for the code. The license would be the Chromium license.

I've been checking my code into:
https://github.com/grmocg/SPDY-Specification/tree/gh-pages/example_code

The python code, while outdated, is end-to-end. It does delta-encoding, huffman coding, serialization, deserialization, decoding and re-inflating. It is super-duper slow and wasn't written to test CPU performance.

The c++ code, while incomplete, gives me a much better idea of the performance. Currently, the not-terribly-well-optimized code is happy to turn ~200 MB/s of raw headers into delta-encoded instructions per core on my workstation, given my dataset. In terms of HTTP requests/second, I think that is on the order of 30k/core. It will get slower once the serialization goes in (which includes huffman coding and thus more bit-twiddling). I honestly have no idea how much slower it may get, but am anxious to find out.

The missing pieces from the c++ code are the serialization, deserialization, and inflating. The parts which are done are the bitbucket, huffman (including canonical huffman), and delta-encoder. It is also missing the decoder (which is trivial); the serialization/deserialization components may not be trivial, depending on how efficient/inefficient the bitshifting operations end up being.

Compressor unknowns/tradeoffs (a.k.a. things I intend to play with):

* Byte or bit alignment in various variations.
* Huffman coding of non-text fields (e.g. the opcodes themselves).
* Removal of opcodes -- instead, simply indicate the number of each type, or use a token to indicate the end of a section of the expected type. This could save somewhere between 2-4 bits per operation, at least before huffman coding.
* Using different frequency tables (and thus huffman encodings) for cookie data (unless this brings significant benefit, I'd prefer not to have this).
* ID renumbering (does a referenced item get renumbered when it is used, so that consecutive items are likely to have been used together?).
* Little-endian or big-endian, and specifically how (given that most things today are little-endian...).
* Implementation of proxy pass-through -- one of the design goals here was to allow a proxy to skip doing most of the delta or huffman encoding, and instead just maintain a mapping between the IDs on the ingress and egress sides for the different connection contexts. I believe it is doable, but I haven't done it. At the very least, since the huffman coding is currently completely static, much can be carried over without having to be re-translated from the ingress side.
* Providing a mechanism for negotiating the transfer of a different huffman encoding. Given that a canonical huffman code is used (ostensibly to increase decode speed), it is also easy/cheap to send one (see the sketch after this list). I haven't determined how big a savings this could be on a per-page basis. This is something that could be done with the python code without too much effort.
* Modifying the instructions which create new IDs (i.e. clone and kvsto) so that they don't mention the ID they're creating -- this is unnecessary so long as a single ID is mentioned with the HEADERS frame; each new ID will simply increment from there.
* And some more esoteric stuff.
* Ensuring that the pre-populated dictionary items never get thrown off the LRU, since they must be maintained in memory anyway for any server/client which uses multiple connections -- it is likely that this wouldn't incur any additional memory cost, and it could increase compressor efficiency.
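To illustrate why a canonical huffman code is cheap to send: the codewords are fully determined by the per-symbol code lengths, so an endpoint proposing a new table only needs to ship one small integer per symbol and the other side can rebuild identical codewords. The snippet below is just my illustration of that reconstruction (invented function name, not the repository's c++ or python code):

    # Minimal sketch (not the spec's actual code): rebuilding a canonical
    # huffman code from nothing but per-symbol code lengths.

    def canonical_codes(code_lengths):
        """code_lengths: dict mapping symbol -> bit length (0 = unused).
        Returns dict mapping symbol -> (codeword_as_int, bit_length)."""
        # Canonical ordering: shorter codes first, ties broken by symbol value.
        symbols = sorted((length, sym)
                         for sym, length in code_lengths.items() if length > 0)
        codes = {}
        code = 0
        prev_len = 0
        for length, sym in symbols:
            code <<= (length - prev_len)   # pad with zeros when the length grows
            codes[sym] = (code, length)
            code += 1
            prev_len = length
        return codes

    # Example: both sides derive identical codewords from the same lengths.
    lengths = {'e': 2, 't': 2, 'a': 3, 'o': 3, 'z': 4, 'q': 4}
    for sym, (code, nbits) in sorted(canonical_codes(lengths).items(),
                                     key=lambda kv: kv[1]):
        print(sym, format(code, '0{}b'.format(nbits)))

Because the table is shared and static, the same property is what lets an intermediary compare or copy encoded bytes without decoding them first.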
The general scheme:

Delta-encoding of the headers, with the atoms being key-value pairs. Cookie crumbs are separated so that only the part of the cookie that changes must be sent. If none of the headers changed from one request to another (unlikely -- at least the path would change most of the time), then no textual information is sent. After the delta-encoding is done, a static-huffman encoding is applied to the text.

- Have an LRU of key-value entries.
  - The default size for this state is 20k, or the size of the current header, whichever is larger.
  - The server can change this size.
  - The first N values are pre-populated with common keys and common values,
    e.g. ("method", "get") is a pre-populated entry.
  - Each time a particular key-value is used/referred-to, it goes to the front of the LRU.
  - The oldest key-value pairs (at the back of the LRU) go away if the state size is exceeded.
- The LRU is updated with a small set of operations:
  - clone(key_idx, new_val): adds a new key-value pair to the LRU by copying the key at key-index 'key_idx' in the LRU and attaching a new value to it, e.g.:
      LRU: [0: ("method", "get")]
      clone(0, "put")
        -> LRU: [0: ("method", "get"), 1: ("method", "put")]
  - kvstore(new_key, new_val): adds a new key-value pair to the LRU by inserting it at the end, e.g.:
      LRU: [0: ("method", "get")]
      kvstore("path", "/index.html")
        -> LRU: [0: ("method", "get"), 1: ("path", "/index.html")]
- Have a constrained number of header-index-sets.
  - A header-index-set represents an HTTP header block.
  - Each index in the set corresponds to an entry in the LRU (by index).
  - Each index in the set thus represents a key-value pair, one of the many that may compose the entirety of the headers.
  - Header-index-sets are modified by the encoder by putting indices into, or removing them from, the sets. The instruction to do this is called 'toggle', e.g.:
      LRU: [0: ("method", "get"), 1: ("path", "/index.html"), 2: ("path", "/")]
      header-index-set [0: (0, 1)]
    which represents a request like this:
      get /index.html
    If the following operations are applied:
      toggle(0, 1)  # header-index-group 0, toggle LRU index 1 off
      toggle(0, 2)  # header-index-group 0, toggle LRU index 2 on
    the state would now be:
      LRU: [0: ("method", "get"), 1: ("path", "/index.html"), 2: ("path", "/")]
      header-index-set [0: (0, 2)]
    and now the request would be:
      get /
- Keys and values are compressed with static-huffman codes.
  - The huffman tables for requests and responses are different.
  - The huffman codes are canonical, to improve decoding speed and to potentially allow either side to publish a new huffman code in the future in a space-efficient manner.
  - The frequency counts for the construction of these codes were found by running a bunch of headers from different sites through the delta encoding (this is key) and computing the frequency of the letters of the values that *change* in the headers.
- Keys are refcounted and counted against the compression state only once until removed.
- Keys have their own ID space so they can be referenced separately (e.g. in the clone operation).
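For readers who prefer code to prose, here is a toy model of the state machine described above. It is only my illustration (invented class and method names, not the example_code from the repository), and it omits the LRU size budget and eviction, the move-to-front behavior, cookie-crumb splitting and the huffman stage:

    # Toy model of the delta-encoder state described above. Illustration
    # only: no eviction, no recency tracking, no huffman, no serialization.

    class DeltaState(object):
        def __init__(self, prepopulated):
            # The LRU of key-value pairs; the index is the entry's id here.
            # Real code would also track recency and a ~20k size budget.
            self.lru = list(prepopulated)
            # header-index-sets: group id -> set of LRU indices currently "on".
            self.groups = {}

        # --- instructions that mutate the LRU ---------------------------
        def clone(self, key_idx, new_val):
            """Copy the key at key_idx, attach a new value, append to the LRU."""
            key, _ = self.lru[key_idx]
            self.lru.append((key, new_val))
            return len(self.lru) - 1          # the id of the new entry

        def kvstore(self, new_key, new_val):
            """Append a brand-new key-value pair to the LRU."""
            self.lru.append((new_key, new_val))
            return len(self.lru) - 1

        # --- instruction that mutates a header-index-set ----------------
        def toggle(self, group_id, lru_idx):
            """Flip membership of an LRU entry in a header-index-set."""
            group = self.groups.setdefault(group_id, set())
            group.symmetric_difference_update({lru_idx})

        def headers(self, group_id):
            """Re-inflate a header block from a header-index-set."""
            return [self.lru[i] for i in sorted(self.groups.get(group_id, ()))]

    # Re-playing the example from the prose above:
    state = DeltaState(prepopulated=[("method", "get")])
    idx_index_html = state.kvstore("path", "/index.html")   # -> id 1
    idx_root = state.clone(idx_index_html, "/")              # -> id 2, reuses key "path"

    state.toggle(0, 0)                # method: get
    state.toggle(0, idx_index_html)   # path: /index.html
    print(state.headers(0))           # [('method', 'get'), ('path', '/index.html')]

    state.toggle(0, idx_index_html)   # toggle /index.html off
    state.toggle(0, idx_root)         # toggle / on
    print(state.headers(0))           # [('method', 'get'), ('path', '/')]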
There are a number of interesting things about this approach:

1. As far as anyone can tell so far, it isn't vulnerable to the attacks we're seeing right now on gzip+TLS.
2. You don't need to decode header values in order to interpret them.
   1. First of all, because only changes are sent, if you're smart you only have to figure out what to do with any particular header value once.
   2. Since the huffman encoding is static, you just compare the encoded values -- no need to decode them. This can end up faster, as the number of bytes to compare is very likely smaller.
   3. You can choose not to do huffman encoding if you wish. This is necessary since, as with any entropy coding, sometimes the encoded value is larger than the original.
3. It is easy to share encoder and decoder state on either side of a proxy, and it can be done efficiently.
   1. Any values that don't change need not be re-encoded -- they can be blitted from ingress to egress.
4. It should be a fair bit cheaper than gzip to encode.
5. The compression achieved thus far is competitive with gzip.
   1. Higher gzip window sizes will outperform the new compression scheme over current headers.
   2. Trying to be less optimal with headers results in better compression with gzip.
   3. There is more compression to be gained over what I've gotten to so far.

The code hopefully makes this far more clear for those who take a gander.
-=R

On Thu, Oct 11, 2012 at 11:46 PM, Willy Tarreau <w@1wt.eu> wrote:
> Hi Roberto,
>
> On Thu, Oct 11, 2012 at 02:18:18PM -0700, Roberto Peon wrote:
> > On the compression stuff-- I apologize for the delay in getting the
> > compression spec out. If all people want is a description of the
> > fundamentals, I can do that quickly. For now I've been concentrating on
> > the 'working code' informing the specification, and so have been working
> > to ensure that the final product isn't more complex than is really required.
>
> For having lived the same experience a few months ago, I can confirm that
> it is *very* hard to provide a spec or just an idea without working code,
> because you need to see your compressor work in order to validate the
> concept. Amos experienced the same too.
>
> In my opinion, it would be a lot simpler if we all worked on the same
> code base, each with his own fork, feeding back results and ideas.
> I didn't have enough time to help Amos when he took over the job, but
> I'm sure it could have gone further and faster with more help here from
> other implementers.
>
> > On that note, what is it that people want to see here w.r.t. the
> > compression stuff?-- Something high-level quickly, or a functional spec
> > proposal in 1-3 weeks?
>
> If you think you can already express the basic ideas and what was tested
> vs what still needs to be validated, it would be a first step. But as soon
> as you have some code anyone can experiment with without having to rebuild
> a full browser, I think it could progress quickly.
>
> Cheers,
> Willy
Received on Friday, 12 October 2012 07:50:16 UTC