Re: delta encoding and state management from Roberto Peon on 2013-01-17 (ietf-http-wg@w3.org from January to March 2013)

From: Roberto Peon <grmocg@gmail.com>
Date: Wed, 16 Jan 2013 17:39:27 -0800
To: James M Snell <jasnell@gmail.com>
Cc: "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Message-ID: <CAP+FsNcF0n3cpPho0+WPM1-grRSEy92EMnJaGYA4j0WyvUm8Ng@mail.gmail.com>
On Wed, Jan 16, 2013 at 5:23 PM, James M Snell <jasnell@gmail.com> wrote:

> Just continuing the investigation of delta... poking holes :-)
>
> There is a potential issue with delta as currently designed. Assume we
>  have a client A, a proxy B and a server C.
>
> Let's assume the following default header dictionary:
>
>   ID | Name | Value
>  ----+------+-------
>    1 | Foo  | A
>    2 | Bar  |
>
> A wants to send a message to C, the headers are:
>
>   Foo: A
>   Bar: B,C
>
> The initial delta encoding would be:
>
>   toggle(1)
>   clone(2,B) => 3
>   clone(2,C) => 4
>
> B receives this message from A and forwards it on to C. C initializes its
> local copy of the compression context, storing the appropriate values. This
> table has to be maintained from request-to-request, keeping header values
> stored. The table on C becomes:
>
>   ID | Name | Value
>  ----+------+-------
>    1 | Foo  | A
>    2 | Bar  |
>    3 | Bar  | B
>    4 | Bar  | C
>
> The active headers for the current request are pulled from the table
> [1,3,4]
>
> A then wants to send another message to C, the headers are:
>
>   Foo: A
>   Bar: B,D
>
> The delta encoding would be:
>
>   toggle(4)
>   clone(2,D) => 5
>
> B forwards this message on to C, which updates the context. The table on C
> becomes:
>
>   ID | Name | Value
>  ----+------+-------
>    1 | Foo  | A
>    2 | Bar  |
>    3 | Bar  | B
>    4 | Bar  | C
>    5 | Bar  | D
>
> The active headers for the second request are pulled from the table
> [1,3,5]. The table grows indefinitely as new header values are used within
> the course of a single session. Because the complete value of each header
> must be stored, highly variable values (cookies, timestamps, etc.) would be
> stored in full just in case they are reused later. (Roberto, please correct
> me if I'm wrong on this point!)
>
>
Correct: if you're doing compression, you'll need the compression state. At
a minimum, a proxy would need to keep it around so that it knows which
hosts, methods, etc. are being used and can route each query appropriately
(otherwise you could just read from one TCP hose and write into the other,
and nobody would care about the compression).
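The table behaviour in James's walkthrough can be modelled with a small decoder-side context. This is a minimal sketch, assuming `toggle(id)` flips an existing entry into or out of the active set, `clone(id, value)` appends a new entry that reuses the name of entry `id` and makes it active, and the active set carries over from one request to the next. `DeltaContext` and its methods are hypothetical names, not taken from the delta proposal.

```python
from dataclasses import dataclass, field


@dataclass
class DeltaContext:
    """Hypothetical model of one side's delta compression state."""

    # (name, value) pairs; entry ID n lives at index n - 1.
    # Starts from the default dictionary used in the example.
    table: list[tuple[str, str]] = field(
        default_factory=lambda: [("Foo", "A"), ("Bar", "")]
    )
    # IDs whose entries make up the current request's headers.
    # This set carries over from one request to the next.
    active: set[int] = field(default_factory=set)

    def _check(self, entry_id: int) -> None:
        if not 1 <= entry_id <= len(self.table):
            raise KeyError(f"no table entry {entry_id}")

    def toggle(self, entry_id: int) -> None:
        # Flip an existing entry into or out of the active set.
        self._check(entry_id)
        self.active ^= {entry_id}

    def clone(self, entry_id: int, value: str) -> int:
        # Reuse the name of an existing entry with a new value, append
        # the result to the table and make it active.
        self._check(entry_id)
        name, _ = self.table[entry_id - 1]
        self.table.append((name, value))
        new_id = len(self.table)
        self.active.add(new_id)
        return new_id

    def headers(self) -> list[tuple[str, str]]:
        return [self.table[i - 1] for i in sorted(self.active)]


ctx = DeltaContext()

# First request (Foo: A, Bar: B,C).
ctx.toggle(1)
ctx.clone(2, "B")  # => 3
ctx.clone(2, "C")  # => 4
print(sorted(ctx.active))  # [1, 3, 4]

# Second request (Foo: A, Bar: B,D), applied to the same context.
ctx.toggle(4)
ctx.clone(2, "D")  # => 5
print(sorted(ctx.active))  # [1, 3, 5]
print(len(ctx.table))  # 5: entry 4 ("Bar: C") is inactive but still stored
```

Nothing in this model ever removes an entry, which is exactly the unbounded growth described above; bounding it would need an eviction or admission rule that both ends apply identically so their IDs stay in sync.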


> So far, so good. Let's mess it up.
>
> Let's assume that the connection between B and C dies before that second
> request is sent to B, causing the compression context shared between B and
> C to be reset and lost. Suddenly the delta encoding in the second message
> becomes invalid, even though nothing has interrupted the connection between
> A and B.
>
> B and C need to have some way of reestablishing the compression context
> after their connection is reestablished or the second message just becomes
> nonsense.
>
> At first review, there appear to be a few options:
>
> 1. B also has to maintain its own copy of the complete header value
> table, which could consume quite a bit of storage space on the proxy
> (relative to current requirements). Once the connection is re-established,
> B would then translate the delta-encoding in the second message to
> initialize the reconstructed context on C. We currently do not have any
> metrics on just how large this table could potentially become within the
> course of a single session.
>
>
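Option 1 can be sketched under the same simplified toggle/clone model as James's example. The delta A computed for the second request refers to entry 4, which does not exist in a freshly reset context on C, so B has to fall back to the fully decoded header list it kept and re-encode it from scratch. The tuple op format, `decode`, and `encode_initial` are illustrative assumptions, and a real encoder would also need a way to send header names that are not in the dictionary.

```python
# Hypothetical op format: ("toggle", id) or ("clone", id, value).
DEFAULT_TABLE = [("Foo", "A"), ("Bar", "")]


def decode(ops, table, active):
    """Apply delta ops to a context in place; return the active headers."""
    for op in ops:
        entry_id = op[1]
        if not 1 <= entry_id <= len(table):
            # The op refers to state this context never had.
            raise KeyError(f"no table entry {entry_id}")
        if op[0] == "toggle":
            active ^= {entry_id}
        else:  # "clone": reuse the entry's name with a new value
            table.append((table[entry_id - 1][0], op[2]))
            active.add(len(table))
    return [table[i - 1] for i in sorted(active)]


def encode_initial(headers):
    """Encode headers against a brand-new context (default table, nothing active)."""
    table, active, ops = list(DEFAULT_TABLE), set(), []
    for name, value in headers:
        exact = [i for i, e in enumerate(table, 1)
                 if e == (name, value) and i not in active]
        if exact:
            ops.append(("toggle", exact[0]))
            active.add(exact[0])
            continue
        same_name = [i for i, e in enumerate(table, 1) if e[0] == name]
        if not same_name:
            raise ValueError(f"{name!r} not in dictionary; literals not modelled")
        ops.append(("clone", same_name[0], value))
        table.append((name, value))
        active.add(len(table))
    return ops


# The second request's delta was computed against the lost B-C context...
stale = [("toggle", 4), ("clone", 2, "D")]
try:
    decode(stale, list(DEFAULT_TABLE), set())
except KeyError as err:
    print("stale delta rejected:", err)

# ...so B re-encodes the headers it decoded from A for the fresh context.
request2 = [("Foo", "A"), ("Bar", "B"), ("Bar", "D")]
fresh_ops = encode_initial(request2)
print(fresh_ops)  # [('toggle', 1), ('clone', 2, 'B'), ('clone', 2, 'D')]
assert decode(fresh_ops, list(DEFAULT_TABLE), set()) == request2
```

The catch James points out is visible here: `encode_initial` only works because B still holds `request2` in full, so the proxy has to keep every decoded header set it might need to replay.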
Well, current requirements are either no compression (HTTP/1) or an
encoder-defined amount of memory at the decoder (gzip in SPDY).
The idea here with delta is to give the decoder more keys to the castle--
the decoder (not the encoder!) dictates how much state it is willing to
maintain and the encoder must stick within that. It wouldn't be too hard to
bolt this onto any compression scheme, probably.

As an encoder, you always have the option of using no state for encoding,
so all the other side can dictate is the maximum state you can use.

-=R
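Roberto's point that the decoder, not the encoder, dictates how much state is kept can be sketched as an encoder that checks a decoder-advertised budget before storing anything. The `BoundedEncoder` class, its cost formula, and the `store`/`ref`/`literal` op kinds are hypothetical illustrations rather than part of the delta proposal. With a budget of 0 it never stores anything, which is the same situation as James's option 3.

```python
from dataclasses import dataclass, field


@dataclass
class BoundedEncoder:
    """Hypothetical encoder that never stores more state than the decoder allows."""

    # Budget advertised by the decoder; the cost formula below is an assumption.
    max_state: int
    table: list[tuple[str, str]] = field(default_factory=list)
    used: int = 0

    def encode(self, name: str, value: str) -> tuple:
        if (name, value) in self.table:
            # Already stored on both sides: refer to it by ID.
            return ("ref", self.table.index((name, value)) + 1)
        cost = len(name) + len(value) + 1  # +1: arbitrary per-entry overhead
        if self.used + cost <= self.max_state:
            # Fits in the decoder's budget: store it for later reuse.
            self.table.append((name, value))
            self.used += cost
            return ("store", name, value)
        # Over budget: send the header without creating any state.
        return ("literal", name, value)


enc = BoundedEncoder(max_state=16)
ops = [enc.encode("Foo", "A"), enc.encode("Cookie", "x" * 40), enc.encode("Foo", "A")]
print(ops[0], ops[2])  # ('store', 'Foo', 'A') ('ref', 1)
print(ops[1][0])  # literal: the long cookie does not fit the budget

# A budget of 0 degenerates to stateless encoding.
stateless = BoundedEncoder(max_state=0)
print([stateless.encode("Foo", "A")[0] for _ in range(2)])  # ['literal', 'literal']
```

The decoder would append each `store` entry to its own table in the same order, so it never holds more than the budget it advertised and the IDs used by `ref` match on both ends.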


> 2. B needs to notify A that the compression context needs to be reset. If
> A gets that message before it constructs the second message, all is fine: A
> would just treat it like the initial request. If A is already in the process
> of sending that message to B, B is going to have to reject it or put it on
> hold until the context is reconstructed... in which case B is going to need
> some way of knowing whether the message needs to be rejected. Also, there
> is a risk of too many reset messages being sent, causing a lot of churn.
>
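The "some way of knowing" in option 2 could, for example, be a context generation number that B bumps on every reset and A stamps on every message, so that B can tell which in-flight messages were encoded against a discarded context. This is a hypothetical mechanism for illustration only: `ContextState`, `reset`, `accept`, and the idea of carrying a generation on the wire are all assumptions, and the sketch does nothing about the reset-churn risk.

```python
from dataclasses import dataclass


@dataclass
class ContextState:
    """Hypothetical: one end's view of the A-B compression context."""

    # Incremented every time the shared context is thrown away. Carrying
    # this number on every message is an assumption, not part of delta.
    generation: int = 0


def reset(proxy: ContextState) -> int:
    # B discards its context and starts a new generation; it would then notify A.
    proxy.generation += 1
    return proxy.generation


def accept(proxy: ContextState, msg_generation: int) -> bool:
    # A delta can only be decoded against the generation it was built for.
    return msg_generation == proxy.generation


a, b = ContextState(), ContextState()
in_flight = a.generation  # A already encoded message 2 against generation 0

new_gen = reset(b)
print(accept(b, in_flight))  # False: B must reject or hold this message

a.generation = new_gen  # A receives the reset and re-encodes from scratch
print(accept(b, a.generation))  # True
```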

> 3. B assigns a compression buffer window size of 0, effectively disabling
> the stored compression context (every message effectively becomes an
> initial message). The risk here is that a proxy might defensively always
> send a 0.
>
> My questions to the group are:
>
> A. Am I missing anything obvious here?
> B. Are there other possible options?
> C. Which option seems to be the least painful?
>
> - James
>
>
Received on Thursday, 17 January 2013 01:39:54 UTC