delta encoding and state management from James M Snell on 2013-01-17 (ietf-http-wg@w3.org from January to March 2013)

From: James M Snell <jasnell@gmail.com>
Date: Wed, 16 Jan 2013 17:23:36 -0800
To: "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Message-ID: <CABP7Rbf-_Of0Gnn7uaeuPiiZ6n+MxbpJjbggmD3qjykWX3gaXQ@mail.gmail.com>
Just continuing the investigation of delta... poking holes :-)

There is a potential issue with delta as currently designed. Assume we
have a client A, a proxy B and a server C.

Let's assume the following default header dictionary:

  ID | Name | Value
 ----+------+-------
   1 | Foo  | A
   2 | Bar  |

A wants to send a message to C, the headers are:

  Foo: A
  Bar: B,C

The initial delta encoding would be:

  toggle(1)
  clone(2,B) => 3
  clone(2,C) => 4

B receives this message from A and forwards it on to C. C initializes its
local copy of the compression context, storing the appropriate values. This
table has to be maintained from request to request, keeping header values
stored. The table on C becomes:

  ID | Name | Value
 ----+------+-------
   1 | Foo  | A
   2 | Bar  |
   3 | Bar  | B
   4 | Bar  | C

The active headers for the current request are pulled from the table: [1,3,4].
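A minimal Python sketch of C's side of this first exchange may help. It assumes the semantics implied by the walk-through: `toggle(i)` flips entry i in or out of the active set, and `clone(i, v)` appends a new entry with entry i's name and value v at the next free ID and marks it active. The `decode` helper and the tuple form of the operations are hypothetical, not part of the delta proposal.

```python
# Hypothetical model of a delta decoder; ops are tuples such as
# ("toggle", 1) or ("clone", 2, "B").
DEFAULTS = [("Foo", "A"), ("Bar", "")]


def decode(table, active, ops):
    """Apply ops to the stored table and active set, then return the active headers."""
    for op, idx, *value in ops:
        if idx not in table:
            raise ValueError(f"{op}({idx}) refers to an unknown entry")
        if op == "toggle":
            active ^= {idx}  # flip the entry in or out of the active set
        elif op == "clone":
            new_id = max(table) + 1  # next free ID
            table[new_id] = (table[idx][0], value[0])
            active.add(new_id)  # a cloned entry is active immediately
        else:
            raise ValueError(f"unknown operation {op!r}")
    return [table[i] for i in sorted(active)]


# C's context starts from the default dictionary (IDs are 1-based).
table = {i + 1: entry for i, entry in enumerate(DEFAULTS)}
active = set()

headers = decode(table, active, [("toggle", 1), ("clone", 2, "B"), ("clone", 2, "C")])
print(sorted(active))  # [1, 3, 4]
print(headers)         # [('Foo', 'A'), ('Bar', 'B'), ('Bar', 'C')]
```

Both the table and the active set live outside `decode`, because the next message is interpreted against whatever state this one leaves behind.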

A then wants to send another message to C, the headers are:

  Foo: A
  Bar: B,D

The delta encoding would be:

  toggle(4)
  clone(2,D) => 5
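On A's side, an encoder that remembers the table and the previous message's active set can produce exactly these two deltas. The Python sketch below is hypothetical: it assumes the encoder reuses an entry whose name and value match exactly, otherwise clones the first entry with the same name, and then toggles whatever differs from the previous active set. New header names, which would need some other operation, are out of scope.

```python
# Hypothetical delta encoder: it keeps its own copy of the table and of the
# previous message's active set, and emits only the difference.
DEFAULTS = [("Foo", "A"), ("Bar", "")]


class DeltaEncoder:
    def __init__(self, defaults):
        self.table = {i + 1: entry for i, entry in enumerate(defaults)}
        self.active = set()

    def encode(self, headers):
        wanted, clones = set(), []
        for name, value in headers:
            # Reuse an existing entry with exactly this name and value.
            idx = next((i for i, e in self.table.items() if e == (name, value)), None)
            if idx is None:
                # Otherwise clone the first entry with this name (assumes the
                # name is already in the table).
                base = next(i for i, (n, _) in self.table.items() if n == name)
                idx = max(self.table) + 1
                self.table[idx] = (name, value)
                clones.append(("clone", base, value))
                self.active.add(idx)  # a clone is active as soon as it exists
            wanted.add(idx)
        # Toggle every entry whose active state differs from what is wanted.
        toggles = [("toggle", i) for i in sorted(self.active ^ wanted)]
        self.active = wanted
        return toggles + clones


enc = DeltaEncoder(DEFAULTS)
first = enc.encode([("Foo", "A"), ("Bar", "B"), ("Bar", "C")])
second = enc.encode([("Foo", "A"), ("Bar", "B"), ("Bar", "D")])
print(first)   # [('toggle', 1), ('clone', 2, 'B'), ('clone', 2, 'C')]
print(second)  # [('toggle', 4), ('clone', 2, 'D')]
```

Emitting the toggles before the clones matches the order used in the messages above. It is safe in this model because a toggle only ever refers to an entry that existed before the message, so the decoder assigns the clone IDs in the same order the encoder did.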

B forwards this message on to C, which updates the context. The table on C
becomes:

  ID | Name | Value
 ----+------+-------
   1 | Foo  | A
   2 | Bar  |
   3 | Bar  | B
   4 | Bar  | C
   5 | Bar  | D

The active headers for the second request are pulled from the table:
[1,3,5]. The table grows indefinitely as new header values are used over
the course of a single session. The complete value of each header would need
to be stored, so highly variable values (cookies, timestamps, etc.) would be
stored in full just in case they are reused later. (Roberto, please correct
me if I'm wrong on this point!)
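The growth is easy to see in a toy model. This Python sketch uses a hypothetical `decode` helper with the toggle/clone semantics from the walk-through, replays both messages on C's context, and then assumes a `Bar` value that changes on every request (like a timestamp), with each request toggling the previous value off. Because nothing in this model ever evicts an entry, the table gains one entry per distinct value.

```python
DEFAULTS = [("Foo", "A"), ("Bar", "")]


def decode(table, active, ops):
    # Hypothetical toggle/clone decoder; entries are never evicted.
    for op, idx, *value in ops:
        if op == "toggle":
            active ^= {idx}
        else:  # "clone"
            new_id = max(table) + 1
            table[new_id] = (table[idx][0], value[0])
            active.add(new_id)
    return [table[i] for i in sorted(active)]


table = {i + 1: entry for i, entry in enumerate(DEFAULTS)}
active = set()
decode(table, active, [("toggle", 1), ("clone", 2, "B"), ("clone", 2, "C")])
decode(table, active, [("toggle", 4), ("clone", 2, "D")])
print(sorted(active), len(table))  # [1, 3, 5] 5  (entry 4 is kept but inactive)

# A value that changes on every request adds one entry per request.
prev = 5
for n in range(100):
    decode(table, active, [("toggle", prev), ("clone", 2, f"ts={n}")])
    prev = max(table)
print(sorted(active), len(table))  # [1, 3, 105] 105
```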

So far, so good. Let's mess it up.

Let's assume that the connection between B and C dies before that second
request is sent to B, causing the compression context shared between B and
C to be reset and lost. Suddenly the delta encoding in the second message
becomes invalid, even though nothing has interrupted the connection between
A and B.

B and C need to have some way of re-establishing the compression context
after their connection is re-established, or the second message just becomes
nonsense.
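A toy decoder shows two ways this goes wrong. In this Python sketch (a hypothetical `decode` helper with the walk-through's toggle/clone semantics), C's context has been reset to the default dictionary, so `toggle(4)` points at an entry C never saw. The second, clone-only delta is a made-up message, not one from the example: it decodes without error but silently drops every header that was carried over in the lost active set.

```python
DEFAULTS = [("Foo", "A"), ("Bar", "")]


def decode(table, active, ops):
    # Hypothetical toggle/clone decoder that rejects unknown entry IDs.
    for op, idx, *value in ops:
        if idx not in table:
            raise ValueError(f"{op}({idx}) refers to an entry this context never saw")
        if op == "toggle":
            active ^= {idx}
        else:  # "clone"
            new_id = max(table) + 1
            table[new_id] = (table[idx][0], value[0])
            active.add(new_id)
    return [table[i] for i in sorted(active)]


def fresh_context():
    # What C has after the B-C connection is re-established: defaults only.
    return {i + 1: entry for i, entry in enumerate(DEFAULTS)}, set()


table, active = fresh_context()
try:
    decode(table, active, [("toggle", 4), ("clone", 2, "D")])
    failed = False
except ValueError as err:
    failed = True
    print("cannot decode:", err)

# Hypothetical clone-only delta: A meant "Foo: A, Bar: B, Bar: C, Bar: D".
table, active = fresh_context()
print(decode(table, active, [("clone", 2, "D")]))  # [('Bar', 'D')]
```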

On first review, there appear to be a few options:

1. B also has to maintain its own copy of the complete header value table,
which could consume quite a bit of storage space on the proxy (relative to
current requirements). Once the connection is re-established, B would then
translate the delta encoding in the second message to initialize the
reconstructed context on C. We currently do not have any metrics on just
how large this table could potentially become within the course of a single
session.

2. B needs to notify A that the compression context needs to be reset. If A
gets that message before it constructs the second message, all is fine: A
would just encode it like the initial request. If A is already in the process
of sending that message to B, B is going to have to reject it or put it on
hold until the context is reconstructed... in which case B is going to need
some way of knowing whether the message needs to be rejected. Also, there
is a risk of too many reset messages being sent, causing a lot of churn.

3. B assigns a compression buffer window size of 0, effectively disabling
the stored compression context (every message effectively becomes an
initial message). The risk here is that a proxy might defensively always
send a 0.
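Options 1 and 3 can be compared in the same kind of toy model. The Python sketch below is hypothetical throughout. `Context` bundles a table and active set with a diff-based encoder (reuse an exact match, otherwise clone the first entry with the same name, then toggle whatever changed) and a matching decoder. `wire_cost` uses a made-up cost of one byte per operation plus the bytes of each cloned value. The sketch shows B re-encoding the second message for C's fresh context (option 1), then the cost of ten requests that repeat the same 100-byte value, with a stored context versus a window size of 0 (option 3).

```python
# Hypothetical model of options 1 and 3; all names are illustrative only.
DEFAULTS = [("Foo", "A"), ("Bar", "")]


class Context:
    """A header table plus active set, usable as either encoder or decoder."""

    def __init__(self):
        self.table = {i + 1: e for i, e in enumerate(DEFAULTS)}
        self.active = set()

    def encode(self, headers):
        # Clone values not yet in the table, toggle entries whose state changes.
        wanted, clones = set(), []
        for name, value in headers:
            idx = next((i for i, e in self.table.items() if e == (name, value)), None)
            if idx is None:
                base = next(i for i, (n, _) in self.table.items() if n == name)
                idx = max(self.table) + 1
                self.table[idx] = (name, value)
                clones.append(("clone", base, value))
                self.active.add(idx)
            wanted.add(idx)
        toggles = [("toggle", i) for i in sorted(self.active ^ wanted)]
        self.active = wanted
        return toggles + clones

    def decode(self, ops):
        for op, idx, *value in ops:
            if op == "toggle":
                self.active ^= {idx}
            else:  # "clone"
                new_id = max(self.table) + 1
                self.table[new_id] = (self.table[idx][0], value[0])
                self.active.add(new_id)
        return [self.table[i] for i in sorted(self.active)]


# Option 1: B keeps a full copy of the A-side context, so once the B-C
# connection is re-established it can re-encode message 2 for a fresh context.
b_side = Context()
b_side.decode([("toggle", 1), ("clone", 2, "B"), ("clone", 2, "C")])
headers = b_side.decode([("toggle", 4), ("clone", 2, "D")])
reencoded = Context().encode(headers)
print(reencoded)  # [('toggle', 1), ('clone', 2, 'B'), ('clone', 2, 'D')]
assert Context().decode(reencoded) == headers


# Option 3: window size 0, i.e. every message is encoded against a fresh context.
def wire_cost(ops):
    # Made-up cost model: 1 byte per op plus the bytes of each cloned value.
    return sum(1 + (len(op[2]) if op[0] == "clone" else 0) for op in ops)


big = "x" * 100  # a large value repeated on every request, e.g. a cookie
requests = [[("Foo", "A"), ("Bar", big)]] * 10
shared = Context()
with_context = sum(wire_cost(shared.encode(h)) for h in requests)
window_zero = sum(wire_cost(Context().encode(h)) for h in requests)
print(with_context, window_zero)  # 102 1020
```

Under these assumptions, option 1 trades B's memory for a correct re-encoding, while option 3 keeps B stateless but re-sends every repeated value in full on every request.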

My questions to the group are:

A. Am I missing anything obvious here?
B. Are there other possible options?
C. Which option seems to be the least painful?

- James
Received on Thursday, 17 January 2013 01:24:24 UTC