Re: delta encoding and state management from James M Snell on 2013-01-17 (ietf-http-wg@w3.org from January to March 2013)

From: James M Snell <jasnell@gmail.com>
Date: Thu, 17 Jan 2013 09:21:11 -0800
To: Roberto Peon <grmocg@gmail.com>
Cc: "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Message-ID: <CABP7RbcDRpfwfTOGS_aNjG4LtWkabvrGYcfwKqajT3hGKzBO6Q@mail.gmail.com>
My main concerns with this approach are (a) even though the decoder gets to
set limits on the amount of state used, keeping that state around is still
relatively expensive compared to what we have now, and (b) keeping it in
sync with the client, server and any number of intermediaries along the
path is likely going to prove difficult at best. We need to make sure we
have a good understanding of the worst case scenario with this approach
(i.e. nothing stored in context anywhere along the path).

The other concern is that this compression context would likely end up
storing potentially sensitive information for an indefinite period of time.
WWW-Authenticate and Cookie headers, API Keys, etc would all end up being
stored with (currently) no guarantees about security. Always using erefs
for those headers would work, but the current description you have
rules out user-agent use of erefs [1]. If I'm passing secure credentials
around in a header, I don't want those stored anywhere, especially on
intermediaries.

  [1]
http://tools.ietf.org/html/draft-rpeon-httpbis-header-compression-00#section-4

- James

On Wed, Jan 16, 2013 at 5:39 PM, Roberto Peon <grmocg@gmail.com> wrote:

>
>
>
> On Wed, Jan 16, 2013 at 5:23 PM, James M Snell <jasnell@gmail.com> wrote:
>
>> Just continuing the investigation of delta... poking holes :-)
>>
>> There is a potential issue with delta as currently designed. Assume we
>> have a client A, a proxy B and a server C.
>>
>> Let's assume the following default header dictionary:
>>
>>   ID | Name | Value
>>  ----+------+-------
>>    1 | Foo  | A
>>    2 | Bar  |
>>
>> A wants to send a message to C, the headers are:
>>
>>   Foo: A
>>   Bar: B,C
>>
>> The initial delta encoding would be:
>>
>>   toggle(1)
>>   clone(2,B) => 3
>>   clone(2,C) => 4
>>
>> B receives this message from A and forwards it on to C. C initializes
>> its local copy of the compression context, storing the appropriate values.
>> This table has to be maintained from request-to-request, keeping header
>> values stored. The table on C becomes:
>>
>>   ID | Name | Value
>>  ----+------+-------
>>    1 | Foo  | A
>>    2 | Bar  |
>>    3 | Bar  | B
>>    4 | Bar  | C
>>
>> The active headers for the current request are pulled from the table
>> [1,3,4]
>>
>> A then wants to send another message to C, the headers are:
>>
>>   Foo: A
>>   Bar: B,D
>>
>> The delta encoding would be:
>>
>>   toggle(4)
>>   clone(2,D) => 5
>>
>> B forwards this message on to C, which updates the context. The table on
>> C becomes:
>>
>>   ID | Name | Value
>>  ----+------+-------
>>    1 | Foo  | A
>>    2 | Bar  |
>>    3 | Bar  | B
>>    4 | Bar  | C
>>    5 | Bar  | D
>>
>> The active headers for the second request are pulled from the table
>> [1,3,5]. The table grows indefinitely as new header values are used within
>> the course of a single session. The complete value of each header would need
>> to be stored; highly variable values (cookies, timestamps, etc.) would be
>> stored in full just in case they are reused later. (Roberto, please correct
>> me if I'm wrong on this point!)
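
To make the walkthrough above concrete, here is a minimal Python sketch of the table as described in this thread. It is a toy model, not the wire format from the draft: the class name DeltaContext and its methods are made up for illustration, and the only behavior assumed is what the example shows (toggle flips an entry in or out of the active set, clone stores a new value under an existing name and activates it).

```python
class DeltaContext:
    """Toy model of the header table in this thread (hypothetical names)."""

    def __init__(self, defaults):
        # defaults: list of (name, value) pairs; IDs are assigned from 1.
        self.table = {i + 1: pair for i, pair in enumerate(defaults)}
        self.active = set()

    def toggle(self, entry_id):
        # Flip whether an existing entry is part of the current header set.
        self.active.symmetric_difference_update({entry_id})

    def clone(self, entry_id, value):
        # Reuse the name of an existing entry, store the new value under the
        # next free ID, and make the new entry active.
        name, _ = self.table[entry_id]
        new_id = max(self.table) + 1
        self.table[new_id] = (name, value)
        self.active.add(new_id)
        return new_id

    def headers(self):
        return [self.table[i] for i in sorted(self.active)]


ctx = DeltaContext([("Foo", "A"), ("Bar", "")])

# First message: Foo: A / Bar: B,C
ctx.toggle(1)
ctx.clone(2, "B")
ctx.clone(2, "C")
first = ctx.headers()

# Second message: Foo: A / Bar: B,D
ctx.toggle(4)
ctx.clone(2, "D")
second = ctx.headers()
```

In this model nothing ever leaves the table: after two messages it holds five entries although only three are active, which is the growth being described.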
>>
>>
> Correct, if you're doing compression, you'll need the compression state. A
> proxy would need to keep this around at a minimum so it'd know what hosts,
> methods, etc. were being used so as to appropriately route a query (else
> you could just read from one TCP hose and write into the other and who
> cares about the compression when you can do that).
>
>
>> So far, so good. Let's mess it up.
>>
>> Let's assume that the connection between B and C dies before that second
>> request is sent to B, causing the compression context shared between B and
>> C to be reset and lost. Suddenly the delta encoding in the second message
>> becomes invalid, even though nothing has interrupted the connection between
>> A and B.
>>
>> B and C need to have some way of reestablishing the compression context
>> after their connection is reestablished or the second message just becomes
>> nonsense.
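
The failure mode can be sketched in a few lines of Python. This is an illustration under assumptions, not the draft's behavior: the helper names fresh_context and apply_ops are hypothetical, and the model assumes a decoder rejects an operation that names an entry it does not have.

```python
DEFAULTS = {1: ("Foo", "A"), 2: ("Bar", "")}


def fresh_context():
    # A new connection starts from the default dictionary, nothing active.
    return dict(DEFAULTS), set()


def apply_ops(table, active, ops):
    """Apply toggle/clone operations to a context, mutating it in place."""
    for op, *args in ops:
        entry_id = args[0]
        if entry_id not in table:
            raise KeyError(f"{op} refers to unknown entry {entry_id}")
        if op == "toggle":
            active.symmetric_difference_update({entry_id})
        elif op == "clone":
            name, _ = table[entry_id]
            new_id = max(table) + 1
            table[new_id] = (name, args[1])
            active.add(new_id)
        else:
            raise ValueError(f"unknown operation {op!r}")


first = [("toggle", 1), ("clone", 2, "B"), ("clone", 2, "C")]
second = [("toggle", 4), ("clone", 2, "D")]

# Connection stays up: C sees both messages in order.
table, active = fresh_context()
apply_ops(table, active, first)
apply_ops(table, active, second)
in_sync = sorted(active)

# B-C connection dropped after the first message: C starts from defaults,
# so toggle(4) in the second message points at an entry C no longer has.
table, active = fresh_context()
try:
    apply_ops(table, active, second)
    desynced = False
except KeyError:
    desynced = True
```

Here the mismatch is at least detectable because entry 4 is missing outright. If the ID happened to exist with a different value on the fresh side, the same message would decode without error into the wrong headers.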
>>
>> At first review, there appear to be a few options:
>>
>> 1. B has to also maintain its own copy of the complete header value
>> table, which could consume quite a bit of storage space on the proxy
>> (relative to current requirements). Once the connection is re-established,
>> B would then translate the delta-encoding in the second message to
>> initialize the reconstructed context on C. We currently do not have any
>> metrics on just how large this table could potentially become within the
>> course of a single session.
>>
>>
> Well, current requirements are either no compression (HTTP/1), or use an
> encoder-defined amount of memory at the decoder (gzip in SPDY).
> The idea here with delta is to give the decoder more keys to the castle--
> the decoder (not the encoder!) dictates how much state it is willing to
> maintain and the encoder must stick within that. It wouldn't be too hard to
> bolt this onto any compression scheme, probably.
>
> As an encoder, you always have the option of using no state for encoding,
> so all the other side can dictate is the maximum state you can use.
>
> -=R
>
>
>> 2. B needs to notify A that the compression context needs to be reset. If
>> A gets that message before it constructs the second message, all is fine, A
>> would just treat it like the initial response. If A is already in the process
>> of sending that message to B, B is going to have to reject it or put it on
>> hold until the context is reconstructed... in which case B is going to need
>> hold until the context is reconstructed... in which case B is going to need
>> some way of knowing whether the message needs to be rejected. Also, there
>> is a risk of too many reset messages being sent, causing a lot of churn.
>>
>
>> 3. B assigns a compression buffer window size of 0, effectively disabling
>> the stored compression context (every message effectively becomes an
>> initial message). The risk here is that a proxy might defensively always
>> send a 0.
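
Options 1 and 3 both come down to the same step: somebody has to encode the full header set against a context that holds only the defaults. A rough Python sketch of that step follows. The function name encode_initial and the tuple layout are hypothetical, and the sketch only handles header names already present in the default dictionary, since this thread does not show an operation for brand-new names.

```python
DEFAULTS = {1: ("Foo", "A"), 2: ("Bar", "")}


def encode_initial(headers, defaults=DEFAULTS):
    """Encode a full header list against a fresh, defaults-only table.

    Returns ("toggle", id) for exact matches and
    ("clone", source_id, value, new_id) for known names with new values.
    """
    by_pair = {pair: i for i, pair in defaults.items()}
    by_name = {}
    for i, (name, _) in defaults.items():
        by_name.setdefault(name, i)

    next_id = max(defaults) + 1
    ops = []
    for name, value in headers:
        if (name, value) in by_pair:
            ops.append(("toggle", by_pair[(name, value)]))
        elif name in by_name:
            ops.append(("clone", by_name[name], value, next_id))
            next_id += 1
        else:
            # Assumption: names outside the defaults are out of scope here.
            raise ValueError(f"no default entry for header {name!r}")
    return ops


# B re-encodes A's second message for C after the B-C connection was reset.
reencoded = encode_initial([("Foo", "A"), ("Bar", "B"), ("Bar", "D")])
```

Under these assumptions the re-encoded message gives D the ID 4 on the B-C side, while A still believes D is entry 5. So with option 1 the proxy ends up holding not just the values but also a per-connection ID mapping. With option 3 every message is encoded this way and no mapping is needed, at the cost of sending every value in full each time.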
>>
>> My questions to the group are:
>>
>> A. Am I missing anything obvious here?
>> B. Are there other possible options?
>> C. Which option seems to be the least painful?
>>
>> - James
>>
>>
>
Received on Thursday, 17 January 2013 17:21:58 UTC