Re: Header Size? Was: Our Schedule from Greg Wilkins on 2014-05-31 (ietf-http-wg@w3.org from April to June 2014)

From: Greg Wilkins <gregw@intalio.com>
Date: Sat, 31 May 2014 10:15:45 +0200
To: Roberto Peon <grmocg@gmail.com>
Cc: HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <CAH_y2NHuVJ_zzoDHkjbxDfOo00m9Ykgy+nf8XC4o7TYRZVpqJQ@mail.gmail.com>
Roberto,

It is true that any shared-state compression will be somewhat compromised
in the proxy situation that we are discussing.  However, hpack is
particularly compromised by its design, in which the emitted header set
includes precisely all the headers in the shared state that have not been
referenced.  Thus at the end of a frame, the shared state is 100% tailored
to the current stream.
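
As a rough illustration of that behaviour, here is a toy model of a
reference set that is carried from one header block to the next.  It is a
sketch under simplifying assumptions, not the draft's wire format: the
class name, the method name and the toggled_off/added arguments are all
made up for this example.

```python
class ToyReferenceSetDecoder:
    """Toy model of a reference set shared across header blocks.

    Hypothetical simplification: a block is described by the headers it
    removes from the set and the headers it adds; anything left in the
    set is emitted implicitly when the block ends.
    """

    def __init__(self):
        self.reference_set = []  # headers carried over between blocks

    def decode(self, toggled_off=(), added=()):
        # Headers the encoder explicitly removed for this block.
        for header in toggled_off:
            self.reference_set.remove(header)
        emitted = list(added)
        # Newly sent headers join the shared state.
        for header in added:
            if header not in self.reference_set:
                self.reference_set.append(header)
        # Everything still in the set is emitted without being sent again.
        emitted += [h for h in self.reference_set if h not in added]
        return emitted


decoder = ToyReferenceSetDecoder()
first = decoder.decode(added=[("user-agent", "A"), ("cookie", "a=1")])
second = decoder.decode()  # an empty block still yields both headers
third = decoder.decode(toggled_off=[("cookie", "a=1")],
                       added=[("cookie", "b=2")])
```

After the third block the carried-over state holds only what that block
needed, which is the "tailored to the current stream" effect described
above.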

In comparison, something like gzip can have a shared state that combines
compression state of multiple streams and the resulting compression will
gracefully degrade as more and more diverse streams are added.  It is
essentially blind to frame boundaries.

I can imagine variations of hpack that are not so compromised, e.g. by
requiring all emitted headers to be explicitly indexed.   This would allow
a shared state table to include headers from more than one stream.  Sure,
management of that table would be a bit tricky, but then management of the
single table is already complex, as the encoder has to pick a suitable
strategy and may get it wrong.

If such a shared, frame-boundary-blind hpack state table was only mutated
by frames received on channel 0, that would free up the processing of other
streams so they could be done in parallel and out of order with respect to
each other.
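
A minimal sketch of that idea, assuming a hypothetical table that only
accepts updates from stream 0 and that stream header blocks consist of
table indices plus literal headers (the class, its methods and the block
layout are illustrative assumptions, not anything from the draft):

```python
class Channel0Table:
    """Toy shared table: only frames on stream 0 may change it."""

    def __init__(self):
        self._entries = []

    def apply_frame(self, stream_id, headers):
        # Reject mutation from any stream other than 0.
        if stream_id != 0:
            raise ValueError("only stream 0 may mutate the shared table")
        self._entries.extend(headers)

    def decode(self, indices, literals=()):
        # Read-only lookup: never touches the shared state.
        return [self._entries[i] for i in indices] + list(literals)


table = Channel0Table()
table.apply_frame(0, [("user-agent", "X"), ("accept", "*/*")])

# Header blocks for two streams: (table indices, literal headers).
blocks = {
    1: ([0, 1], [(":path", "/a")]),
    3: ([0], [(":path", "/b")]),
}

# Decoding order does not matter because decode() is read-only.
forward = {sid: table.decode(*blocks[sid]) for sid in (1, 3)}
backward = {sid: table.decode(*blocks[sid]) for sid in (3, 1)}
```

The sketch only shows ordering freedom between non-0 streams; ordering
relative to the stream 0 updates themselves would still need handling.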

If the shared table could only have headers added to it and not deleted
from it, then the processing of non-0 frame headers would be free from any
sequential access issues, but would have a size/growth issue
instead.   This could be addressed by simply having a fixed size, after
which compression would no longer be dynamic, or by versioned tables that
could be deleted once all streams referencing them are complete.   None of
these things are particularly more complex than what we have today.
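
The fixed-size, add-only variant could look something like the sketch
below.  The size accounting (name length plus value length) and the names
AppendOnlyTable, add and get are assumptions made for the example, not
part of any proposal.

```python
class AppendOnlyTable:
    """Toy add-only header table with a fixed byte budget."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used = 0
        self.entries = []

    def add(self, name, value):
        """Return the new entry's index, or None once the table is full."""
        size = len(name) + len(value)  # assumed cost model
        if self.used + size > self.max_bytes:
            return None  # full: the caller sends this header as a literal
        self.entries.append((name, value))
        self.used += size
        return len(self.entries) - 1

    def get(self, index):
        # Entries are never removed, so an index stays valid for the
        # lifetime of the table.
        return self.entries[index]


table = AppendOnlyTable(max_bytes=20)
first = table.add("cookie", "a=1")   # fits
second = table.add("cookie", "b=2")  # fits
third = table.add("cookie", "c=3")   # would exceed the budget
```

Because indices never change, a stream can resolve them at any time
without caring what other streams have done in between.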

Consider the example of merging two connections onto one connection, both
coming from clients using large cookies.  With the current hpack, every
time the merged connection switches from a stream from one connection to
the other, it is going to have to remove the large cookies from the state
table and send the other connection's cookies (uncompressed because they
are cookies).   This will be a rapid degradation in compression performance.
The natural burst nature of requests will probably help out for the case of
2 merged connections, but pretty soon with n connections the result will be
most streams having to carry their own uncompressed cookie headers.

A shared table that is blind to frame boundaries would instead be able to
hold the cookies from both connections.   Sure, there is a table size issue
here as more and more connections are added and the table has to hold more
and more cookies.   But excluding the cookies from the shared table does not
remove the requirement for the server to have them in memory and
make them available to the request handling.   In fact with the current
hpack, servers will end up with multiple copies of the same cookie in
memory as the merged connection adds/removes/adds/removes the same cookie
from the shared table multiple times.
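
The difference can be illustrated with a small simulation.  This is a
deliberately crude model, assuming one cookie per client and a table whose
capacity is counted in cookies with oldest-first eviction; the function
name and parameters are invented for the example.

```python
from collections import deque


def literal_cookie_sends(stream_order, table_capacity):
    """Count how often a cookie has to be sent uncompressed.

    stream_order: client ids, one per request on the merged connection.
    table_capacity: how many distinct cookies the table can hold at once.
    """
    table = deque(maxlen=table_capacity)  # oldest entry drops out when full
    literals = 0
    for client in stream_order:
        if client in table:
            continue  # cookie already in the shared state
        literals += 1  # cookie must be sent in full
        table.append(client)
    return literals


interleaved = ["A", "B"] * 5
bursty = ["A"] * 5 + ["B"] * 5

contested = literal_cookie_sends(interleaved, table_capacity=1)
contested_bursty = literal_cookie_sends(bursty, table_capacity=1)
shared = literal_cookie_sends(interleaved, table_capacity=2)
```

In this model the contested table resends a cookie on every switch between
clients, bursts reduce the number of switches, and a table large enough for
both cookies sends each one once.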

Having a large shared table is going to have less memory impact than a
smaller contested table.

I think it may well be worthwhile re-evaluating the proposals for multiple
contexts, as not only could these designs help with the proxy case, but
they could potentially free up the serialisation within the receiver,
giving a more immediate benefit.

regards








On 31 May 2014 00:06, Roberto Peon <grmocg@gmail.com> wrote:

> Note that the shared-state problem is an issue with any compressor that
> shares state (it is almost definitional! :) ). The majority (if not all)
> compressors that would act on a per-frame basis offer limited benefit (e.g.
> huffman, which one can do today), or have potentially poor security
> properties.
>
> We have made many design concessions on behalf of proxies. I've driven
> many, if not most of them, as I've acutely felt the pain of operating
> proxies at large scale for years. This particular design point (that of
> sharing one context per connection) was chosen because the cost/complexity
> of having multiple contexts was not assured to be outweighed by the
> benefit. The original design, discarded for the aforementioned reason,
> had stream-groups which defined the compression and flow control context
> into which a stream would be binned.
>
> Alternately said, the issue is less a matter of HPACK specifically, and
> more a matter of how many compression contexts we wish to manage. The
> optimal thing for proxies is generally one context per server endpoint
> (e.g. per origin), or one context per client endpoint, but the complexity
> cost is higher.
>
> -=R
>
>
>
>
> On Fri, May 30, 2014 at 1:51 PM, Greg Wilkins <gregw@intalio.com> wrote:
>
>>
>> But Proxies will already work badly with the shared hpack table.  If the
>> different streams have significantly different header sets, then they will
>> be forever resetting the table or worse replacing each header one by one
>> (as I doubt many encoders will have the look ahead logic to guess if reset
>> or replace is best).
>>
>> So a single table on channel 0 will probably not be that different.   At
>> least a single static table on 0 would allow streams to be processed in
>> parallel and/or out of order.
>>
>> But I do think the proxy case is important and we should well support
>> aggregating streams.  hpack does not do that now, but a channel 0 mechanism
>> could be augmented with header set versions, so each stream would just
>> refer to which version it referred to.  This would well support proxies and
>> also allow arbitrary reordering and parallel execution of streams.
>>
>> In short, I think that considering proxies is an argument against hpack
>> rather than for it.
>>
>> cheers
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> On 30 May 2014 20:13, Roberto Peon <grmocg@gmail.com> wrote:
>>
>>> What he said :)
>>>
>>>
>>> On Fri, May 30, 2014 at 9:43 AM, Michael Sweet <msweet@apple.com> wrote:
>>>
>>>> Richard,
>>>>
>>>> I'm thinking from the proxy to the upstream server.  If you reuse the
>>>> same upstream connection for multiple downstream clients, it is quite
>>>> likely that they will not have the same User-Agent or other headers...
>>>>
>>>>
>>>> On May 30, 2014, at 12:06 PM, Richard Wheeldon (rwheeldo) <
>>>> rwheeldo@cisco.com> wrote:
>>>>
>>>>  Speaking as a proxy developer, I like the idea of putting common
>>>> header stuff onto frame 0. Common identity-specific stuff (user-agent) can
>>>> easily be shared. We already do similar things (re-using state from
>>>> earlier requests on a keep-alive connection). It’s a big win over re-sending.
>>>>
>>>>
>>>>
>>>> Richard
>>>>
>>>>
>>>>
>>>> *From:* Michael Sweet [mailto:msweet@apple.com <msweet@apple.com>]
>>>> *Sent:* 30 May 2014 09:14
>>>> *To:* Greg Wilkins
>>>> *Cc:* Matthew Kerwin; "Martin J. Dürst"; David Krauss; Martin Thomson;
>>>> Richard Wheeldon (rwheeldo); HTTP Working Group
>>>> *Subject:* Re: Header Size? Was: Our Schedule
>>>>
>>>>
>>>>
>>>> Greg,
>>>>
>>>>
>>>>
>>>> I don't see shared state like that working for proxies.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On May 30, 2014, at 4:22 AM, Greg Wilkins <gregw@intalio.com> wrote:
>>>>
>>>>
>>>>
>>>>    Matthew,
>>>>
>>>> firstly I'm sure there are forms of header compression that can have a
>>>> shared state table that are not so highly order dependent.  The problem
>>>> with hpack is that every field in every frame of a stream can mutate the
>>>> shared state table.  This gives us a really hard serialisation problem, so
>>>> that setting the table size to zero does not help, as you still have to
>>>> prevent interleaving so you can decode in order and watch for an increase
>>>> in the table size.
>>>>
>>>> I think we could get a lot of benefit from a compression scheme that
>>>> uses header frames transmitted on channel 0 to set the shared state.   All
>>>> the user-agent guff could then be sent once and only once and all the
>>>> stream header decoding would then be read only (and thus could happen in
>>>> any order).    If you wanted to put cookies into the shared table, then
>>>> there are still some ordering issues, but not as hard as the current ones
>>>> and with lots of potential solutions (eg table versions or multiple table
>>>> ids etc.).
>>>>
>>>> To Martin's idea: that would help in some respects if the subsequent
>>>> frames are excluded from hpack.  This would let us allow interleaving.
>>>> However, it does not prevent the server from needing to hold onto large
>>>> header tables during request handling.  So we still should include headers
>>>> in the flow control, so the receiver can say "stop already!" when large
>>>> headers are being sent.
>>>>
>>>> cheers
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 30 May 2014 09:34, Matthew Kerwin <matthew@kerwin.net.au> wrote:
>>>>
>>>> On 30 May 2014 16:51, "Martin J. Dürst" <duerst@it.aoyama.ac.jp> wrote:
>>>>
>>>> This is just a thought:
>>>>
>>>> Would it be possible to allow arbitrarily large amounts of header data
>>>> (either via continuations or via multiple header frames), but to limit
>>>> compression to a single header frame?
>>>>
>>>> While in general, there is a stronger need to compress larger stuff,
>>>> such a solution could come with various benefits:
>>>> - Simplified compression (less/no state)
>>>> - Keep the main benefit (quick start)
>>>> - Penalty against large amounts of header data
>>>>   (because that's not the way to do things anyway)
>>>>
>>>> Regards,   Martin.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> If you send SETTINGS_HEADER_TABLE_SIZE=0 and a HEADERS with [0x30,0x20]
>>>> in the first block fragment you effectively disable the context, and are
>>>> left with only Huffman coding (which has a per-frame context).
>>>>
>>>>
>>>>
>>>> As Roberto reminded me yesterday, the thing about a header block is
>>>> that when it ends, you get everything else in the reference set (carried
>>>> over from the previous header block). The biggest gain in HPACK compression
>>>> comes from not actually sending identical headers again and again, which
>>>> means not only sharing context between multiple frames, but between frames
>>>> from multiple streams. I don't know if, in practice, any per-frame
>>>> compression scheme would come close to HPACK's connection-based delta
>>>> compression, and that would be a big hit to the protocol's appeal.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>>   Matthew Kerwin
>>>>   http://matthew.kerwin.net.au/
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Greg Wilkins <gregw@intalio.com>
>>>> http://eclipse.org/jetty HTTP, SPDY, Websocket server and client that
>>>> scales
>>>> http://www.webtide.com  advice and support for jetty and cometd.
>>>>
>>>>
>>>>
>>>> _________________________________________________________
>>>> Michael Sweet, Senior Printing System Engineer, PWG Chair
>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>> --
>> Greg Wilkins <gregw@intalio.com>
>> http://eclipse.org/jetty HTTP, SPDY, Websocket server and client that
>> scales
>> http://www.webtide.com  advice and support for jetty and cometd.
>>
>
>


-- 
Greg Wilkins <gregw@intalio.com>
http://eclipse.org/jetty HTTP, SPDY, Websocket server and client that scales
http://www.webtide.com  advice and support for jetty and cometd.
Received on Saturday, 31 May 2014 08:16:16 UTC