Re: hpack static table question? from Greg Wilkins on 2014-06-03 (ietf-http-wg@w3.org from April to June 2014)

From: Greg Wilkins <gregw@intalio.com>
Date: Tue, 3 Jun 2014 17:46:48 +0200
To: HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <CAH_y2NEE+cicjyBR=r8bLjpgAK-M_f+epEO8so3o3+0o18aCbA@mail.gmail.com>
... and while I'm on a roll with my sweating of the small stuff in hpack....

The forever-changing indexes of common header fields make encoding like
playing whack-a-mole and take away some good optimisations available in
the server.

For example, if you know that several 10s of thousands of times in the next
1000ms you are going to send headers like:

  Date: Mon, 21 Oct 2013 20:13:22 GMT
  Server: Jetty/9.2.0
  Accept-Ranges: bytes

then you don't naively format the date and generate the exact bytes for
each and every response header.  You pre-generate the bytes for each field
and just do a block array copy into each response that needs them, then
regenerate the byte array for the date just once per second for all responses.
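
A minimal sketch of the pre-generation described above, in Python rather than Jetty's Java. The class name CachedHeaders and the injectable clock parameter are hypothetical, introduced only for illustration; this is not Jetty's actual implementation.

```python
import time
from email.utils import formatdate


class CachedHeaders:
    """Pre-generated header bytes; the Date field is rebuilt at most once per second."""

    # Fields that never change are encoded exactly once.
    STATIC = b"Server: Jetty/9.2.0\r\nAccept-Ranges: bytes\r\n"

    def __init__(self, clock=time.time):
        self._clock = clock      # injectable so the behaviour can be tested
        self._second = None      # whole second the cached block was built for
        self._block = b""

    def block(self):
        now = int(self._clock())
        if now != self._second:
            # Only here is the date formatted and the byte array rebuilt.
            date = formatdate(now, usegmt=True).encode("ascii")
            self._block = b"Date: " + date + b"\r\n" + self.STATIC
            self._second = now
        return self._block
```

Every response within the same second gets the same bytes object, so the per-response cost is one block copy.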

But with hpack, you can't do that because even though the Huffman encoding
is always the same, the actual index of the header names will be constantly
changing and different for each connection. It is even moderately expensive
just to look up whether these headers are in the reference set and then whether
they are in the header table, so you know if you should use a dynamic index
or the static one; then you have to calculate the static one.
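
To make the per-connection lookup concrete, here is a rough sketch. It assumes, as this thread implies for the draft under discussion, that header table entries take indexes 1..n and static entries follow at n + their static position. The function name wire_index and the partial STATIC_TABLE (only rows quoted later in this message) are illustrative, not taken from any implementation.

```python
# Static positions for a few of the rows quoted later in this thread.
STATIC_TABLE = {
    2: (":method", "GET"),
    3: (":method", "POST"),
    4: (":path", "/"),
    8: (":status", "200"),
    13: (":status", "404"),
}


def wire_index(name, value, header_table, static_table=STATIC_TABLE):
    """Return the index to put on the wire for (name, value), or None.

    header_table is this connection's table as a list of (name, value)
    tuples, newest first, occupying indexes 1..len(header_table).
    """
    # The dynamic entries have to be searched first, per connection.
    for i, entry in enumerate(header_table, start=1):
        if entry == (name, value):
            return i
    # The static index is not a constant: it shifts with the table length.
    for pos, entry in static_table.items():
        if entry == (name, value):
            return len(header_table) + pos
    return None
```

Because the result depends on the current length of each connection's header table, the encoded bytes cannot be shared across connections the way a pre-generated byte array can.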

So currently I'm thinking that jetty won't use indexed names for these
headers and will send them in the more verbose literal form (which probably will
generate more garbage in the client - but hey, not my problem :)

Hopefully such fields will be able to mostly live in the reference set, so
repetitions of sending them on the same connection can be limited - but I
was hoping to use that freed up CPU and bandwidth to serve more
connections!

I'm really not feeling the luv'n for this copy the static field
whack-a-mole changing index thang!

It is implementable, but is it really worth it to limit the size of
something that was already size limited?

cheers


On 3 June 2014 16:30, Greg Wilkins <gregw@intalio.com> wrote:

>
> Sorry more (hopefully not dumb) questions on the static table.
>
> There appears to be no way to send a static indexed header field without
> adding to the reference set.  So to actually use one of the following
> static fields:
>
>           | 2     | :method                     | GET          |
>           | 3     | :method                     | POST         |
>           | 4     | :path                       | /            |
>           | 5     | :path                       | /index.html  |
>           | 6     | :scheme                     | http         |
>           | 7     | :scheme                     | https        |
>           | 8     | :status                     | 200          |
>           | 9     | :status                     | 204          |
>           | 10    | :status                     | 206          |
>           | 11    | :status                     | 304          |
>           | 12    | :status                     | 400          |
>           | 13    | :status                     | 404          |
>           | 14    | :status                     | 500          |
>
>
> it is a bit of a rigmarole:
>
>    - You first send it as an indexed reference to the static table.
>    - This is copied to the dynamic table, probably evicting an entry from
>    the header table and possibly from the reference set.
>    - The entry is then added to the reference set.
>    - On the next message, you then very likely need to remove it (because
>    one would hope that all your requests are not for /index.html nor all your
>    responses 404s), so you have to send an indexed request again, this time to
>    the index it currently has in the dynamic table.
>    - Finally, you potentially now need to resend the header that was
>    evicted from the header table and reference set to get back to where you
>    were... oh but dang that now evicts something else and so on and so forth
>    until the unwanted 404 is eventually evicted!
>
>
> Am I missing something again?
>
> I'm thinking that if we really have to copy static entries into the header
> table, then we need to be able to emit a static field without having it
> included in the reference set.   Otherwise, instead of being a single octet
> to send a common status code, we are going to have to send at least 2 but
> probably many more: if the header table has more than 128 entries then static
> indexes will not fit in a single byte, and we'll also have to replace entries
> that have been needlessly evicted.
>
> Actually, for statuses like 404, it is better to send them as literal
> fields, never indexed, with an indexed name.   To send :status:404 this way
> is just (assuming empty header table):
>
>   0x18 // literal never indexed, with static index 8==status:
>   0x82 // huffman encoded 2 byte value
>   0x8020 // 404 huffman encoded
>
> So 4 bytes, but no evictions from the header table.   This could be 5
> bytes if the header table is large and thus the static index is >128  (why
> aren't the static indexes 1-61???)
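
The byte count above can be reproduced with a small sketch. It assumes the prefix-integer encoding used by HPACK and the layout implied by the 0x18 byte in the example (pattern 0001 followed by a 4-bit index prefix). The Huffman-coded value bytes 0x8020 are copied from the message as opaque data, not recomputed, and the function names are illustrative.

```python
def encode_int(value, prefix_bits, flags=0):
    """HPACK-style integer: fits in the prefix if small, else continuation octets."""
    limit = (1 << prefix_bits) - 1
    if value < limit:
        return bytes([flags | value])
    out = [flags | limit]
    value -= limit
    while value >= 128:
        out.append((value % 128) + 128)  # low 7 bits, continuation bit set
        value //= 128
    out.append(value)
    return bytes(out)


def never_indexed_with_name_index(name_index, huffman_value):
    """Literal never indexed, name given by index, value already Huffman coded."""
    head = encode_int(name_index, 4, flags=0x10)            # 0001xxxx
    length = encode_int(len(huffman_value), 7, flags=0x80)  # H bit + length
    return head + length + huffman_value


value_404 = bytes.fromhex("8020")  # taken verbatim from the example above
print(never_indexed_with_name_index(8, value_404).hex())
print(len(never_indexed_with_name_index(8 + 128, value_404)))
```

Under these assumptions the index moves to a second octet as soon as it no longer fits in the 4-bit prefix, so the exact table size at which the fifth byte appears depends on the prefix width of the representation in use.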
>
>
>
>
> On 2 June 2014 22:34, Greg Wilkins <gregw@intalio.com> wrote:
>
>>
>> Roberto
>>
>> Thanks for sticking with me - I've got it now!
>>
>> What I was missing was that a reference to an entry already in the
>> reference set removes it rather than duplicates it!
>>
>> It's not the most obvious of emergent behaviours - at least not to this
>> little black duck, and seems a bit unnecessary as there are only 13 static
>> entries with values, so allowing those into the reference set without
>> copying is hardly going to use much memory.
>>
>> I'd prefer to just allow references to the static table and as I said
>> before, the limit on the header table size would naturally put a limit on
>> the reference set (even more so, now I've understood why there are no
>> duplicates).  But then it's not a big deal either.... if we end up changing
>> hpack for other reasons, then I'd advocate dropping the copy.
>> thanks again!
>>
>> On 2 June 2014 21:47, Roberto Peon <grmocg@gmail.com> wrote:
>>
>>> Lemme try to explain the whole thing :)
>>> 1) Every header table entry has an 'overhead' associated with it, thus
>>> even zero-length strings have size.
>>> 2) Every reference-set entry always points to a header table entry (thus, has a
>>> cost-in-bytes associated)
>>> 3) Every entry in the reference set is unique-- there is no way to have
>>> a duplicate in there.
>>>
>>> In your example:
>>> index 4 becomes a reference to the first entry in the table.
>>> Each duplicate reference causes the previous first entry to be evicted
>>> and replaced.
>>> Let's assume that the max-table-size is 42 here.
>>> Since the cost-in-bytes of the entry is (strlen(":path") + strlen("/")
>>> + 32) == 38, one entry fits into the table, but two do not. Referencing the
>>> static index again will cause the previous entry in the table to be evicted.
>>> Referencing the header-table entry will cause it to be removed if it is
>>> already present, and added if it isn't.
>>> Any time an entry is removed from the header table, the reference to it
>>> in the reference set must be removed (can't have dangling references!).
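
The behaviour described above can be walked through with a toy decoder. This is only a sketch of the rules as stated in this thread (32 bytes of overhead per entry, static entries copied into the header table, a reference to an entry already in the reference set removes it, eviction drops the reference). The class name Decoder and its methods are hypothetical.

```python
ENTRY_OVERHEAD = 32


def entry_size(name, value):
    return len(name) + len(value) + ENTRY_OVERHEAD


class Decoder:
    def __init__(self, max_size, static_table):
        self.max_size = max_size
        self.static = static_table  # static position -> (name, value)
        self.table = []             # header table, newest first
        self.refs = []              # reference set, tracked by object identity

    def _size(self):
        return sum(entry_size(n, v) for n, v in self.table)

    def _insert(self, name, value):
        need = entry_size(name, value)
        # Evict oldest entries until the new one fits; no dangling references.
        while self.table and self._size() + need > self.max_size:
            old = self.table.pop()
            self.refs = [e for e in self.refs if e is not old]
        if need > self.max_size:
            return None
        entry = [name, value]
        self.table.insert(0, entry)
        return entry

    def indexed(self, index):
        """Process one indexed representation; return the emitted field or None."""
        if index <= len(self.table):
            entry = self.table[index - 1]
            if any(e is entry for e in self.refs):
                # Already referenced: the reference is removed, not duplicated.
                self.refs = [e for e in self.refs if e is not entry]
                return None
            self.refs.append(entry)
            return tuple(entry)
        # Static entries sit after the header table and are copied into it.
        name, value = self.static[index - len(self.table)]
        entry = self._insert(name, value)
        if entry is not None:
            self.refs.append(entry)
        return (name, value)
```

With max_size 42 and static entry 4 set to (":path", "/"), the first indexed(4) copies the 38-byte entry into the table, repeated indexed(1) calls toggle it in and out of the reference set, and indexed(5) (the static entry again, now shifted by one) evicts and replaces the copy. The reference set never holds more than the one entry the table can fit.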
>>>
>>> -=R
>>>
>>>
>>> On Mon, Jun 2, 2014 at 12:18 PM, Greg Wilkins <gregw@intalio.com> wrote:
>>>
>>>>
>>>> Roberto,
>>>>
>>>> that clause from 3.1.3 is clear enough in what an impl must do, I just
>>>> don't see how it achieves a limit on the reference set size.
>>>>
>>>> Consider a setup that has a small header table size that will fit just
>>>> a single field into it.  This decoder then receives a header frame that
>>>> contains a reference to header 4 (static :path:/).  This is copied into the
>>>> header table at index 1 (evicting anything else that was in there) and is
>>>> added to the reference set.  Now say that the rest of the header frame is
>>>> full of many many duplicates of a reference to index 1.  For each reference
>>>> another entry is made into the reference set pointing to the copied static
>>>> entry.    This can continue forever and represents unlimited growth of the
>>>> reference set.
>>>>
>>>> If this kind of duplicate attack is not a problem, then I don't think
>>>> we need to limit the size of the reference set, because without such
>>>> duplicates, then every entry in the reference set is going to be much
>>>> smaller than each entry in the header table.   Thus a limit on the header table
>>>> size is effectively a limit on the reference set size, without the need to
>>>> copy.
>>>>
>>>> So either I'm still missing something or this is a complex mechanism that
>>>> does not achieve what it is intended to do.
>>>>
>>>> cheers
>>>>
>>>>
>>>> --
>>>> Greg Wilkins <gregw@intalio.com>
>>>> http://eclipse.org/jetty HTTP, SPDY, Websocket server and client that
>>>> scales
>>>> http://www.webtide.com  advice and support for jetty and cometd.
>>>>
>>>
>>>
>>
>>
>> --
>> Greg Wilkins <gregw@intalio.com>
>> http://eclipse.org/jetty HTTP, SPDY, Websocket server and client that
>> scales
>> http://www.webtide.com  advice and support for jetty and cometd.
>>
>
>
>
> --
> Greg Wilkins <gregw@intalio.com>
> http://eclipse.org/jetty HTTP, SPDY, Websocket server and client that
> scales
> http://www.webtide.com  advice and support for jetty and cometd.
>



-- 
Greg Wilkins <gregw@intalio.com>
http://eclipse.org/jetty HTTP, SPDY, Websocket server and client that scales
http://www.webtide.com  advice and support for jetty and cometd.
Received on Tuesday, 3 June 2014 15:47:19 UTC