Re: Header Table and Static Table Indicies Switched from Greg Wilkins on 2014-08-02 (ietf-http-wg@w3.org from July to September 2014)

From: Greg Wilkins <gregw@intalio.com>
Date: Sat, 2 Aug 2014 12:22:52 +1000
To: Roberto Peon <grmocg@gmail.com>
Cc: Jeff Pinner <jpinner@twitter.com>, Jason Greene <jason.greene@redhat.com>, HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <CAH_y2NFF4nmTWAsg+sxgC5=QkrVDcDkpGQJ=SHG6Dkh0RbJ3bA@mail.gmail.com>

On 2 August 2014 10:53, Roberto Peon <grmocg@gmail.com> wrote:

> I don't recall that particular change being something I saw consensus
> about, and certainly something that the discussion (at least AFAICT) didn't
> have a resolution for.
>
I think that this was a detail left to the editor once consensus was
declared on the removal of the reference set, like dropping the copy
etc.  There was discussion about which index was best to be low, and I
provided some data on the public address set, but in the end it was the
editor's call, I think.

> I think that the change is likely to decrease compressor efficiency, and it
> *certainly* will for any algorithm which searches the state context first
> to see if there are any exact matches.
>
Can we move beyond "I think"?  This is something for which it is possible
to get real numbers.  This was done on a publicly available data set
that had been used as the basis for removing the RefSet in the first place,
and it indicated very little change either way with regard to the index.


> I'll revert it as soon as possible.
>
ummmm  do you have the authority to do that?


More detailed replies below.

> In order to know which index one must use, one must scan the entire table
> (didn't used to be the case with the reference set, but, that is gone) and
> look for matches for each header. If one scans the static table first, then
> one very likely wastes CPU. One can optimize a bit by only doing so if the
> length of the value is < max_static_table_value_length
>
Hash, don't scan!

For my own implementation there is no lookup difference between having the
static indexes low or high.  But there is a benefit of having the static
headers at fixed indexes, as I can pre-generate the bytes.
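
As a rough illustration of the point above, here is a minimal sketch of
hash-based lookup with pre-generated bytes for static entries.  It is not
taken from any real implementation: the names (STATIC_TABLE, FIELD_INDEX,
NAME_INDEX, PREGENERATED, lookup) are hypothetical, only the first three
static entries are shown, and the one-byte encoding assumes the "indexed
header field" form (high bit set, 7-bit index) as published in RFC 7541.

```python
# Hypothetical sketch; only an excerpt of the static table is shown.
STATIC_TABLE = [
    (":authority", ""),
    (":method", "GET"),
    (":method", "POST"),
]

# Two hash maps: one for exact (name, value) matches, one for name-only
# matches.  Neither lookup depends on where in the index space the
# static entries are placed.
FIELD_INDEX = {}
NAME_INDEX = {}
for i, (name, value) in enumerate(STATIC_TABLE, start=1):
    FIELD_INDEX.setdefault((name, value), i)
    NAME_INDEX.setdefault(name, i)

# Static indexes are fixed, so the encoded bytes for an indexed static
# field can be generated once (assumed form: 0x80 | index, index < 127).
PREGENERATED = {i: bytes([0x80 | i]) for i in range(1, len(STATIC_TABLE) + 1)}


def lookup(name, value):
    """Return (index, exact_match); index is None when nothing matches."""
    idx = FIELD_INDEX.get((name, value))
    if idx is not None:
        return idx, True
    return NAME_INDEX.get(name), False
```

A full field match is tried first and a name-only match second, which
mirrors the "fields and then names" order; no table scan is involved in
either step.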

> The new approach affords increased efficiency to the first request, at the
> cost of decreased efficiency to any subsequent request.
>

Can you back up that assertion with any real numbers?   It is certainly not
the case for my own implementation, as it does hash lookups for fields and
then names, so neither the size of the table nor the length of the index
is a factor in either.
You need to create at least 65 unique fields before there is any additional
data cost, which I would then suggest is a tiny fraction of the cost of
sending 65 unique fields in the first place.
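
The figure of 65 can be checked with a short sketch of prefix-coded
integers.  The assumptions here are mine, not from the thread: indexes
are encoded with a 7-bit prefix as in RFC 7541, the static table has 61
entries as in RFC 7541 (the draft under discussion may have differed),
and static entries occupy the low indexes.  The function and constant
names are illustrative.

```python
def encode_integer(value, prefix_bits):
    """Encode a non-negative integer with an N-bit prefix (flag bits left as 0)."""
    limit = (1 << prefix_bits) - 1
    if value < limit:
        # Fits entirely in the prefix: one byte.
        return bytes([value])
    out = [limit]
    value -= limit
    while value >= 128:
        out.append((value % 128) + 128)  # continuation bit set
        value //= 128
    out.append(value)
    return bytes(out)


STATIC_TABLE_SIZE = 61  # assumption: size of the static table in RFC 7541


def indexed_size(index):
    """Number of bytes needed to reference an index with a 7-bit prefix."""
    return len(encode_integer(index, 7))


# First index that no longer fits in one byte, and how many dynamic
# entries fit below it when the static entries take the low indexes.
first_two_byte = next(i for i in range(1, 1000) if indexed_size(i) == 2)
dynamic_entries_in_one_byte = first_two_byte - 1 - STATIC_TABLE_SIZE
```

Under these assumptions the first two-byte index is 127, so 126 - 61 = 65
dynamic entries can still be referenced in a single byte.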

But show me a data set that is a good general case and indicates that
having the indexes the other way around is better, and if we are to make
any breaking changes after -14 then I'll support swapping back.

> Actually, it doesn't afford any efficiency to the first request.
> It affords more efficiency IFF more replacement happens than referencing
> and replacement isn't done from the header table.
>
Again, can you show any actual numbers?   For my own implementation, having
the indexes low is a marginal improvement (same lookups but less
branching).  In terms of data efficiency you need 65 custom indexed fields
before it makes any difference.

numbers please.






-- 
Greg Wilkins <gregw@intalio.com>
http://eclipse.org/jetty HTTP, SPDY, Websocket server and client that scales
http://www.webtide.com  advice and support for jetty and cometd.

Received on Saturday, 2 August 2014 02:23:21 UTC