- From: Willy Tarreau <w@1wt.eu>
- Date: Fri, 18 Jul 2014 07:49:10 +0200
- To: Greg Wilkins <gregw@intalio.com>
- Cc: HTTP Working Group <ietf-http-wg@w3.org>
Hi Greg,

On Fri, Jul 18, 2014 at 10:13:09AM +1000, Greg Wilkins wrote:
> Willy,
>
> I see what you are saying. However my experience with hpack is that
> fiddling the bit encodings is going to give you +/- <1% difference.

In fact I'm not trying to save much. I'm concerned that with the static table below the dynamic one, we'll always require one extra byte for any literal whose name isn't in the static table. And switching the two very likely changes the importance of the header fields that were put in the static table: initially it was not *that* important, mostly mattering for the first request, but now it matters for all requests. Since most header fields appear to be there in the static table, it's not a big deal, but I'd prefer that we make sure we don't miss any.

For example, some browsers send "TE". It's not present in the static table, probably because it was not worth consuming an entry for a 2-byte name which would end up in the dynamic table after the first request. But now, referencing it from the dynamic table will systematically require one extra byte if the value changes. I gave the example of the XFF header, which should not be an issue over the links where byte count matters, though.

> I can make +/- >5% differences by picking different encoding strategies and
> see similar differences over different stories over the test data.
>
> So I think tweaking the encodings at this stage is really just operating
> within the noise of the different headers. So I don't think we should
> change hpack in this way. We just don't have the data to optimise the last
> 1% nor do we know if natural variation is such that it is pointless to try
> to find a one-size-perfectly-fits-all solution.

I'm really not fond of changing it either, because I think it's properly designed. But if we change the fundamental principle of the dynamic table containing any recently emitted header at a low index, that changes the index distribution.
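To make the one-extra-byte concern concrete, here is a minimal sketch of HPACK's primitive integer encoding (RFC 7541, section 5.1). With the 4-bit prefix used by some literal representations, any index below 15 fits in the prefix byte itself, while a larger index (such as a dynamic-table entry pushed past the static table) spills into a continuation byte. The function name and the example indexes are mine, chosen for illustration:

```python
def encode_integer(value, prefix_bits):
    """HPACK primitive integer encoding (RFC 7541, section 5.1).
    Returns the encoded bytes; the prefix byte's upper bits are left zero,
    since they belong to the surrounding representation's opcode."""
    max_prefix = (1 << prefix_bits) - 1
    if value < max_prefix:
        return bytes([value])           # fits entirely in the prefix
    out = [max_prefix]                  # prefix saturated: continuation follows
    value -= max_prefix
    while value >= 128:
        out.append((value % 128) + 128) # 7 bits of payload, high bit = "more"
        value //= 128
    out.append(value)
    return bytes(out)

# Illustrative indexes (4-bit prefix): a small index costs one byte,
# an index beyond the prefix range costs two.
print(len(encode_integer(10, 4)))   # small index: single byte
print(len(encode_integer(62, 4)))   # larger index: prefix + continuation byte
```

The cliff moves with the prefix length, but the shape is the same: whichever table sits "further away" in the index space pays the continuation byte more often.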
> Removing RefSet is definitely a good thing to do.
>
> Removing the copy of static entries to the dynamic table makes no measurable
> impact on compression, so I think we should remove it for simplicity and to
> save CPU - it only being added because of the RefSet.
>
> With those removed, I'm 90% happy with hpack.
>
> Once you remove the static copy, I do think there is an argument to be made
> to revert the static table to below the dynamic table. This is for
> simplicity and because it will allow the common field indexes to be
> precomputed as a single byte, saving CPU.

I know, and I am not against this, quite the opposite in fact. You may remember that two years ago I was discussing whether we shouldn't encode common headers as a single byte or so :-)

> It does not appear to affect the compression efficiency in any significant
> way.

That's the point where I think we should be more careful and ensure we have enough relevant data (eg: some mobile browsers' requests captured before the operators' transparent proxies). If at least we had the ability to encode both static table entries and the most recent dynamic entries with a single index, I would feel better. For example, you can have another approach:

  - positive indexes = static table index
  - negative indexes = - dynamic table index

You encode (index + 10), so that you can reference up to the last 10 emitted fields, and you can encode up to 54 static headers in a single byte (it's just a matter of proper sorting, but it's already reasonably clean). I think you get the idea.

> With this I'd be 95% happy.
>
> Encoding dates as integers does appear to give some additional
> compression. But apparently it has been proposed and rejected before by
> the WG. So while I think it would be good, I'm not going to advocate
> that we revisit at this stage.
Same here: it's a desirable optimization, but if we go down that route, we'll continue with specific integer encodings for content-length and accept-ranges, then we'll start to suggest that we encode all common content-type tokens as a few bits, etc. And we're back to redefining the whole encoder. So I'd rather avoid touching it in this regard.

BTW, I think that with the data you currently have and with your encoder, you could check how often a literal refers to the dynamic table in your data set, which would be a good indication of the relevance (or not) of my concern above.

Thanks,
Willy
Received on Friday, 18 July 2014 05:49:34 UTC