Re: HTTP Header Compaction Results

On 24/10/2012 6:26 p.m., Mark Nottingham wrote:
> Hi Herve,
>
> On 23/10/2012, at 3:26 AM, RUELLAN Herve <Herve.Ruellan@crf.canon.fr> wrote:
>> Hi all,
> Welcome :)
>
>> We are currently studying the compaction results for different header encoding formats, including various SPDY versions and an internally developed format (we plan to publish its description as soon as possible).
> It's very good to hear that you're doing this.
>
>> We are wondering whether anyone is aware of a test corpus for HTTP exchanges that would be available, or could be made available. This would help us obtain fair and realistic results.
>
> I've been thinking about this too, and have started writing some software to sniff unadulterated HTTP headers (request and response) off the wire. I know that some browsers make headers available, but they also have a habit of "cleaning up" the headers before presenting them to APIs, IME.
>
> Currently I have a pcap-based sniffer for HTTP sites <https://github.com/mnot/hdrgrab>; soon I should have a MITM proxy for HTTPS ones.
>
> Once we start to collect a corpus, I'd suggest we put them on github: <https://github.com/http2>. That will let everyone share a common base for testing, and review it to make sure that it's appropriate.

FYI: we have implemented raw HTTP header dumps of inbound/outbound 
traffic in Squid-3.2, linked with TCP socket details and timing. If 
anyone else is interested in a quick and easy source of data without 
tcpdump, they are welcome to use Squid for that. "debug_options 11,2" 
is the magic config stanza to output the traffic into cache.log.
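For reference, a minimal squid.conf sketch for the above (the ALL,1 
baseline and the log path are the usual defaults, included here as 
assumptions on my part, not part of the header-dump feature itself):

    # Keep everything at the default verbosity, but raise the HTTP
    # traffic section (11) to level 2 for raw header dumps.
    debug_options ALL,1 11,2

    # The dumps land in cache.log; this path is only illustrative.
    cache_log /var/log/squid/cache.log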

> If you'd like to put your framework for comparison up there, you'd be welcome to; please ping me.
>
>> Currently we are using web pages obtained from Alexa's top ranked sites. While computing our results, we are taking into account several parameters: separate study of requests and responses, using deflate or not (when relevant), variable number of exchanged messages (corresponding to loading from 1 to 10 web pages). Any advice on these measurements would be appreciated.
> For the time being, mine combines all of the requests / responses on different connections to the same server into the same file, under the assumption that multiplexing will enable this. Eventually I'd like to get something more sophisticated in there.
>
> It also removes connection headers, as they're really HTTP/1 specific.
>
> What else should we be doing?

I think the HTTP/1 traces should leave those headers in. The data 
corpus might be used as example raw input from HTTP/1 clients or 
servers to test upgrade speeds on middleware, which will also need 
to measure the time taken to strip those headers away.

The data corpus could come with an example HTTP/1->HTTP/2 convert 
script, to prepare for tests of native HTTP/2 traffic speeds and/or 
for testing the speed of implementations' gateway conversions.
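
As a rough illustration only (the hop-by-hop header set and the 
output shape are my assumptions, not anything the WG has settled 
on), such a convert script might look like this in Python:

    # Hypothetical sketch of an HTTP/1 -> HTTP/2-style converter.
    # Input: a single HTTP/1.x request head as text.
    # Output: (name, value) pairs using HTTP/2 conventions
    # (lowercase names, :method/:path/:scheme/:authority).

    # The usual hop-by-hop set a gateway would strip.
    HOP_BY_HOP = {
        "connection", "keep-alive", "proxy-authenticate",
        "proxy-authorization", "te", "trailer",
        "transfer-encoding", "upgrade",
    }

    def convert_request(head, scheme="http"):
        lines = head.strip().splitlines()
        method, path, _version = lines[0].split(" ", 2)

        # Parse the HTTP/1 field lines into (name, value) pairs.
        fields = []
        for line in lines[1:]:
            name, _, value = line.partition(":")
            fields.append((name.strip().lower(), value.strip()))

        # Anything named in Connection is hop-by-hop as well.
        extra = set()
        for name, value in fields:
            if name == "connection":
                extra.update(v.strip().lower() for v in value.split(","))

        out = [(":method", method), (":path", path), (":scheme", scheme)]
        for name, value in fields:
            if name in HOP_BY_HOP or name in extra:
                continue  # HTTP/1-specific, stripped on conversion
            if name == "host":
                out.append((":authority", value))
            else:
                out.append((name, value))
        return out

    if __name__ == "__main__":
        raw = ("GET /index.html HTTP/1.1\r\n"
               "Host: example.com\r\n"
               "Connection: keep-alive\r\n"
               "Accept: text/html\r\n")
        for name, value in convert_request(raw):
            print(name + ": " + value)

Timing that strip-and-convert step against the corpus is exactly the 
middleware measurement I mentioned above.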


>
>> Last, we would like to share our results to help the WG discussions on this topic. As we're rather new to the IETF, we're not sure of the best way to do this. I plan to attend the IETF 85 meeting in Atlanta, so I could use this occasion to share these results.
> If you'd like to give a short (10-20 minute) presentation in the Atlanta meeting, I'd be happy to accommodate that. Please ping me if you're interested.
>

Amos
