Re: HTTP Header Compaction Results

On Oct 24, 2012 2:27 AM, "Amos Jeffries" <squid3@treenet.co.nz> wrote:
>
> On 24/10/2012 6:26 p.m., Mark Nottingham wrote:
>>
>> Hi Herve,
>>
>> On 23/10/2012, at 3:26 AM, RUELLAN Herve <Herve.Ruellan@crf.canon.fr>
wrote:
>>>
>>> Hi all,
>>
>> Welcome :)
>>
>>> We are currently studying the compaction results for different header
encoding formats, including various SPDY versions and an internally
developed format (we plan to publish its description as soon as possible).
>>
>> It's very good to hear that you're doing this.
>>
>>> We are wondering whether anyone is aware of a test corpus for HTTP
exchanges that would be available, or could be made available. This would
help us obtain fair and realistic results.
>>
>>
>> I've been thinking about this too, and have started writing some
software to sniff unadulterated HTTP headers (request and response) off the
wire. I know that some browsers make headers available, but they also have
a habit of "cleaning up" the headers before presenting them to APIs, IME.
>>
>> Currently I have a pcap-based sniffer for HTTP sites <
https://github.com/mnot/hdrgrab>; soon I should have a MITM proxy for HTTPS
ones.

I believe that browsers using NSS can be compiled to dump the TLS session
keys, and that Wireshark has been patched to use them. Just another option
in case MITM proves difficult in some unforeseen way. It has been of great
benefit to me when debugging, regardless...
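
Roughly, the workflow looks like this (a Python sketch; SSLKEYLOGFILE
support and the browser binary name are assumptions about the builds
involved, so check your versions):

    import os
    import subprocess

    # Launch an NSS-based browser with TLS key logging enabled (assumes the
    # build honours the SSLKEYLOGFILE environment variable).
    env = dict(os.environ)
    env["SSLKEYLOGFILE"] = "/tmp/tls-keys.log"

    # Point Wireshark's SSL "(Pre)-Master-Secret log filename" preference at
    # the same file and it can decrypt the captured sessions.
    subprocess.call(["firefox", "https://example.com/"], env=env)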

>>
>> Once we start to collect a corpus, I'd suggest we put it on GitHub: <
https://github.com/http2>. That will let everyone share a common base for
testing, and review it to make sure that it's appropriate.
>
>
> FYI: we have implemented raw HTTP header dumps on inbound/outbound
traffic in Squid 3.2, linked with TCP socket details and timing. If anyone
else is interested in a quick and easy source of data without tcpdump, they
are welcome to use Squid for that. "debug_options 11,2" is the magic config
stanza to output the traffic into cache.log.
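
Handy. For pulling the header blocks back out of cache.log afterwards,
something like this ought to work (a Python sketch; the "HTTP
Client/Server REQUEST/REPLY" markers are my guess at the section 11 dump
format, so adjust them to whatever your build actually logs):

    import re
    import sys

    # Extract raw header dumps from a Squid cache.log written with
    # "debug_options 11,2". The marker text is an assumption about the dump
    # format; adjust it to match what your Squid build actually logs.
    MARKER = re.compile(r"HTTP (Client|Server) (REQUEST|REPLY)")

    capturing = False
    with open(sys.argv[1], errors="replace") as log:
        for line in log:
            if MARKER.search(line):
                capturing = True
            if capturing:
                sys.stdout.write(line)
                if not line.strip():  # assume a blank line ends the block
                    capturing = False
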
>
>
>> If you'd like to put your framework for comparison up there, you'd be
welcome to; please ping me.
>>
>>> Currently we are using web pages obtained from Alexa's top-ranked
sites. While computing our results, we are taking into account several
parameters: studying requests and responses separately, using deflate or not
(when relevant), and varying the number of exchanged messages (corresponding
to loading from 1 to 10 web pages). Any advice on these measurements would
be appreciated.
>>
>> For the time being, mine combines all of the requests/responses on
different connections to the same server into the same file, under the
assumption that multiplexing will enable this. Eventually I'd like to get
something more sophisticated in there.
>>
>> It also removes connection headers, as they're really HTTP/1 specific.
>>
>> What else should we be doing?
>
>
> I think the HTTP/1 traces should leave those headers in. The data corpus
might be used as example raw input from HTTP/1 clients or servers to test
upgrade speeds on middleware, which will also need to account for the time
taken to strip those headers away.
>
> The data corpus can come with an example HTTP/1->HTTP/2 conversion script
to prepare for tests of native HTTP/2 traffic speeds and/or for speed testing
implementations' gateway conversions.

++

If we do end up with an awesome common test corpus, let's have the data
unadulterated (reformatting doesn't count as adulteration).
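
To Amos's point, the stripping can live in a small convert step run at test
time rather than being baked into the corpus. A rough sketch of the
hop-by-hop part (Python; just the Connection-nominated names plus the
RFC 2616 defaults, nothing cleverer):

    # Drop hop-by-hop headers from an HTTP/1 header block as part of an
    # HTTP/1 -> HTTP/2 convert step, leaving the corpus itself untouched.
    # A sketch only; header-block parsing here is deliberately naive.
    HOP_BY_HOP = {
        "connection", "keep-alive", "proxy-authenticate",
        "proxy-authorization", "te", "trailers",
        "transfer-encoding", "upgrade",
    }

    def strip_hop_by_hop(headers):
        """headers: list of (name, value) tuples from one message."""
        # Headers nominated by Connection are hop-by-hop too.
        nominated = set()
        for name, value in headers:
            if name.lower() == "connection":
                nominated.update(t.strip().lower() for t in value.split(","))
        drop = HOP_BY_HOP | nominated
        return [(n, v) for n, v in headers if n.lower() not in drop]

    if __name__ == "__main__":
        sample = [("Host", "example.com"), ("Connection", "keep-alive"),
                  ("Keep-Alive", "timeout=5"), ("Accept", "*/*")]
        print(strip_hop_by_hop(sample))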

-=R

>
>
>
>>
>>> Last, we would like to share our results to help the WG discussions on
this topic. As we're rather new to the IETF, we're not sure of the best way
to do this. I plan to attend the IETF 85 meeting in Atlanta, so I could use
this occasion to share these results.
>>
>> If you'd like to give a short (10-20 minute) presentation in the Atlanta
meeting, I'd be happy to accommodate that. Please ping me if you're
interested.
>>
>
> Amos
>

Received on Wednesday, 24 October 2012 17:06:24 UTC