Re: #578: getting real-ish numbers for option 3

> On 24 Oct 2014, at 4:55 pm, Willy Tarreau <w@1wt.eu> wrote:
> 
> Hi Mark,
> 
> On Fri, Oct 24, 2014 at 02:20:23PM +1100, Mark Nottingham wrote:
>> I'm sensing a theme develop here, and it's entirely reasonable to say that we
>> shouldn't block for 2+ months for an optimisation without any supporting
>> data.
>> 
>> We already have an implementation of the proposal, so what we need now is a
>> test corpus.
>> 
>> That presents a problem, since it's highly subjective (extension headers are
>> very site-specific) and often fraught with privacy issues. 
>> 
>> What I'd suggest is that we agree upon an algorithm to synthesise some
>> headers, do so, and use one corpus to test both approaches.
>> 
>> Straw-man:
>> 
>> - 10000 request messages, using the same gzip context
>> - each request has five standard headers (:method :scheme :authority :path
>> user-agent)
>> - each standard header has a fixed value of an appropriate length
> 
> We should have a few variations on the value (e.g. :path, accept, cookie).

Yup.

>> - each request has five custom headers, selected from a pool of ten custom
>> headers
>> - each custom header has a field-name between 10 and 20 characters long
>> - five of the custom headers have a field-value consisting of random data 40
>> characters long
>> - five of the custom headers have a fixed value 40 characters long
>> Thoughts? 
> 
> I think that's entirely reasonable, but at the same time it contains the
> same flaw that led to this thread: we define a single model that looks
> average and we don't measure the variations when moving to more or less
> standard cases, which explains why people with working implementations
> are more or less interested in a change depending on what they observe.
> 
> Thus I think we should in fact define three "models" to test:
>  - the "average" one as you describe above
>  - the "browser" one with a single custom header out of the 10
>  - the "partner" one with 9 out of the 10 custom headers
> 
> That way we can see whether any model shows a significant deviation under
> one encoding or another. In my opinion, an adequate encoding (I mean one
> that is safe for the future) should be reasonably good in all cases and
> show limited variation around the average model.
> 
> Once we're able to synthesize the requests for a given model, it's easy
> to build the other two, so I think it should be done.
> 
> Opinions?

Sure, with the proviso that how to interpret what counts as a useful difference is still undefined, and likely to cause some debate.

But let's go ahead and try, since the cost is relatively low. I'll write some Python this weekend (possibly tonight, subject to family stuff) to generate some header sets; if other folks can do the crunching code and have it ready, that'd be much appreciated.
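
To be concrete about what I have in mind, a rough sketch of the generator is below. The header names, fixed values, and lengths are placeholders, and I'm reading "random data" in the straw-man as regenerated per request while the other five custom headers keep one value for the whole corpus:

import random
import string

# Straw-man standard headers -- placeholder values, not final
STANDARD = [
    (":method", "GET"),
    (":scheme", "https"),
    (":authority", "www.example.com"),
    (":path", "/some/reasonable/path"),
    ("user-agent", "Mozilla/5.0 (Synthetic) TestBrowser/1.0"),
]

def rand_chars(n):
    return "".join(random.choice(string.ascii_lowercase) for _ in range(n))

# Pool of ten custom headers: five keep a fixed 40-char value for the whole
# corpus, five get fresh random 40-char data on every request
FIXED_POOL = [(rand_chars(random.randint(10, 20)), rand_chars(40))
              for _ in range(5)]
RANDOM_POOL = [rand_chars(random.randint(10, 20)) for _ in range(5)]

# Willy's three models: how many of the ten custom headers each request uses
MODELS = {"average": 5, "browser": 1, "partner": 9}

def make_request(num_custom):
    headers = list(STANDARD)
    pool = FIXED_POOL + [(name, None) for name in RANDOM_POOL]
    for name, fixed in random.sample(pool, num_custom):
        headers.append((name, fixed if fixed is not None else rand_chars(40)))
    return headers

def make_corpus(model="average", n=10000):
    return [make_request(MODELS[model]) for _ in range(n)]

Note that Willy's suggested variation on values like :path, accept and cookie isn't in there yet; the three models only change how many custom headers each request draws from the pool.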

Unless I hear otherwise, I'm going to do HTTP/1-style header sets separated by double newlines; e.g.,

:scheme: https
:authority: foo.com
:path: /abc
foo: bar

:scheme: http
:authority: bar.com
:path: /def
baz: bat

and so on...
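
For whoever picks up the crunching side, parsing that back out should be trivial; something like this (untested, and it assumes exactly one "name: value" pair per line, with no newlines in values):

def read_header_sets(path):
    """Read a corpus file: header sets separated by blank lines,
    one 'name: value' pair per line, HTTP/1 style."""
    with open(path) as f:
        blocks = f.read().split("\n\n")
    sets = []
    for block in blocks:
        if not block.strip():
            continue
        headers = []
        for line in block.splitlines():
            name, _, value = line.partition(": ")
            headers.append((name, value))
        sets.append(headers)
    return sets

Each set would then be fed to the encoder in order, keeping a single compression context across the corpus, per the straw-man.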

I'll put it in a repo for inspection / pulls. Whoever does the other code should as well.

Cheers,


--
Mark Nottingham   https://www.mnot.net/
