Re: delta encoding and state management from William Chan on 2013-01-23 (ietf-http-wg@w3.org from January to March 2013)

From: William Chan <willchan@chromium.org>
Date: Tue, 22 Jan 2013 16:51:29 -0800
To: Willy Tarreau <w@1wt.eu>
Cc: James M Snell <jasnell@gmail.com>, Nico Williams <nico@cryptonector.com>, Roberto Peon <grmocg@gmail.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Message-ID: <CAA4WUYjGiA0WP6o3ub5ZPTh-zYZ9Jth6w2GfuMT+WarT69GW-A@mail.gmail.com>
On Tue, Jan 22, 2013 at 4:00 PM, Willy Tarreau <w@1wt.eu> wrote:
> On Tue, Jan 22, 2013 at 03:08:08PM -0800, William Chan wrote:
>> >> How long do you delay
>> >> the resource request in order to consolidate requests into a load
>> >> group? The same thing is even more true for response headers.
>> >
>> > I never want to delay anything, delays only do bad things when we
>> > try to reduce latency.
>>
>> One of us has the wrong mental model for how the proposal would work.
>> Let's figure this out.
>>
>> Let's say the browser requests foo.html. It receives a response packet
>> for foo.html, referencing 1.js. 5ms later, it receives packet 2 for
>> foo.html which references 2.js. 5ms later, it receives packet 3 for foo.html
>> which references 3.js. And so on. You say no delays. So does this mean
>> each "group" only includes one object each time?
>
> Ah OK I didn't understand. My assumption was that browsers do have a list
> of objects to be fetched, but with what you're explaining, it might not
> always be true. Anyway the principle I proposed suggested that all subsequent
> requests remain in the same group until a new group is emitted, so that
> should cover the need for new objects that are discovered one at a time.
> However, I do think (but may be wrong) that objects are not often scheduled
> to go on the wire one at a time, but that when many objects appear in the
> contents, many of them are seen together.
>
>> And now let's ignore the 5ms delays. Consider how WebKit works. Let's
>> say WebKit has all of foo.html. It starts parsing it. It encounters
>> 1.js. It immediately sends the resource request to the network stack.
>> It hasn't parsed the full document yet, so it doesn't know if it'll
>> encounter any more resources. Each time it encounters a resource while
>> parsing the document, it will send it to the network stack (in
>> Chromium and latest versions of Safari, this is a separate process).
>
> I must say I'm a bit shocked by this behaviour which is very inefficient
> from a TCP point of view. This means you have two possibilities for sending
> your requests then :
>   - either you keep Nagle enabled and your requests wait in the kernel's stack
>     for some time (typically 40 ms) before leaving, even if the request is
>     the last one ;
>
>   - or you disable Nagle to force them to leave immediately, but then each
>     request leaves with a TCP push flag, and then your TCP stack will not
>     send anything else over the same socket for a full RTT (until its pending
>     data are ACKed), which is worse.

We disable Nagle on our sockets. I must be missing something. Why
would the TCP stack only send one packet per roundtrip when you
disable Nagle? I do not believe TCP stacks only allow one packet's
worth of unack'd data.
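
For reference, disabling Nagle is a per-socket option. The following is a minimal sketch in Python, not Chromium's actual code (which is C++); the helper name `open_nodelay_socket` is made up for illustration. With TCP_NODELAY set, small writes are handed to the network as soon as possible, and the amount of unacknowledged data in flight is bounded by the congestion and receive windows, not by a one-segment-per-RTT rule.

```python
import socket


def open_nodelay_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY=1 turns Nagle off: small writes are not held back
    # waiting for earlier data to be ACKed.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


sock = open_nodelay_socket()
# Read the option back; a non-zero value means Nagle is disabled.
nodelay_enabled = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
sock.close()
```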

>
> This is why we generally try to fill packets over the wire as much as
> possible. An alternative consists in opening many connections but this
> is not efficient either then (RTTs, upstream packets).
>
> So in practice I suspect that you already send requests with Nagle enabled
> and disable it when you reach the end of the page, so that whatever can leave
> is delayed at most 40ms and never more than the time to parse the whole page.
> If this is the case, then you already have your requests delayed by as much
> as 40ms and sent as groups.

No. See https://code.google.com/searchframe#OAMlx_jo-ck/src/net/socket/tcp_client_socket_win.cc&exact_package=chromium&q=nagle&type=cs&l=117.

>
>> What is the network stack to do if, as you say, it should never delay
>> anything? If I understand correctly, each "group" would always only
>> include one object then.
>
> I did not understand you meant delay between objects while parsing, I
> thought you meant delay between groups.

Indeed, there's a delay between objects. There must be. As previously
stated, we have no means of predicting the future. We do not know that
there are more objects in the document yet to be parsed. When we
encounter an object during parsing, we request it immediately.
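
A toy sketch of the timing argument, assuming a purely time-based reading of "group" (this may not be what Willy's proposal intends; `batch_requests` and `max_wait_ms` are hypothetical names). Requests are discovered at the given times, and a batch is flushed once more than `max_wait_ms` has passed since its first request. With a wait of zero, every batch holds exactly one request.

```python
def batch_requests(discovery_times_ms, max_wait_ms):
    """Group discovery times into batches (illustrative model only).

    A batch is closed when a newly discovered request arrives more than
    max_wait_ms after the first request in the current batch.
    """
    batches = []
    current = []
    for t in discovery_times_ms:
        if current and t - current[0] > max_wait_ms:
            batches.append(current)
            current = []
        current.append(t)
    if current:
        batches.append(current)
    return batches


# Resources discovered 5ms apart, as in the foo.html example.
times = [0, 5, 10, 15]
no_delay = batch_requests(times, max_wait_ms=0)     # one request per batch
with_delay = batch_requests(times, max_wait_ms=10)  # larger batches, added latency
```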

>
> Here you're limited by TCP. If you push too fast, you have to wait one RTT
> between requests. If you ask the kernel to disable quick ACK or if you keep
> NAGLE enabled (using TCP_CORK, MSG_MORE, etc...), your requests will
> automatically leave between 40 and 200ms even if incomplete (far too much).
>
> However, considering that only incomplete packets will remain pending
> for the time it takes to parse the page and will leave anyway if it takes
> longer than that, I think it remains optimal to feed the kernel's buffers
> and let whichever acts first, the kernel or the HTML parser, decide to send
> incomplete segments. Otherwise you'd delay subsequent requests by an RTT in the TCP
> stack.
>
>> > In the example I proposed, the recipient receives the full headers
>> > block, then from that point, all requests reuse the same headers
>> > and can be processed immediately (just like pipelining in fact).
>> >
>> > Concerning response headers, I'd say that you emit a first response
>> > group with the headers from the first response, followed by the
>> > response. When another response comes in, you have two possibilities,
>> > either it shares the same headers and you can add a response to the
>> > existing group, or it does not and you open a new group.
>>
>> Wait, is this the critical misunderstanding? Are you maintaining state
>> across requests and responses? Isn't this a minor modification on the
>> "simple" compressor? I was assuming you were trying to be stateless.
>
> I'm having a hard time following you, I'm sorry. What state across requests
> and responses do you mean ? The only "state" I'm talking about is the list
> of common headers between the current message and the previous one in fact.
> This is true both for requests and responses.

Yes, this is what I refer to as "simple" compression, as coined by
Mark (see http://www.mnot.net/blog/2013/01/04/http2_header_compression):
"""
“simple” - Omitting headers repeated from the last message, tokenising
common field names, and a few other tweaks. Otherwise, it looks like
HTTP/1.
"""

I consider this a form of stateful compression. If you are fine with
that, then great, I think we've made significant progress in our
discussions here, and it's a good starting point for discussing
connection state requirements.
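
To make the statefulness concrete, here is a minimal sketch of the "omit headers repeated from the last message" idea, assuming headers are a plain name-to-value mapping. The function names and the wire representation (a dict of changed headers plus a list of removed names) are assumptions for illustration, not any proposed format. The point is that the decoder cannot reconstruct a message without remembering the previous one.

```python
def encode_delta(prev, curr):
    """Return (changed, removed) describing curr relative to prev."""
    # Headers that are new or whose value differs from the previous message.
    changed = {k: v for k, v in curr.items() if prev.get(k) != v}
    # Headers present in the previous message but absent from this one.
    removed = [k for k in prev if k not in curr]
    return changed, removed


def decode_delta(prev, changed, removed):
    """Rebuild the full header set; requires the previous message (state)."""
    headers = {k: v for k, v in prev.items() if k not in removed}
    headers.update(changed)
    return headers


req1 = {"method": "GET", "url": "/1.js", "host": "example.com",
        "user-agent": "ExampleBrowser/1.0"}
req2 = {"method": "GET", "url": "/2.js", "host": "example.com",
        "user-agent": "ExampleBrowser/1.0"}

changed, removed = encode_delta(req1, req2)
rebuilt = decode_delta(req1, changed, removed)
```

Here only the `url` entry needs to be sent for the second request, but both ends must hold `req1` to make sense of it.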

>
> Regards,
> Willy
>
Received on Wednesday, 23 January 2013 00:51:57 UTC