Re: delta encoding and state management from Willy Tarreau on 2013-01-23 (ietf-http-wg@w3.org from January to March 2013)

From: Willy Tarreau <w@1wt.eu>
Date: Wed, 23 Jan 2013 01:00:18 +0100
To: "William Chan (陈智昌)" <willchan@chromium.org>
Cc: James M Snell <jasnell@gmail.com>, Nico Williams <nico@cryptonector.com>, Roberto Peon <grmocg@gmail.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Message-ID: <20130123000018.GR30692@1wt.eu>
On Tue, Jan 22, 2013 at 03:08:08PM -0800, William Chan (陈智昌) wrote:
> >> How long do you delay
> >> the resource request in order to consolidate requests into a load
> >> group? The same thing is even more true for response headers.
> >
> > I never want to delay anything, delays only do bad things when we
> > try to reduce latency.
> 
> One of us has the wrong mental model for how the proposal would work.
> Let's figure this out.
> 
> Let's say the browser requests foo.html. It receives a response packet
> for foo.html, referencing 1.js. 5ms later, it receives packet 2 for
> foo.html which references 2.js. 5ms it receives packet 3 for foo.html
> which references 3.js. And so on. You say no delays. So does this mean
> each "group" only includes one object each time?

Ah OK, I had misunderstood. I assumed that browsers always hold a list of
objects to be fetched, but from what you're explaining, that is not always
the case. Anyway, the principle I proposed is that all subsequent requests
remain in the same group until a new group is emitted, so that should cover
new objects that are discovered one at a time. However, I do think (but may
be wrong) that objects are rarely scheduled to go on the wire one at a time:
when many objects appear in the contents, many of them are seen together.

> And now let's ignore the 5ms delays. Consider how WebKit works. Let's
> say WebKit has all of foo.html. It starts parsing it. It encounters
> 1.js. It immediately sends the resource request to the network stack.
> It hasn't parsed the full document yet, so it doesn't know if it'll
> encounter any more resources. Each time it encounters a resource while
> parsing the document, it will send it to the network stack (in
> Chromium and latest versions of Safari, this is a separate process).

I must say I'm a bit shocked by this behaviour, which is very inefficient
from a TCP point of view. It leaves you two possibilities for sending
your requests:
  - either you keep Nagle enabled and your requests wait in the kernel's stack
    for some time (typically 40 ms) before leaving, even if the request is
    the last one;

  - or you disable Nagle to force them to leave immediately, but then each
    request leaves with a TCP push flag, and your TCP stack will not
    send anything else over the same socket for a full RTT (until its pending
    data are ACKed), which is worse.

This is why we generally try to fill packets over the wire as much as
possible. An alternative is to open many connections, but that is not
efficient either (extra RTTs, more upstream packets).

So in practice I suspect that you already send requests with Nagle enabled
and disable it when you reach the end of the page, so that whatever can leave
is delayed by at most 40 ms and never by more than the time to parse the whole
page. If this is the case, then your requests are already delayed by as much
as 40 ms and sent as groups.
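
To make that guess concrete, here is a minimal sketch in Python of "Nagle
enabled while parsing, disabled at the end of the page". The function name
send_page_requests and its arguments are hypothetical and only serve the
illustration; this is not a description of what any browser actually does.
It also relies on the Linux behaviour where setting TCP_NODELAY pushes out
pending data, which other stacks may not share.

```python
import socket


def send_page_requests(sock, requests):
    """Write requests as the parser finds them, then flush at end of page.

    `sock` is a connected TCP socket; `requests` is an iterable of bytes.
    Returns the number of bytes handed to the kernel.
    """
    # Nagle is on by default; clearing TCP_NODELAY makes that explicit, so
    # small writes may be coalesced by the kernel while parsing goes on.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 0)
    written = 0
    for request in requests:
        sock.sendall(request)
        written += len(request)
    # End of page: turn Nagle off. On Linux this also pushes out any
    # partial segment still queued (assumption: other stacks may differ).
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return written
```

The caller would invoke this once per parsed page on an already connected
socket; a real client would of course interleave the writes with parsing
instead of taking a ready-made list.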

> What is the network stack to do if, as you say, it should never delay
> anything? If I understand correctly, each "group" would always only
> include one object then.

I did not understand that you meant the delay between objects while parsing;
I thought you meant the delay between groups.

Here you're limited by TCP. If you push too fast, you have to wait one RTT
between requests. If you ask the kernel to disable quick ACK or if you keep
Nagle enabled (using TCP_CORK, MSG_MORE, etc.), your requests will
automatically leave after somewhere between 40 and 200 ms even if incomplete
(far too much).

However, considering that only incomplete packets will remain pending
for the time it takes to parse the page, and will leave anyway if parsing
takes longer than that, I think it remains optimal to feed the kernel's
buffers and let whichever comes first, the kernel or the HTML parser, decide
to send incomplete segments. Otherwise you'd delay subsequent requests by an
RTT in the TCP stack.

> > In the example I proposed, the recipient receives the full headers
> > block, then from that point, all requests reuse the same headers
> > and can be processed immediately (just like pipelining in fact).
> >
> > Concerning response headers, I'd say that you emit a first response
> > group with the headers from the first response, followed by the
> > response. When another response comes in, you have two possibilities,
> > either it shares the same headers and you can add a response to the
> > existing group, or it does not and you open a new group.
> 
> Wait, is this the critical misunderstanding? Are you maintaining state
> across requests and responses? Isn't this a minor modification on the
> "simple" compressor? I was assuming you were trying to be stateless.

I'm having a hard time following you, I'm sorry. What state across requests
and responses do you mean? The only "state" I'm talking about is the list
of headers common to the current message and the previous one. This holds
for both requests and responses.
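
To illustrate how little state that is, here is a simplified sketch in
Python of the grouping rule: a message joins the current group when its
headers match the group's headers, otherwise it opens a new group. The
function name group_by_headers and the (headers, body) representation are
assumptions for the example; in particular it treats the request target as
being outside the headers and compares whole header sets, which is coarser
than a real encoding would be.

```python
def group_by_headers(messages):
    """Group consecutive messages that share the same header set.

    `messages` is an iterable of (headers, body) pairs, `headers` a dict.
    Returns a list of (headers, [body, ...]) groups in arrival order.
    """
    groups = []
    for headers, body in messages:
        if groups and groups[-1][0] == headers:
            # Same headers as the previous message: extend the open group.
            groups[-1][1].append(body)
        else:
            # Headers changed: emit a new group carrying its own headers.
            groups.append((dict(headers), [body]))
    return groups


common = {"host": "example.test", "user-agent": "sketch/0"}
example = [
    (common, "/1.js"),
    (common, "/2.js"),
    ({**common, "accept": "image/*"}, "/a.png"),
]
grouped = group_by_headers(example)
```

With the example input, the two script requests end up in one group and the
image request opens a second one, so the only thing remembered between
messages is the header set of the group currently open.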

Regards,
Willy
Received on Wednesday, 23 January 2013 00:00:50 UTC