Re: Multi-GET, extreme compression? from 陈智昌 on 2013-02-18 (ietf-http-wg@w3.org from January to March 2013)

From: 陈智昌 <willchan@chromium.org>
Date: Sun, 17 Feb 2013 19:00:20 -0800
To: Helge Hess <helge.hess@opengroupware.org>
Cc: James M Snell <jasnell@gmail.com>, Roberto Peon <grmocg@gmail.com>, HTTP Working Group <ietf-http-wg@w3.org>, Phillip Hallam-Baker <hallam@gmail.com>, Cyrus Daboo <cyrus@daboo.name>
Message-ID: <CAA4WUYh6hNKtiJHuuVxPzki+BQf=2YouAxuYc5Ea3tkmdhxDdg@mail.gmail.com>
Sorry Mark, one last email to highlight where I believe the confusion has
come from. I look forward to James' I-D.

On Sun, Feb 17, 2013 at 6:10 PM, Helge Hess <helge.hess@opengroupware.org> wrote:

> On Feb 17, 2013, at 5:52 PM, William Chan (陈智昌) <willchan@chromium.org>
> wrote:
> > I'm confused. We issue individual GETs for the individual resource URLs.
> > How do we know to combine those individual resources into this magical
> > /resource/set path?
>
> Well, I personally don't care too much about HTML here but about services,
> but I do think you can use the facility for this too. The browser would
> need to do some clever batching and latency management, but that's not
> really related to HTTP but HTML and an issue in any protocol.
> Fixing HTML would be a different thing, but sure, you could introduce
> resource-set tags which would directly map to batched requests.
>
> Presumably you receive your HTML in a streamed fashion as packets arrive,
> presumably parsing a packet is way faster than any network traffic. In fact
> many resources will be in the first few packets aka the head (scripts,
> CSS). For CSS it's even more condensed within one resource, you probably get
> a few URLs within a very short time.
>
> > Furthermore, as I previously linked to in the very first reply to the
> > thread, when we discussed MGET previously, I highlighted how the browser
> > incrementally parses the document and sends GETs for resources as it
> > discovers them.
>
> Yes, you might want to wait n (3?) milliseconds before sending out
> additional requests and batch what you get within that timeframe. You don't
> really send out requests in realtime while parsing, do you? ;-)
>

If you had read the previous email thread I linked to at the very
beginning, you would realize that, contrary to Willy's expectation, I
demonstrated that we do indeed send out requests ASAP (putting aside some
very low-latency batching). We disable Nagle's algorithm to prevent
kernel-level delays of this kind, since we do indeed want to get requests
out ASAP.
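
For readers unfamiliar with the socket option involved, here is a minimal
sketch of disabling Nagle's algorithm on a TCP socket. It uses Python's
standard socket module; the helper name make_low_latency_socket is made up
for illustration, and this is not Chromium's actual code.

```python
import socket


def make_low_latency_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY asks the kernel to send small writes immediately instead of
    # holding them back to coalesce with later data.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


if __name__ == "__main__":
    s = make_low_latency_socket()
    # A non-zero value means the option is set on this socket.
    print(s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)
    s.close()
```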


>
> > > Also, how does this work for HTTP/1.X? Since we'll be living in a
> > > transitional world for a while, I'd like to understand how this allows for
> > > HTTP/1.X semantics backwards compatibility.
> >
> > An old server would return a 405 when the BATCH comes in, then the
> > client needs to switch to performing the operations individually.
> >
> > So, you handwaved over how the client would magically transform URL1 +
> > URL2 + URL3 into magical example.com/resource/set. Assuming that's
> > possible, how do you do the reverse transformation, when an HTTP/2=>HTTP/1.X
> > gateway needs to translate HTTP/2 MGET requests for this /resource/set into
> > the individual GETs for the original URLs?
>
> I can't follow you here. A BATCH of 5 GETs would be exactly the same as
> 5 individual GETs w/ less HTTP overhead and better compression. It's trivial
> to convert this in both directions.
>

I believe the confusion arises because you seem to think I was confused
about MGET coalescing X GETs. No, I totally understand how that would work.
I have no questions there, and I already linked to the previous discussion
which you appear not to have read despite my repeated references. If you
look to where I replied directly to James's proposal, I expressed confusion
about these paragraphs:
"""
That is, assume that a server defines a set of N resources and assigns
that set a singular url that represents the entire collection. When I
do..

  MGET /resource/set HTTP/2.0

The server responds by opening N server-push response streams back to
the client, each associated with the original MGET. Each would have
its own Content-Location and Cache-Control mechanisms allowing
intermediate caches to still do the right thing. The client does not
necessarily know what all it is getting from the server in advance but
knows it needs to be prepared to handle multiple items.
"""

I do not know what this magic /resource/set is, and I do not see how to
divide it back up into individual GETs. Are your comments really in
relation to James' proposal as stated? If so, your explanation of how we
combine <img>, <script>, etc. into "/resource/set" is incomprehensible and I
look forward to seeing an Internet-Draft.

I see that James has hinted that indeed we would need new markup to
indicate a "resource set".


>
> > And even if this is possible, how reasonable is it to pay this roundtrip
> > on receiving the 405? We've fought really hard to eliminate roundtrips.
>
> Maybe I'm missing something, but I thought the goal is to reduce 10...N
> requests to 1 in the best case. That 10 requests are 11 in the legacy case
> seems to be fine to me, plus a browser could remember on which sites it has
> seen a 405 and avoid the hit in the future.
>

The goal is to, amongst others, "Substantially and measurably improve
end-user perceived latency in most cases, over HTTP/1.1 using TCP." (as
taken from the httpbis charter for HTTP/2.0 work). Coming up with a more
concise request representation (by compressing or otherwise sharing
redundant information across requests) is merely a means to achieve
improved end-user perceived latency. If you introduce roundtrips, then we
have optimized for a means at the expense of the ultimate goal.
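
The round-trip argument can be made concrete with a toy model. The sketch
below is purely illustrative: the function names, the 100 ms RTT, and the
simplification that every request/response exchange costs exactly one round
trip are assumptions, not measurements.

```python
def multiplexed_gets_latency(rtt_ms: float) -> float:
    """Individual GETs sent back to back on one multiplexed connection.

    Assumption: all requests leave immediately, so responses start arriving
    after a single round trip regardless of how many requests there are.
    """
    return 1 * rtt_ms


def batch_latency(rtt_ms: float, server_supports_batch: bool) -> float:
    """One BATCH request, falling back to individual GETs on a 405."""
    if server_supports_batch:
        return 1 * rtt_ms
    # Legacy path: one round trip to learn about the 405, then a second
    # round trip for the individual GETs.
    return 2 * rtt_ms


if __name__ == "__main__":
    rtt = 100.0  # hypothetical round-trip time in milliseconds
    print(multiplexed_gets_latency(rtt))  # 100.0
    print(batch_latency(rtt, True))       # 100.0
    print(batch_latency(rtt, False))      # 200.0
```

Under these assumptions the batch saves bytes but no round trips in the best
case, and costs an extra round trip in the legacy case.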

Also, your comment on simply remembering origin servers only works if there
is no interference from intermediaries. Since many implementors here are
exploring HTTP/2.0 over port 80 in the clear, it seems unreasonable to rely
on intermediaries all understanding HTTP/2.0, at least early on.
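
One way to read the intermediary concern is sketched below. Everything here
is hypothetical: the batch_ok table, the function names, and the assumption
that a hop which does not understand BATCH makes the request fail with a 405.
It only shows that a memory keyed by origin can be wrong once the network
path changes.

```python
from typing import Dict, List

# Hypothetical per-origin memory: origin -> did BATCH succeed last time?
batch_ok: Dict[str, bool] = {}


def path_supports_batch(origin_ok: bool, intermediaries_ok: List[bool]) -> bool:
    """Assume BATCH works only if the origin and every hop understand it."""
    return origin_ok and all(intermediaries_ok)


def round_trips(origin: str, origin_ok: bool, intermediaries_ok: List[bool]) -> int:
    """Round trips spent on one fetch, updating the per-origin memory."""
    if batch_ok.get(origin, True):  # optimistic default: try BATCH first
        if path_supports_batch(origin_ok, intermediaries_ok):
            batch_ok[origin] = True
            return 1
        batch_ok[origin] = False  # saw a 405: remember it
        return 2  # wasted BATCH round trip, then individual GETs
    return 1  # remembered 405: go straight to individual GETs


if __name__ == "__main__":
    # Direct connection: the origin supports BATCH and is remembered as OK.
    print(round_trips("example.com", True, []))       # 1
    # Same origin via a legacy proxy: the memory is wrong, a round trip is lost.
    print(round_trips("example.com", True, [False]))  # 2
    # Direct connection again: BATCH would work, but is no longer tried.
    print(round_trips("example.com", True, []))       # 1
```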


>
> hh
>
>
Received on Monday, 18 February 2013 03:00:50 UTC