- From: Evert Pot <me@evertpot.com>
- Date: Fri, 23 Nov 2018 19:23:10 -0500
- To: Andy Green <andy@warmcat.com>, ietf-http-wg@w3.org
> 
> I have always been a bit puzzled by how PUSH is supposed to be
> beneficial when the server doesn't know what the client has locally
> cached. Nowadays versioned scripts such as those from
> ajax.googleapis.com are typically told to be cached locally for one year[1]

I agree, but for the main case we're trying to solve (embedding vs pushing),
both have this issue. This draft isn't intended to solve that problem, but
the hope is that once Cache Digest[1] lands this _does_ become a viable
optimization (even more so if ETag makes its way back into the key
calculation).

> 
> In the case "self" serves everything and all the assets have similar
> caching policies, after the first visit any PUSH stuff riding on dynamic
> HTML is going to be 99.99% wasted.
> 
> The draft doesn't seem to address:
> 
> - why would this be beneficial compared to just sending n pipelined
> GETs on h2, if the client understands it wants n things already? Both
> ways the return data has to be serialized into individual streams with
> their own headers on a single h2 connection. With HPACK and n GETs that
> differ only in the request URL, the header sets for each request are
> cheap and you don't have to worry about either magicking up a new format
> to carry the info or "market penetration" of implementation

One major benefit is that if a server knows in advance that a client will
want certain resources, it can optimize for them. I hope my pseudo-code
example illustrates this, but here's the general idea:

    // Pseudo-code: if the client asked for it, push every child resource
    // of the requested collection along with the response.
    function controller(request, response) {
      if (request.preferPush()) {
        response.push(allChildResources());
      }
    }

Generally it's a lot cheaper to generate responses for a group of resources
(based on, for example, a single SELECT query) than it is to generate each
response individually. Plus, for all these pushed responses the service may
already have done a bunch of work that doesn't need to be repeated per
response: if the original request carried authentication information, the
server doesn't need to check Authorization headers again for every pushed
resource.
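To make that a bit more concrete, here's a rough sketch of what such a
controller could look like on top of Node's built-in http2 module. It's not
from the draft; checkAuth() and fetchArticles() are made-up helpers, and the
exact opt-in header is whatever the draft ends up specifying. The point is
the shape: one authentication check and one query, then one pushed response
per child resource.

    const http2 = require('http2');

    // Hypothetical helpers: checkAuth() validates the Authorization header
    // once, fetchArticles() runs a single SELECT for all child resources.
    const { checkAuth, fetchArticles } = require('./store');

    const server = http2.createServer();

    server.on('stream', async (stream, headers) => {
      if (!checkAuth(headers['authorization'])) {
        stream.respond({ ':status': 401 });
        stream.end();
        return;
      }

      // One query for the whole collection instead of one per resource.
      const articles = await fetchArticles();

      // Did the client opt in, and does the connection allow push?
      const wantsPush = stream.pushAllowed && headers['prefer-push'];

      if (wantsPush) {
        for (const article of articles) {
          stream.pushStream({ ':path': article.href }, (err, pushStream) => {
            if (err) return;
            pushStream.respond({
              ':status': 200,
              'content-type': 'application/hal+json',
            });
            pushStream.end(JSON.stringify(article.body));
          });
        }
      }

      stream.respond({
        ':status': 200,
        'content-type': 'application/hal+json',
      });
      stream.end(JSON.stringify({
        _links: { item: articles.map((a) => ({ href: a.href })) },
      }));
    });

    server.listen(8443);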
> 
> The draft says with its method "it's possible for services to push
> subordinate resources as soon as possible" but it doesn't compare it to
> just doing n GETs from the start. I think you find any advantage is
> hard to measure. But at least the draft should fairly compare itself to
> the obvious existing way to do

A client can only know which links are available, and where they point,
after the initial response has come back. After that, I agree, a client can
just do a GET request for every linked resource individually and get the
same performance (not counting the fact that servers can optimize for
groups of similar requests).

> 
> - where does the contemporary knowledge come from at the client about
> the relationships? From the server, ultimately? Then this is a bold
> claim...

The biggest use case from my perspective is hypermedia-style APIs, such as
HAL and Siren. There, clients generally do know in advance which links
might be available, but not where they will point. Solving this for HTTP
services that don't follow this paradigm is out of scope (for me, at least).

> 
>> It reduces the number of roundtrips. A client can make a single HTTP
>> request and get many responses.
> 
> h2 pipelining doesn't work like h1 pipelining. You can spam the server
> with requests on new streams and most (all?) servers will start to
> process them in parallel while serving of earlier streams is ongoing.
> The server cannot defer at least reading about the new stream starts on
> the network connection because it must not delay hearing about tx credit
> updates or it will deadlock. So there is a strong reason for servers to
> not delay new stream processing.

Yes, sorry: the goal is just to avoid having to wait for the first response
(or for subsequent responses, when a bigger part of the graph is requested).
I don't expect it to optimize the case where a client already knows the
targets of the links.
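To illustrate the roundtrip I mean: without push, a generic hypermedia
client ends up doing something like the sketch below (the collection URL
handling and the "item" link relation are just an example), and it can only
start the second round of GETs after the first response has been received
and parsed.

    // Rough sketch of a HAL client without push: the item URLs are only
    // known once the collection response has been parsed.
    async function getCollectionWithItems(collectionUrl) {
      const collection = await (await fetch(collectionUrl)).json();

      // HAL allows a single link object or an array; normalize to an array.
      const itemLinks = [].concat(collection._links?.item ?? []);

      // Second roundtrip: one GET per discovered link, fired in parallel.
      const items = await Promise.all(
        itemLinks.map((link) =>
          fetch(new URL(link.href, collectionUrl)).then((res) => res.json())
        )
      );

      return { collection, items };
    }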
Anyway, point taken. I think the draft needs to do a much better job of
addressing this, and we also need more real-world data.

Evert

> 
> -Andy
> 
> [1] "The CDN's files are served with CORS and Timing-Allow headers and
> allowed to be cached for 1 year."
> 
> https://developers.google.com/speed/libraries/

[1]: https://tools.ietf.org/html/draft-ietf-httpbis-cache-digest-05

Received on Saturday, 24 November 2018 00:23:36 UTC