Re: Some thoughts on server push and client pull from Alek Storm on 2012-06-07 (ietf-http-wg@w3.org from April to June 2012)

From: Alek Storm <alek.storm@gmail.com>
Date: Thu, 7 Jun 2012 18:12:16 -0500
To: Jonathan Silvera <jsilvera@microsoft.com>
Cc: Gabriel Montenegro <Gabriel.Montenegro@microsoft.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>, Matthew Cox <macox@microsoft.com>, Ivan Pashov <ivanpash@microsoft.com>, Osama Mazahir <OSAMAM@microsoft.com>, Rob Trace <Rob.Trace@microsoft.com>
Message-ID: <CAMNEcwutF-d45n3kW9S9m7iXaZGJC9YXLzO+ad4uuNLuLckMrQ@mail.gmail.com>
Hi Jonathan,

On Thu, Jun 7, 2012 at 4:45 PM, Jonathan Silvera <jsilvera@microsoft.com> wrote:

>  *I believe the client can achieve the same effect in SPDYv3 by sending a
> GOAWAY frame, indicating that the other endpoint must not open any new
> streams:*
>
> · There is still data wasted by the SYN_STREAM sent to the client, and
> there is nothing in the SPDY spec preventing the server from sending data
> immediately after it sends the SYN_STREAM when doing Push. This would
> result in an unnecessary SYN_STREAM in the best case and in unnecessary
> pushing of data in the worst case.
>
I believe you're conflating GOAWAY with RST_STREAM+CANCEL. The former
prevents the server from sending any SYN_STREAMs in the first place, while
the latter resets individual streams after their SYN_STREAM has arrived at
the client.
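
To make the distinction concrete, here is a rough Python sketch of the two
control frames as I read the spdy/3 draft. The helper names are mine and
purely illustrative. The field layout and numeric codes (frame types 3 and 7,
RST_STREAM status CANCEL = 5, GOAWAY status OK = 0) are from my reading of
the draft and should be double-checked against it.

```python
import struct

SPDY_VERSION = 3
TYPE_RST_STREAM = 3   # assumed from the spdy/3 draft
TYPE_GOAWAY = 7       # assumed from the spdy/3 draft
RST_CANCEL = 5        # RST_STREAM status code CANCEL
GOAWAY_OK = 0         # GOAWAY status code OK


def control_frame(frame_type, payload, flags=0):
    """Control bit + 15-bit version, 16-bit type, 8-bit flags, 24-bit length."""
    if len(payload) >= 1 << 24:
        raise ValueError("payload too large for a 24-bit length field")
    header = struct.pack("!HH", 0x8000 | SPDY_VERSION, frame_type)
    header += struct.pack("!I", (flags << 24) | len(payload))
    return header + payload


def rst_stream_cancel(stream_id):
    """Reset ONE stream whose SYN_STREAM has already reached the client."""
    body = struct.pack("!II", stream_id & 0x7FFFFFFF, RST_CANCEL)
    return control_frame(TYPE_RST_STREAM, body)


def goaway(last_good_stream_id):
    """Tell the peer not to open ANY new streams on this connection."""
    body = struct.pack("!II", last_good_stream_id & 0x7FFFFFFF, GOAWAY_OK)
    return control_frame(TYPE_GOAWAY, body)
```

The point is only that RST_STREAM names a specific stream that already
exists, while GOAWAY is sent once for the whole connection.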

> · From reading the SPDY spec, it seems GOAWAY affects the
> connection. However, there are still valid resources that the client would
> need from the server. Would controlling server push this way require the
> server to add additional logic to continue to serve additional requests on
> the connection, while adding state on the server side to not push
> additional content on subsequent requests when the client has sent a GOAWAY
> message? I am not clear on how GOAWAY due to a client rejecting Push would
> differ from normal GOAWAY behavior on both client and server.
>
GOAWAY is asymmetric - only the receiving endpoint is prohibited from
opening new streams. In addition, one GOAWAY will suffice to prevent the
server from pushing streams for all future requests on the connection, not
just ones currently open. This state can be reset by opening a new
connection to the server.
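
As a toy illustration of that asymmetry, the server side might keep one flag
per connection. This is only a sketch under my own assumptions: the class and
method names are hypothetical, and the frames are modelled as plain tuples
rather than real SPDY frames.

```python
class PushingServer:
    """Toy model of a server's per-connection push state."""

    def __init__(self):
        self.goaway_received = False
        self.next_push_id = 2      # server-initiated streams use even IDs
        self.cancelled = set()

    def on_goaway(self):
        # Sent once by the client; applies to all future requests.
        self.goaway_received = True

    def on_rst_stream_cancel(self, stream_id):
        # Affects only the one push stream that was already opened.
        self.cancelled.add(stream_id)

    def may_send_data(self, stream_id):
        return stream_id not in self.cancelled

    def handle_request(self, stream_id, push_candidates):
        """Return the frames the server would send for a client request."""
        frames = []
        if not self.goaway_received:
            for url in push_candidates:
                frames.append(("SYN_STREAM", self.next_push_id, url))
                self.next_push_id += 2
        # Client-initiated requests are still answered after GOAWAY.
        frames.append(("SYN_REPLY", stream_id, None))
        return frames
```

After `on_goaway()`, `handle_request` still returns a SYN_REPLY for each new
client stream but never opens another push stream on that connection.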

> · The SPDY spec, appropriately, does not make GOAWAY
> mandatory. This means that clients that do not implement GOAWAY could still
> have data pushed to them.
>
I'm a bit confused by this. Unless otherwise indicated, all parts of the
spdy/3 spec are mandatory. We have specifically provided a way for a client
to tell the server not to open any push streams. If the client only
partially (and therefore, incorrectly) implements the specification, what
are we to do about it?

> · Overall it seems that having the client control PUSH
> behavior is cleaner and is more efficient from a bandwidth perspective.
>
Agreed.

> *I proposed a detailed system of cache control for pushed resources on
> the spdy-dev mailing list, before I knew of the existence of this list. I
> plan to post it here soon:*
>

Since server push has been a popular topic in the past 24 hours, I
re-prioritized finishing the proposal and sent it to the list a few hours
ago. You can read it at
http://lists.w3.org/Archives/Public/ietf-http-wg/2012AprJun/0514.html,
although since some of the formatting has been mangled, I recommend an
email client for an optimal proposal-reading experience. Feedback is most
welcome, but should probably be posted in that thread.

> · Great to hear that you are thinking about this. We believe
> that is something that needs to be added to the Push spec. It would
> probably require the client to keep track of resources previously
> downloaded (no data on first navigation to a site) when visiting a top
> level page and advertising the URL for the cached resources, along with the
> corresponding cache validators as part of the top level request. The server
> would then have to explicitly confirm to the client, which cached resources
> are valid in addition to initiating SYN streams for the content that it is
> going to push.
>
There is no need to transmit the URI in addition to the cache validators,
as long as a new restriction is added to entity tags: they must be unique
among different URIs. I will add this to the proposal.
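
A minimal sketch of why uniqueness makes the URI redundant, assuming a
client cache indexed by entity tag. `ClientCache` and its methods are
hypothetical names for illustration and are not taken from the proposal.

```python
class ClientCache:
    """Toy cache in which an entity tag alone identifies a resource."""

    def __init__(self):
        self._by_etag = {}   # etag -> (uri, body)
        self._etag_of = {}   # uri -> etag currently cached for it

    def store(self, uri, etag, body):
        owner = self._by_etag.get(etag)
        if owner is not None and owner[0] != uri:
            # The proposed restriction: one entity tag, one URI.
            raise ValueError("entity tag %s already used by %s" % (etag, owner[0]))
        old = self._etag_of.get(uri)
        if old is not None and old != etag:
            del self._by_etag[old]   # drop the superseded representation
        self._by_etag[etag] = (uri, body)
        self._etag_of[uri] = etag

    def validators(self):
        """What the client would advertise: entity tags only, no URIs."""
        return sorted(self._by_etag)

    def resolve(self, etag):
        """Either side can map a validator back to its resource."""
        return self._by_etag.get(etag)
```

With the restriction in place, `resolve` is unambiguous, so the validator
list is all that has to go on the wire.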

> · If you are thinking about something similar, it would be
> good to think about how to minimize data transferred on the wire for valid
> cached resources. In the above model, the client has to send the URL +
> cache validators and the server has to send back the URL to explicitly
> inform the client that the cached resource is valid. The net effect is that
> one additional header with a URL (per valid cache resources) is
> transferred, when compared to the proposed “Smart Client Pull” model.
>
My proposal only requires sending the cache validators for subresources,
not their URIs. Additional bandwidth savings may be accrued by setting the
"push-if-invalidated" cache control directive, if all subresources are
still fresh.

> *Pushed content is likely to be that which is required to be present at
> page load in the first place. Pushed CSS stylesheets would prevent flashes
> of unstyled content (FOUC), and pushed scripts that trigger on page load
> would be able to execute before the user is able to interact with the page:*
>
> · Our concern here is that in order to completely prevent the
> race condition, the top level page data is serialized behind SYN stream for
> all pushed dependencies.
>
I'm afraid I don't understand this sentence. Can you rephrase?

> · As you mention the pushed dependencies will *likely* be
> only resources required to load the page. We take it a step further and
> recommend that the server only push critical resources that would block
> page load, since there are lower priority resources on webpages that do not
> affect the user experience. Unfortunately we cannot guarantee that will be
> the case in practice. The effect of misconfigured servers that push content
> not used on the top level page or push an extremely large number of resources
> is that we end up blocking all data for the top level page on these SYN
> streams and without the first bits of data, the user will be staring at a
> blank screen.
>
I agree, and I think this recommendation should go into the spec. We can
only hope that server admins will err on the side of pushing too few
resources, rather than too many.

> *Mozilla (and Chrome, I believe) already interpret rel=prefetch as
> indicating resources that are likely to be useful in *future* page loads,
> not necessarily the current one - I would recommend avoiding the naming
> conflict:*
>
> · Our original spec was using the rel=subresource type,
> however we ended up changing it because rel=subresource is not currently
> included in the IANA (
> http://www.iana.org/assignments/link-relations/link-relations.xml) list
> of link relation assignments. We figured that pursuing an extension to an
> already defined link relation would be faster than defining a new one. It
> would be great to hear what the veteran standards folks on this list think
> about this.
>
> · We interpreted the difference between rel=subresource and
> rel=prefetch as mostly that of priority and saw nothing in the spec
> preventing clients from pre-fetching resources for the current webpage. It
> would be great to understand if other browsers do not use rel=prefetch to
> bypass rendering/parsing delays.
>
> · We are not sure who has implemented rel=subresource, so it
> would be good to also hear thoughts of other browser vendors about removing
> the distinction between rel=subresource and rel=prefetch.
>
I think they should remain separate. Merging them means we're going to have
to pick one definition - if we pick subresource's, then servers will be
unable to notify the client of resources relevant to *future* requests,
since the client will immediately attempt to fetch them, wasting bandwidth.
Picking prefetch's current definition would present analogous problems.
You're right, the opinions of browser vendors would be helpful here.
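
For what it's worth, here is a rough sketch of how a client could treat the
two relations differently if they stay separate. The function name is
hypothetical, the Link parsing is deliberately simplified (one rel per link,
no other parameters), and "later" simply stands for whatever idle-time policy
a browser applies to prefetch today.

```python
import re

# Simplified: matches <uri>; rel=value or <uri>; rel="value" only.
_LINK = re.compile(r'<([^>]*)>\s*;\s*rel="?([A-Za-z-]+)"?')


def plan_fetches(link_header):
    """Split Link targets into fetch-now and fetch-later lists."""
    now, later = [], []
    for uri, rel in _LINK.findall(link_header):
        rel = rel.lower()
        if rel == "subresource":
            now.append(uri)      # needed for the current page
        elif rel == "prefetch":
            later.append(uri)    # likely useful for a future page load
    return now, later
```

Merging the two relations would collapse `now` and `later` into one list,
which is exactly the loss of information I'd like to avoid.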

Thanks for the detailed reply - this is a great discussion!

Alek

> Thanks
>
> -Jonathan Silvera
>
> *From:* Alek Storm [mailto:alek.storm@gmail.com]
> *Sent:* Wednesday, June 6, 2012 7:29 PM
> *To:* Gabriel Montenegro
> *Cc:* ietf-http-wg@w3.org; Matthew Cox; Ivan Pashov; Osama Mazahir; Rob
> Trace; Jonathan Silvera
> *Subject:* Re: Some thoughts on server push and client pull
>
> On Wed, Jun 6, 2012 at 8:30 PM, Gabriel Montenegro <
> Gabriel.Montenegro@microsoft.com> wrote:
>
>  <snip>
>
> 2. Issues with current Server Push in SPDY
>
> We don't envision Server push as part of the base HTTP 2.0 protocol, but
> see it as a potentially interesting extension, as long as there is some way
> for the client to exert some control over when and how it is used. One
> fundamental requirement is for clients to be able to control “Server Push”
> behavior via a new opt-in <name TBD> header.  Servers MUST NOT push
> unrequested data to the client, unless the top level page request <name
> TBD> header is set to allow “Server Push”.
>
> I have not verified this, but I believe the client can achieve the same
> effect in SPDYv3 by sending a GOAWAY frame, indicating that the other
> endpoint must not open any new streams.
>
>  Server Push does not require any validation prior to pushing data to the
> client, which could result in the server sending unnecessary data to
> clients that have some of the pushed resources stored in their cache.
>
> I proposed a detailed system of cache control for pushed resources on the
> spdy-dev mailing list, before I knew of the existence of this list. I plan
> to post it here soon.
>
>  Furthermore, "Server Push" introduces a race condition in which a client
> could start a new request for data that the server is in the process of
> pushing, effectively causing the same resource to be downloaded twice. SPDY
> addresses the race condition by not sending any data (headers are OK) for
> the top level page, until all of the SYN_STREAM for the dependencies it
> will push are sent:
>
> "To minimize race conditions with the client, the SYN_STREAM for the
> pushed resources MUST be sent prior to sending any content which could
> allow the client to discover the pushed resource and request it."
>
> We agree that SPDY's proposal is a good way to mitigate the race condition
> in Server Push without introducing significant complexity. Unfortunately
> mitigating the race condition in this manner prevents the server from
> sending data for the top level page. This could result in user-visible
> delays. Whether or not the user will see a delay will depend on what
> messages (how many and how large) the server is pushing to the client.
>
> Pushed content is likely to be that which is required to be present at
> page load in the first place. Pushed CSS stylesheets would prevent flashes
> of unstyled content (FOUC), and pushed scripts that trigger on page load
> would be able to execute before the user is able to interact with the page.
>
> 3. Smart Client Pull alternative to Server Push
>
> We would like to propose an alternative to Server Push for discussion.
> This alternative is closely aligned with existing standards and could even
> work for HTTP 1.1.
>
> When a server receives an HTTP request for a top level page, the server
> will generate a list of resources needed to fully load the top level page.
> The server will send the optimal pre-fetch list to the client, via LINK
> headers, with a "prefetch" link relation type (defined in HTML5 per
> http://www.iana.org/assignments/link-relations/link-relations.xml).
>
> The server SHOULD also include the corresponding cache validators for each
> resource in the pre-fetch list. An extension to the “prefetch” link
> relation type will be needed to allow cache validator data.
>
> When a client receives data for a top level page, it will begin processing
> the top level page response, while simultaneously pre-fetching resources in
> the pre-fetch list that are not in the client cache or that are cached but
> invalid, as indicated by the cache validators included in the pre-fetch
> list. Servers SHOULD only include resources that block loading of the top
> level page in the optimal pre-fetch list.
>
> Mozilla (and Chrome, I believe) already interpret rel=prefetch as
> indicating resources that are likely to be useful in *future* page loads,
> not necessarily the current one - I would recommend avoiding the naming
> conflict. SPDY appears to have an ancillary proposal for a rel=subresource
> with essentially the semantics you describe, at
> http://dev.chromium.org/spdy/link-headers-and-server-hint/link-rel-subresource,
> which I believe is already implemented in Chrome.
>
> My soon-to-come cache control proposal will obviate the need for
> extensions to the Link header, and would afford more fine-grained control.
>
> Alek
>
Received on Thursday, 7 June 2012 23:12:52 UTC