Re: Some thoughts on server push and client pull from Rajeev Bector on 2012-06-08 (ietf-http-wg@w3.org from April to June 2012)

From: Rajeev Bector <rbector@yahoo-inc.com>
Date: Fri, 8 Jun 2012 15:33:31 -0500
To: Mike Belshe <mike@belshe.com>
CC: Alek Storm <alek.storm@gmail.com>, Jonathan Silvera <jsilvera@microsoft.com>, Gabriel Montenegro <Gabriel.Montenegro@microsoft.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>, Matthew Cox <macox@microsoft.com>, Ivan Pashov <ivanpash@microsoft.com>, Osama Mazahir <OSAMAM@microsoft.com>, Rob Trace <Rob.Trace@microsoft.com>
Message-ID: <4FD2619B.2040802@yahoo-inc.com>

I am all for the overall concept that stuff we know needs to be 
sent should be sent early. I have yet to get a warm-and-fuzzy feeling 
about introducing an altogether new paradigm to accomplish that. With 
proxies in the middle, it definitely adds more things to think about. 
And as others on the thread have opined, getting all the other stuff 
*done right* (and deployed at scale while we figure out how to 
interoperate in a mixed world where SPDY and non-SPDY browsers will 
exist) will be an effort in itself.

On 6/7/12 8:46 PM, Mike Belshe wrote:
> I'm excited to hear from Google, Yahoo, Tornado, and Cotendo/Akamai 
> that there is research into server push.  I see no reason to cut it 
> when it is actively being worked on; my proposal to cut was based on 
> the belief nobody was trying.  I stand corrected!
>
> To those against the more speculative nature of server push features: 
>  We won't get another shot at protocol semantics for 15 years.  So, if 
> there is real work ongoing here, we should let it go further.
>
> To those that are doing research:  Please keep in mind that time is 
> short.  The rest of the protocol is ready to move forward now, and the 
> choices we make can be greatly changed based on whether or not push is 
> included as part of the core.
Precisely because time is short, there is an even more compelling reason 
to decouple the two! Since the benefits from the baseline SPDY proposals 
seem compelling enough, why delay deploying them while we wait for the 
research to be completed?

If the post-research feeling is that "server push" is the best way 
forward (instead of other flattening/pre-fetching alternatives), and the 
benefits seem compelling enough, having another rev in a couple of years 
should be possible, right? (It's again a latency vs. throughput 
tradeoff :) Mnot?


Thanks,
Rajeev

>
> To mnot:  Perhaps at the next meeting we can discuss some timeframes 
> around various concepts like server push so that those doing 
> implementations can have some guidance for dates.
>
> Mike
>
>
> On Thu, Jun 7, 2012 at 4:12 PM, Alek Storm <alek.storm@gmail.com 
> <mailto:alek.storm@gmail.com>> wrote:
>
>     Hi Jonathan,
>
>     On Thu, Jun 7, 2012 at 4:45 PM, Jonathan Silvera
>     <jsilvera@microsoft.com <mailto:jsilvera@microsoft.com>> wrote:
>
>         /I believe the client can achieve the same effect in SPDYv3 by
>         sending a GOAWAY frame, indicating that the other endpoint
>         must not open any new streams:/
>
>         ·There is still data wasted by the SYN stream sent to the
>         client, and there is nothing in the SPDY spec preventing
>         the server from sending data immediately after it sends the
>         SYN stream when doing Push. This would result in an unnecessary
>         SYN_STREAM in the best case and in unnecessary pushing of data
>         in the worst case.
>
>     I believe you're conflating GOAWAY with RST_STREAM+CANCEL. The
>     former prevents the server from sending any SYN_STREAMs in the
>     first place, while the latter resets individual streams after
>     their SYN_STREAM has arrived at the client.
>
>         ·From reading the SPDY spec, it seems GOAWAY affects the
>         connection. However, there are still valid resources that the
>         client would need from the server. Would controlling server
>         push this way require the server to add additional logic to
>         continue to serve additional requests on the connection, while
>         adding state on the server side to not push additional content
>         on subsequent requests when the client has sent a GOAWAY
>         message? I am not clear on  how GOAWAY due to a client
>         rejecting Push would differ from normal GOAWAY behavior on
>         both client and server.
>
>     GOAWAY is asymmetric - only the receiving endpoint is prohibited
>     from opening new streams. In addition, one GOAWAY will suffice to
>     prevent the server from pushing streams for all future requests on
>     the connection, not just ones currently open. This state can be
>     reset by opening a new connection to the server.
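A minimal sketch of the distinction drawn above between GOAWAY and RST_STREAM, assuming the semantics exactly as described in the quoted reply (the endpoint that receives a GOAWAY may no longer open streams; RST_STREAM cancels one stream only). The `Endpoint` class and its method names are hypothetical and exist only for illustration; this is not code from any SPDY implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Endpoint:
    """Toy model of one side of a connection (hypothetical, for illustration)."""
    name: str
    # Cleared only when THIS endpoint receives a GOAWAY from its peer.
    may_open_streams: bool = True
    open_streams: set = field(default_factory=set)

    def open_stream(self, stream_id: int) -> bool:
        """Try to open a stream; refused once a GOAWAY has been received."""
        if not self.may_open_streams:
            return False
        self.open_streams.add(stream_id)
        return True

    def receive_goaway(self) -> None:
        """Peer sent GOAWAY: no new streams from this endpoint, connection-wide."""
        self.may_open_streams = False

    def receive_rst_stream(self, stream_id: int) -> None:
        """Peer reset one stream: only that stream is affected."""
        self.open_streams.discard(stream_id)
```

Under these assumptions, calling `server.receive_goaway()` stops every later `server.open_stream(...)` (that is, every push), while the client object is untouched and can keep opening request streams. A `receive_rst_stream` call, by contrast, only removes a stream that the server had already opened.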
>
>         ·The SPDY spec, appropriately, does not make GOAWAY mandatory.
>         This means that clients that do not implement GOAWAY could
>         still have data pushed to them.
>
>     I'm a bit confused by this. Unless otherwise indicated, all parts
>     of the spdy/3 spec are mandatory. We have specifically provided a
>     way for a client to tell the server not to open any push streams.
>     If the client only partially (and therefore, incorrectly)
>     implements the specification, what are we to do about it?
>
>         ·Overall  it seems that having the client control PUSH
>         behavior is cleaner and is more efficient from a bandwidth
>         perspective.
>
>     Agreed.
>
>         /I proposed a detailed system of cache control for pushed
>         resources on the spdy-dev mailing list, before I knew of the
>         existence of this list. I plan to post it here soon:/
>
>
>     Since server push has been a popular topic in the past 24 hours, I
>     re-prioritized finishing the proposal and sent it to the list a
>     few hours ago. You can read it at
>     http://lists.w3.org/Archives/Public/ietf-http-wg/2012AprJun/0514.html,
>     although since some of the formatting has been mangled, I
>     recommend an email client for an optimal proposal-reading
>     experience. Feedback is most welcome, but should probably be
>     posted in that thread.
>
>         ·Great to hear that you are thinking about this. We believe
>         that is something that needs to be added to the Push spec. It
>         would probably require the client to keep track of resources
>         previously downloaded (no data on first navigation to a site)
>         when visiting a top level page and advertising the URL for the
>         cached resources, along with the corresponding cache
>         validators as part of the top level request. The server would
>         then have to explicitly confirm to the client, which cached
>         resources are valid in addition to initiating SYN streams for
>         the content that it is going to push.
>
>     There is no need to transmit the URI in addition to the cache
>     validators, as long as a new restriction is added to entity tags:
>     they must be unique among different URIs. I will add this to the
>     proposal.
>
>         ·If you are thinking about something similar, it would be good
>         to think about how to minimize data transferred on the wire
>         for valid cached resources. In the above model, the client has
>         to send the URL + cache validators and the server has to send
>         back the URL to explicitly inform the client that the cached
>         resource is valid. The net effect is that one additional
>         header with a URL (per valid cache resources) is transferred,
>         when compared to the proposed “Smart Client Pull” model.
>
>     My proposal only requires sending the cache validators for
>     subresources, not their URIs. Additional bandwidth savings may be
>     accrued by setting the "push-if-invalidated" cache control
>     directive, if all subresources are still fresh.
>
>         /Pushed content is likely to be that which is required to be
>         present at page load in the first place. Pushed CSS
>         stylesheets would prevent flashes of unstyled content (FOUC),
>         and pushed scripts that trigger on page load would be able to
>         execute before the user is able to interact with the page:/
>
>         ·Our concern here is that in order to completely prevent the
>         race condition, the top level page data is serialized behind
>         SYN stream for all pushed dependencies.
>
>     I'm afraid I don't understand this sentence. Can you rephrase?
>
>         ·As you mention the pushed dependencies will **likely** be
>         only resources required to load the page. We take it a step
>         further and recommend that the server only push critical
>         resources that would block page load, since there are lower
>         priority resources on webpages that do not affect the user
>         experience. Unfortunately, we cannot guarantee that this will
>         be the case in practice. The effect of misconfigured servers
>         that push content not used on the top level page, or push an
>         extremely large number of resources, is that we end up
>         blocking all data for the top level page on these SYN streams,
>         and without the first bits of data, the user will be staring
>         at a blank screen.
>
>     I agree, and I think this recommendation should go into the spec.
>     We can only hope that server admins will err on the side of
>     pushing too few resources, rather than too many.
>
>         /Mozilla (and Chrome, I believe) already interpret
>         rel=prefetch as indicating resources that are likely to be
>         useful in *future* page loads, not necessarily the current one
>         - I would recommend avoiding the naming conflict:/
>
>         ·Our original spec was using the rel=subresource type, however
>         we ended up changing it because rel=subresource is not
>         currently included in the IANA
>         (http://www.iana.org/assignments/link-relations/link-relations.xml)
>         list of link relation assignments. We figured that pursuing an
>         extension to an already defined link relation would be faster
>         than defining a new one. It would be great to hear what the
>         veteran standard folks on this list think about this.
>
>         ·We interpreted the difference between rel=subresource and
>         rel=prefetch as mostly that of priority and saw nothing in the
>         spec preventing clients from pre-fetching resources for the
>         current webpage. It would be great to understand if other
>         browsers do not use rel=prefetch to bypass rendering/parsing
>         delays.
>
>         ·We are not sure who has implemented rel=subresource, so it
>         would be good to also hear thoughts of other browser vendors
>         about removing the distinction between rel=subresource and
>         rel=prefetch.
>
>     I think they should remain separate. Merging them means we're
>     going to have to pick one definition - if we pick subresource's,
>     then servers will be unable to notify the client of resources
>     relevant to *future* requests, since the client will immediately
>     attempt to fetch them, wasting bandwidth. Picking prefetch's
>     current definition would present analogous problems. You're right,
>     the opinions of browser vendors would be helpful here.
>
>     Thanks for the detailed reply - this is a great discussion!
>
>     Alek
>
>         Thanks
>
>         -Jonathan Silvera
>
>         *From:*Alek Storm [mailto:alek.storm@gmail.com
>         <mailto:alek.storm@gmail.com>]
>         *Sent:* Wednesday, June 6, 2012 7:29 PM
>         *To:* Gabriel Montenegro
>         *Cc:* ietf-http-wg@w3.org <mailto:ietf-http-wg@w3.org>;
>         Matthew Cox; Ivan Pashov; Osama Mazahir; Rob Trace; Jonathan
>         Silvera
>
>
>         *Subject:* Re: Some thoughts on server push and client pull
>
>         On Wed, Jun 6, 2012 at 8:30 PM, Gabriel Montenegro
>         <Gabriel.Montenegro@microsoft.com
>         <mailto:Gabriel.Montenegro@microsoft.com>> wrote:
>
>             <snip>
>
>             2. Issues with current Server Push in SPDY
>
>             We don't envision Server push as part of the base HTTP 2.0
>             protocol, but see it as a potentially interesting
>             extension, as long as there is some way for the client to
>             exert some control over when and how it is used. One
>             fundamental requirement is for clients to be able to
>             control “Server Push” behavior via a new opt-in <name TBD>
>             header.  Servers MUST NOT push unrequested data to the
>             client, unless the top level page request <name TBD>
>             header is set to allow “Server Push”.
>
>         I have not verified this, but I believe the client can achieve
>         the same effect in SPDYv3 by sending a GOAWAY frame,
>         indicating that the other endpoint must not open any new streams.
>
>             Server Push does not require any validation prior to
>             pushing data to the client, which could result in the
>             server sending unnecessary data to clients that have some
>             of the pushed resources stored in their cache.
>
>         I proposed a detailed system of cache control for pushed
>         resources on the spdy-dev mailing list, before I knew of the
>         existence of this list. I plan to post it here soon.
>
>             Furthermore, "Server Push" introduces a race condition in
>             which a client could start a new request for data that the
>             server is in the process of pushing, effectively causing
>             the same resource to be downloaded twice. SPDY addresses
>             the race condition by not sending any data (headers are
>             OK) for the top level page, until all of the SYN_STREAM
>             for the dependencies it will push are sent:
>
>             "To minimize race conditions with the client, the
>             SYN_STREAM for the pushed resources MUST be sent prior to
>             sending any content which could allow the client to
>             discover the pushed resource and request it."
>
>             We agree that SPDY's proposal is a good way to mitigate
>             the race condition in Server Push without introducing
>             significant complexity. Unfortunately mitigating the race
>             condition in this manner prevents the server from sending
>             data for the top level page. This could result in
>             user-visible delays.  Whether or not the user will see a
>             delay will depend on what messages (how many and how
>             large) the server is pushing to the client.
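The ordering rule quoted above, and the delay it can cause, can be sketched as follows. This is an illustrative model only: frames are plain tuples, and the function names `order_frames` and `frames_before_page_data` are hypothetical. It assumes, as the quoted text says, that top-level headers may go first but no top-level data may be sent until every SYN_STREAM for a pushed resource has been sent.

```python
def order_frames(page_url, page_chunks, pushed):
    """Return frames in an order that satisfies the quoted rule.

    pushed maps each pushed resource URL to its list of body chunks.
    """
    frames = [("SYN_REPLY", page_url)]  # top-level headers are allowed first
    # Announce every pushed resource before any top-level body data.
    frames += [("SYN_STREAM", url) for url in pushed]
    frames += [("DATA", page_url, chunk) for chunk in page_chunks]
    for url, chunks in pushed.items():
        frames += [("DATA", url, chunk) for chunk in chunks]
    return frames


def frames_before_page_data(frames, page_url):
    """Count the frames sent before the first top-level DATA frame."""
    for index, frame in enumerate(frames):
        if frame[0] == "DATA" and frame[1] == page_url:
            return index
    return None
```

In this model the count returned by `frames_before_page_data` grows by one for each pushed resource, which is the serialization cost discussed above: the more resources a server announces, the later the first byte of the top-level page can leave the server.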
>
>         Pushed content is likely to be that which is required to be
>         present at page load in the first place. Pushed CSS
>         stylesheets would prevent flashes of unstyled content (FOUC),
>         and pushed scripts that trigger on page load would be able to
>         execute before the user is able to interact with the page.
>
>             3. Smart Client Pull alternative to Server Push
>
>             We would like to propose an alternative to Server Push for
>             discussion. This alternative is closely aligned with
>             existing standards and could even work for HTTP 1.1.
>
>             When a server receives an HTTP request for a top level
>             page, the server will generate a list of resources needed
>             to fully load the top level page. The server will send the
>             optimal pre-fetch list to the client, via LINK headers,
>             with a "prefetch" link relation type (defined in HTML5 per
>             http://www.iana.org/assignments/link-relations/link-relations.xml).
>
>
>             The server SHOULD also include the corresponding cache
>             validators for each resource in the pre-fetch list. An
>             extension to the “prefetch” link relation type will be
>             needed to allow cache validator data.
>
>             When a client receives data for a top level page, it will
>             begin processing the top level page response, while
>             simultaneously pre-fetching resources in the pre-fetch
>             list that are not in the client cache or that are cached
>             but invalid, as indicated by the cache validators included
>             in the pre-fetch list. Servers SHOULD only include
>             resources that block loading of the top level page in the
>             optimal pre-fetch list.
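The client-side decision in the Smart Client Pull proposal above can be sketched as below. This assumes each pre-fetch list entry has already been parsed into a (URL, validator) pair and that the validator is an opaque string such as an entity tag; how the validator would be carried in the Link header is left open in the proposal and is not modeled here. The function name `resources_to_prefetch` and the cache layout are hypothetical.

```python
def resources_to_prefetch(prefetch_list, cache):
    """Pick the resources the client still needs to fetch.

    prefetch_list: iterable of (url, validator) pairs from the server.
    cache: dict mapping a cached URL to the validator stored with it.
    A resource is fetched when it is absent from the cache, or when the
    cached validator differs from the one the server advertised.
    """
    return [
        url
        for url, validator in prefetch_list
        if cache.get(url) != validator
    ]
```

For example, with a cache holding a current copy of one stylesheet and a stale copy of one script, only the script and any uncached resources would be requested, so nothing already valid in the cache crosses the wire again.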
>
>         Mozilla (and Chrome, I believe) already interpret rel=prefetch
>         as indicating resources that are likely to be useful in
>         *future* page loads, not necessarily the current one - I would
>         recommend avoiding the naming conflict. SPDY appears to have
>         an ancillary proposal for a rel=subresource with essentially
>         the semantics you describe, at
>         http://dev.chromium.org/spdy/link-headers-and-server-hint/link-rel-subresource,
>         which I believe is already implemented in Chrome.
>
>         My soon-to-come cache control proposal will obviate the need
>         for extensions to the Link header, and would afford more
>         fine-grained control.
>
>         Alek
>
>
>
Received on Friday, 8 June 2012 20:34:13 UTC