Re: Submitted new I-D: Cache Digests for HTTP/2 from Richard Bradbury on 2016-02-02 (ietf-http-wg@w3.org from January to March 2016)

From: Richard Bradbury <richard.bradbury@rd.bbc.co.uk>
Date: Tue, 2 Feb 2016 18:09:32 +0000
To: ietf-http-wg@w3.org
Message-ID: <56B0F0DC.3060807@rd.bbc.co.uk>
On 27/01/2016 04:38, Kazuho Oku wrote:
> 2016-01-23 2:47 GMT+09:00 Richard Bradbury:
>> ... This would allow a server to push a more
>> up-to-date version of a representation in the case where that representation
>> has been updated before the originally stated expiry. This allows a server
>> to supply the freshest possible version, overriding the client's (in this
>> case mistaken) belief that its cached copy is still fresh.
>>
>> You suggest below that a client would ignore such a push because it still
>> believes its copy to be fresh, thereby defeating the server's attempt to
>> push a fresher version.
> Actually I had thought the same.
>
> However, my current understanding is that Firefox behaves like that
> (i.e. it ignores pushed resources if a fresh entry already exists in
> the cache), and from what I heard such behavior conforms to the HTTP/2
> specification.

I just re-read [RFC 7540 Section 8.2] and couldn't find any text 
explaining how a client should deal with pushed responses that are 
fresher than a cached item that is still believed to be fresh. The main 
requirement is that the promised request is cacheable, as defined by 
[RFC 7231 Section 4.2.3], and the specification then goes on to say that 
pushed responses can be cached by the client if it implements a cache. 
And that's about it as far as I can work out.
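For concreteness, the constraints RFC 7540 Section 8.2 does state can be written as a small client-side check. A promised request must be cacheable (RFC 7231 Section 4.2.3), must be safe (RFC 7231 Section 4.2.1) and must not include a request body. A client receiving one that fails these checks MUST reset the promised stream with a PROTOCOL_ERROR. The Python below is a minimal sketch with a hypothetical function name. It deliberately says nothing about how a valid push interacts with an existing cache entry, which is exactly the gap described above.

```python
# Methods RFC 7231 defines as safe (Section 4.2.1) and as cacheable (Section 4.2.3).
SAFE_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE"}
CACHEABLE_METHODS = {"GET", "HEAD", "POST"}


def is_acceptable_promise(method: str, has_body: bool) -> bool:
    """Hypothetical helper: does a PUSH_PROMISE request meet RFC 7540 Section 8.2?

    The promised request must be cacheable, safe, and carry no request body;
    in practice only GET and HEAD satisfy the first two conditions.
    """
    m = method.upper()
    return m in SAFE_METHODS and m in CACHEABLE_METHODS and not has_body
```

If the check fails, the client resets the promised stream. If it passes, the specification only says the response *can* be cached, which leaves the fresh-entry collision to each implementation.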

The implementation in Firefox you describe sounds like a reasonable, 
simple strategy for dealing with HTTP/2 server push that suits a limited 
set of common web browsing Use Cases. Yes, the implementation conforms 
to the HTTP/2 specification, but only because the specification is 
silent on what to do in the more advanced scenario I have introduced.
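The two strategies can be contrasted in a short Python sketch. `Entry` and `accept_push` are hypothetical names. Freshness is reduced to a precomputed expiry time, and "newer" is judged by Last-Modified. It models the behaviour described in this thread, not Firefox's actual code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    """Hypothetical cache entry holding only what this sketch needs."""
    last_modified: float  # from Last-Modified, seconds since epoch
    expires_at: float     # time at which the entry stops being fresh


def accept_push(cached: Optional[Entry], pushed: Entry, now: float,
                prefer_newer: bool = False) -> bool:
    """Decide whether a pushed response should be stored over the cached one."""
    if cached is None or now >= cached.expires_at:
        # Nothing fresh in the cache: storing the push is uncontroversial.
        return True
    if prefer_newer:
        # Alternative policy: let the server override a (mistakenly)
        # still-fresh entry when its representation is newer.
        return pushed.last_modified > cached.last_modified
    # Simple policy described in the thread: keep the fresh cached entry.
    return False
```

With `prefer_newer=False` a fresher push is silently dropped while the cached copy is still believed fresh. That is the outcome the server-push scenario above tries to avoid.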

Pragmatically speaking, if a user agent is not willing to accept a 
pushed representation that is fresher than something it (incorrectly) 
believes to be still fresh, then your proposed "if-modified-since" 
conditional request is the next best way of getting fresher items into 
the client's cache. Because it relies on the client taking the 
initiative (to find out what is stale and what is still fresh) it does 
feel suboptimal, which rankles a bit. But, having bounced the idea 
around with you, it could be that a client-initiated mechanism is the 
best that can be achieved short of first clarifying the rules for 
caching pushed responses in the HTTP/2 specification, and then 
persuading browser developers to implement those rules.
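A client-initiated scheme of this kind begins with the client sorting its own cache. The sketch below splits cached URLs into input for a "fresh" digest and input for an "if-modified-since" digest, whose entries carry the stored validator. The `Cached` record and `partition` function are hypothetical, and freshness is reduced to an expiry timestamp.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Cached:
    """Hypothetical cache record with just what this sketch needs."""
    expires_at: float  # end of the freshness lifetime (seconds since epoch)
    validator: str     # e.g. the stored ETag


def partition(cache: Dict[str, Cached], now: float
              ) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Split cached URLs into 'fresh' digest input and
    'if-modified-since' digest input (URL plus stored validator)."""
    fresh: List[str] = []
    stale: List[Tuple[str, str]] = []
    for url, rec in sorted(cache.items()):
        if now < rec.expires_at:
            fresh.append(url)
        else:
            # Stale entries keep their validator so the server can answer
            # with either a pushed 304 or a full response.
            stale.append((url, rec.validator))
    return fresh, stale
```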


>> The next question concerns syntax. Elsewhere in the thread I think it has
>> been suggested that there could be two types of digest transmitted from
>> client to server: one ("fresh") generated from URLs the client believes to
>> be fresh, the other ("if-modified-since") based on URLs that it believes to
>> be stale. Reading between the lines, am I right in thinking that the latter
>> is intended to be a sort of conditional client request, with a "304 Not
>> modified" response being pushed in response for those representations that
>> turn out not to be stale?
> Yes.
>
>> This all seems quite complicated, and I find the combination of parameter
>> name and semantics a bit confusing. For simplicity's sake I might be
>> inclined to just include the versioning information as standard in all cache
>> digests, in spite of the resulting overhead. Then the server can decide what
>> needs to be pushed in response after comparing the versioning metadata in
>> the received digest with the (potentially more up-to-date) information
>> available to the server. And, as an added bonus, you then don't need to
>> worry about defining what a pushed 304 response means :-)
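One way to picture this alternative: if every digest entry is keyed on the URL together with its validator, the server only has to recompute the key for its current version and look it up. In this Python sketch, the SHA-256-based key, the newline separator and the plain `set` standing in for a compressed digest are all assumptions made for illustration.

```python
import hashlib
from typing import Iterable, Set, Tuple


def digest_key(url: str, validator: str) -> int:
    """Hypothetical key: first 8 bytes of SHA-256 over URL and validator.

    The separator keeps ("ab", "c") and ("a", "bc") from colliding."""
    data = (url + "\n" + validator).encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def build_digest(entries: Iterable[Tuple[str, str]]) -> Set[int]:
    # A real digest would be a compact probabilistic structure (e.g. a
    # Golomb-coded set); a plain set keeps the sketch short and exact.
    return {digest_key(url, validator) for url, validator in entries}


def should_push(digest: Set[int], url: str, current_validator: str) -> bool:
    """Push unless the client already holds the server's current version,
    whether the client believes that copy to be fresh or stale."""
    return digest_key(url, current_validator) not in digest
```

Because the key changes with the validator, a client still holding an outdated copy it believes to be fresh would receive the newer version. This comes at the cost of larger digests.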
> For stale entries, I believe that we should always push a response
> (which would either be a full response or a 304).  Otherwise, the
> client will issue a conditional request, and until the response for
> the conditional request becomes available, it may not be able to
> render the webpage (if the requested resource was blocking the
> critical rendering path).
>
> For fresh entries, we do not need to push a response.

Yes. That makes sense to me.
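The server-side decision implied by the two-digest approach can be sketched as follows, applied to each resource the server would otherwise push. Keys are hypothetical 64-bit SHA-256 prefixes over the URL (fresh digest) or over the URL plus validator (stale digest). Plain sets again stand in for the real encoded digests.

```python
import hashlib
from enum import Enum
from typing import Set


class Action(Enum):
    SKIP = "no push"   # client holds a copy it considers fresh
    PUSH_304 = "304"   # client's stale copy is still current
    PUSH_200 = "200"   # client lacks the current representation


def key(*parts: str) -> int:
    """Hypothetical 64-bit key: SHA-256 prefix over newline-joined parts."""
    data = "\n".join(parts).encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def decide(fresh: Set[int], stale: Set[int], url: str, validator: str) -> Action:
    """fresh holds key(url) for entries the client considers fresh;
    stale holds key(url, validator) for entries it considers stale."""
    if key(url) in fresh:
        return Action.SKIP
    if key(url, validator) in stale:
        return Action.PUSH_304
    return Action.PUSH_200
```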



> Considering the fact that downstream bandwidth in the first few
> packets is precious due to slow-start, I think sending separate
> digests for fresh and stale resources is a better approach, since with
> that knowledge the server can avoid sending 304s for fresh resources.

Conserving downstream bandwidth is a key goal of this I-D, so I think 
that's a valid point in relation to your proposed "if-modified-since" 
conditional request, even if the actual saving in this particular case 
is quite small.

Thanks!


>> On Tue, 12 Jan 2016 10:04:00 +0900 Kazuho Oku wrote:
>>
>> 2016-01-11 2:11 GMT+09:00 Alcides Viamontes E:
>>> ...
>>> Here are the issues that I see:
>>>
>>> 1.- In its current wording, no information about which version of a
>>> representation the browser already has is present in the cache digest.
>>> That information can be included in the URL itself (cache busting),
>>> but then it becomes a concern for web-developers, adds complexity to
>>> their work, and bypasses the mechanisms that HTTP has in place for
>>> maintaining cache state.  It also increases space pressure in the
>>> browser's cache as the server is left with no means to expire old
>>> cached contents in the browser.
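Cache busting usually means embedding a content hash in the file name, so that each new version gets a new URL. Below is a minimal Python sketch of one such naming scheme. The scheme and the function name are hypothetical and are not taken from any particular build tool.

```python
import hashlib
import posixpath


def fingerprinted_path(path: str, body: bytes, length: int = 8) -> str:
    """Embed a truncated SHA-256 of the content in the file name,
    e.g. /js/app.js -> /js/app.<hash>.js (hypothetical scheme)."""
    root, ext = posixpath.splitext(path)
    digest = hashlib.sha256(body).hexdigest()[:length]
    return f"{root}.{digest}{ext}"
```

A URL-only digest then implicitly carries version information. However, every reference to the asset in the page markup has to be rewritten whenever the content changes.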
>> That is a very good point.
>>
>> Let me first discuss the restrictions of the cache model used by HTTP,
>> and then go on to discuss what we should do if we are to fix the point
>> you raised.
>>
>> First, about the restriction.  The resources in the cache can be divided
>> into two groups: fresh and non-fresh.  A server should never push a
>> resource that is considered fresh in the client's cache, because the
>> client will not notice the push (HTTP/2 allows a client to discard
>> such a push).  Therefore, a CACHE_DIGEST frame must include a filter
>> that identifies the resources the client considers fresh.  That is
>> what the current draft specifies.
>>
>>
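The fresh/non-fresh split is the one defined in RFC 7234 Section 4.2: a stored response is fresh while its freshness lifetime exceeds its current age. The Python sketch below follows the age calculation of Section 4.2.3 and takes the lifetime from max-age, or from Expires minus Date, as in Section 4.2.1. As a simplifying assumption it ignores s-maxage and heuristic freshness. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoredResponse:
    request_time: float                # when the request was sent
    response_time: float               # when the response was received
    date_value: float                  # Date header
    age_value: float = 0.0             # Age header, if any
    max_age: Optional[float] = None    # Cache-Control: max-age
    expires: Optional[float] = None    # Expires header


def freshness_lifetime(r: StoredResponse) -> float:
    # Private-cache view; heuristic freshness is treated as zero here.
    if r.max_age is not None:
        return r.max_age
    if r.expires is not None:
        return r.expires - r.date_value
    return 0.0


def current_age(r: StoredResponse, now: float) -> float:
    # RFC 7234, Section 4.2.3.
    apparent_age = max(0.0, r.response_time - r.date_value)
    response_delay = r.response_time - r.request_time
    corrected_age_value = r.age_value + response_delay
    corrected_initial_age = max(apparent_age, corrected_age_value)
    resident_time = now - r.response_time
    return corrected_initial_age + resident_time


def is_fresh(r: StoredResponse, now: float) -> bool:
    return freshness_lifetime(r) > current_age(r, now)
```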
>> I think this is a reference to the sentence at the start of Section 2.1
>> stating: "The set of URLs that is used to compute Digest-Value MUST only
>> include URLs that share origins [RFC6454] with the stream that CACHE_DIGEST
>> is sent on, and they MUST be fresh [RFC7234]." In other words, according to
>> draft 00, the client-generated digest only includes the URLs of cached
>> representations it considers to be fresh.
>>
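The two conditions in that sentence (same origin, fresh) amount to a filter over the cache. In this sketch, an origin is approximated as the (scheme, host, port) triple of RFC 6454 with default ports filled in. The stream's origin is passed as a URL, and freshness is assumed to have been computed already. `origin` and `digest_input` are hypothetical names.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> Tuple[str, str, int]:
    """(scheme, host, port) triple in the style of RFC 6454."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port or DEFAULT_PORTS.get(scheme, 0)
    return scheme, host, port


def digest_input(stream_url: str, fresh_by_url: Dict[str, bool]) -> List[str]:
    """URLs allowed into a CACHE_DIGEST sent on a stream for stream_url:
    same origin as that stream, and currently fresh."""
    target = origin(stream_url)
    return sorted(u for u, fresh in fresh_by_url.items()
                  if fresh and origin(u) == target)
```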
>> Next about the point of including version information (e.g.
>> Last-Modified, ETag) in the cache digest.  I believe we can add a
>> second Golomb-coded set to the frame that uses hash(URI + version
>> information) as the key.  A server can refer to the information to
>> determine whether it should push a 304 response or a 200 response.
>>
>> The downside is that the CACHE_DIGEST frame may become larger (if the
>> server sends many responses that would become non-fresh), so it might
>> be sensible to allow the client to decide if it should send the second
>> Golomb-coded set.
>>
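For the client to decide whether the second set is worth sending, it needs an estimate of its size. A Rice-coded GCS costs on average roughly log2(P) + 1.5 bits per entry. The sketch below therefore uses log2(P) + 2 bits as a rough, conservative rule of thumb. The byte budget and both function names are hypothetical.

```python
import math


def estimated_gcs_bytes(n: int, p_bits: int) -> int:
    """Rough size of a Rice-coded GCS with n entries and P = 2**p_bits:
    about p_bits + 2 bits per entry (an approximation, not an exact size)."""
    return math.ceil(n * (p_bits + 2) / 8)


def include_validator_set(n_fresh: int, n_stale: int, p_bits: int,
                          budget_bytes: int) -> bool:
    """Hypothetical client policy: send the second (stale + validator) set
    only if both sets together fit within a byte budget."""
    total = estimated_gcs_bytes(n_fresh, p_bits) + estimated_gcs_bytes(n_stale, p_bits)
    return total <= budget_bytes
```

With P = 128 (`p_bits=7`), 100 entries come to an estimated 113 bytes. A client could therefore send both sets for a small cache and fall back to the fresh set alone as the number of stale entries grows.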
>> In addition, we should agree on how to push a 304 response.  My
>> understanding is that the HTTP/2 spec is vague on this, and that there
>> has not yet been an agreement between the client developers on how it
>> should be done.
>>
>> Once that is solved, I think we should update the I-D to cover the
>> version information as well.

-- 
Richard.
Received on Tuesday, 2 February 2016 18:09:58 UTC