RE: draft-ietf-httpbis-p6-cache-06 from Brian Smith on 2009-05-27 (ietf-http-wg@w3.org from April to June 2009)

From: Brian Smith <brian@briansmith.org>
Date: Tue, 26 May 2009 21:42:19 -0500
To: "'Adrien de Croy'" <adrien@qbik.com>, "'HTTP Working Group'" <ietf-http-wg@w3.org>
Message-ID: <000b01c9de74$c2534720$46f9d560$@org>

Adrien de Croy wrote:
> This leaves problems if there's no Vary header but content negotiation
> was used.  It's not possible to reliably heuristically determine if
> content-negotiation was used without the Vary header.  Vary is only a
> SHOULD level.  Maybe it should be a MUST level?

As you noted below, this rule applies:

   Caches MUST use the most recent response (as determined by the Date
   header) when more than one suitable response is stored.  They can
   also forward a request with "Cache-Control: max-age=0" or "Cache-
   Control: no-cache" to disambiguate which response to use.

Using this rule, along with other rules for caches, results in predictable
behavior, right?
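
To make the quoted rule concrete, here is a minimal Python sketch of picking
the most recent of several suitable stored responses by Date header. The
stored-response layout (a dict with a "headers" dict) and the function name
are assumptions made for illustration, not anything defined by the draft.

```python
from email.utils import parsedate_to_datetime

def pick_most_recent(stored_responses):
    """Return the stored response whose Date header is latest.

    stored_responses is a hypothetical list of dicts, each carrying a
    "headers" dict with a "Date" value in HTTP-date format.
    """
    return max(
        stored_responses,
        key=lambda r: parsedate_to_datetime(r["headers"]["Date"]),
    )

candidates = [
    {"body": "older", "headers": {"Date": "Tue, 26 May 2009 10:00:00 GMT"}},
    {"body": "newer", "headers": {"Date": "Tue, 26 May 2009 21:42:19 GMT"}},
]
chosen = pick_most_recent(candidates)
```

A cache that does not want to rely on this tie-break can instead forward the
request with "Cache-Control: max-age=0" or "Cache-Control: no-cache", as the
quoted text allows.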

> There needs to be another set of checks, at the minimum that
> Accept-Encoding matches the stored Content-Encoding.  E.g. you
> can't serve gzip content if there's no Accept-Encoding: gzip.
> Arguably others as well (like Accept-Language
> matching on Content-Language, with q values, etc.).

Actually, a server *is* allowed to do that. And, caches don't have to worry
about that because they don't have to interpret Accept-* and Content-*
headers. (See my previous message in the other thread.)

> It says if there are several stored representations serve the one with
> the most recent Date header (MUST level), but this may not be the
> appropriate one if Vary headers aren't available, and you are say
> selecting based on language.

The server needs to provide a Vary header in that case.

> S 2.5 Request methods that invalidate
> -------------------------------------
> I don't understand how a URI can be compared to a Content-Location or
> Location header and match yet the host part be different.  Surely to
> match the host part must be the same?  It's not clear to me what's
> being matched with what.

Example 1:
The Request-URI is  http://example.ORG/foo.
Content-Location in the response is http://example.COM/bar.
In this case, we shouldn't invalidate cached representations of
http://example.COM/bar because the request-URI's host doesn't match the host
of the Content-Location header.

Example 2:
The Request-URI is http://example.ORG/foo.
Content-Location is http://example.ORG/bar.
In this case, we should invalidate both http://example.ORG/foo and
http://example.ORG/bar.

Basically, this is trying to implement a "same origin" policy for cache
invalidation.
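
The two examples can be sketched in Python as follows. This is only an
illustration of the host-match idea: the function name is hypothetical, and
comparing only the host (not scheme or port) is an assumption based on the
wording of the examples.

```python
from urllib.parse import urljoin, urlsplit

def uris_to_invalidate(request_uri, content_location=None):
    """List the URIs whose cached representations should be invalidated."""
    targets = [request_uri]
    if content_location:
        # Content-Location may be relative, so resolve it against the request-URI.
        resolved = urljoin(request_uri, content_location)
        # urlsplit().hostname is lowercased, so the host comparison
        # is case-insensitive.
        if urlsplit(resolved).hostname == urlsplit(request_uri).hostname:
            targets.append(resolved)
    return targets

# Example 1: hosts differ, so only the request-URI is invalidated.
example1 = uris_to_invalidate("http://example.ORG/foo", "http://example.COM/bar")

# Example 2: hosts match, so both URIs are invalidated.
example2 = uris_to_invalidate("http://example.ORG/foo", "http://example.ORG/bar")
```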

> Wrt POST (or any method).  If the response to a POST is marked
> explicitly by the origin server as cachable, why should a subsequent
> POST invalidate that contrary to other Cache-control directives?
> Surely this should only apply if the original method was not POST?

See the discussion about whether the method is part of the cache key. Caches
really need to be very conservative here (that is, MUST invalidate) as there
seems to be a lot of disagreement amongst implementers and standardistas
regarding this issue.

> S 2.6 Caching Negotiated responses
> ----------------------------------
> Should I then be referring to Section 4.1 of [part3] to resolve the
> issues around content negotiation?  If so, maybe a mention in S 2.2
> would be useful.

No, see above.

> I also don't understand in para 5 the sentence "If the server responds
> with 304 (Not Modified) and includes an entity tag or Content-Location
> that indicates the entity to be used"  How can Content-Location be used
> to select an entity?  Do you match on previously returned
> Content-Location headers for requests for the same URI?  Is that what
> the final para is getting at?  Maybe the wording could be a bit clearer.

Caches use Content-Location only for invalidation, never for any other
purpose. Basically, when choosing which cache entries to invalidate, you
must invalidate all the ones with the same Content-Location, subject to the
"same-origin" restriction explained above.

> S 3.2 Cache-Control
> -------------------
> first sentence states that directives MUST be obeyed.   This doesn't
> fit with a strategy of ignoring unhandled directives if you get a
> mixture of request and response directives in a message (which is
> still allowed in the ABNF). I think it should therefore be
> explicit that it's not valid to mix the directives, else you get
> a MUST requirement to obey nonsensical directives.

A directive that looks like a cache-response-directive in a request is
actually a cache-extension, not a cache-response-directive. Similarly, a
directive that looks like a cache-request-directive in a response is
actually a cache-extension, not a cache-request-directive. The grammar is
just wrong.

> Otherwise relax the MUST, or relax it to the extent of nonsensical
> directives.  Also, you can't have a MUST requirement on an extensible
> mechanism.  Extensions need to be optional.

"Unrecognized cache directives MUST be ignored." But, caches must recognize
all directives defined in the HTTP spec.
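
One way to read this, sketched in Python: a cache keeps only the directives
defined for the direction of the message (request or response) and treats
everything else as an unrecognized extension to be ignored. The directive
sets below are the ones defined in RFC 2616; the function name is
hypothetical, and the naive comma split is a simplification that does not
handle commas inside quoted strings.

```python
REQUEST_DIRECTIVES = {
    "no-cache", "no-store", "max-age", "max-stale",
    "min-fresh", "no-transform", "only-if-cached",
}
RESPONSE_DIRECTIVES = {
    "public", "private", "no-cache", "no-store", "no-transform",
    "must-revalidate", "proxy-revalidate", "max-age", "s-maxage",
}

def recognized_directives(header_value, is_request):
    """Map each recognized directive name to its argument (or None)."""
    known = REQUEST_DIRECTIVES if is_request else RESPONSE_DIRECTIVES
    result = {}
    for item in header_value.split(","):
        name, _, value = item.strip().partition("=")
        name = name.strip().lower()
        if name in known:
            result[name] = value.strip().strip('"') or None
        # Anything else is treated as a cache-extension this sketch
        # does not recognize, so it is ignored.
    return result

# s-maxage is a response directive, so in a request it is just an extension.
in_request = recognized_directives("max-age=0, s-maxage=60, foo=bar", True)

# only-if-cached is a request directive, so in a response it is ignored.
in_response = recognized_directives("only-if-cached, private, max-age=60", False)
```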

> Some cache control directives are confusingly similar, especially for
> response directives.

I agree.

> 1. private directive with headers.
> 2. no-cache.

I will reply in a separate message.

> S 3.4 Pragma.
> -------------
> The BNF for this mentions extension-pragma.  Were there ever any of
> these?  Does it make sense to continue to support an extension
> mechanism on a deprecated header that no-one extended?

There are undoubtedly extension-pragmas being used which are not defined in
any standard.

> I've also seen some responses lately that have multiple Cache-Control
> headers - is this valid?

Yes. See the rules for repeated header field values in Part 1.
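
As a rough illustration of those rules: repeated header fields with the same
name are equivalent to a single field whose value is the individual values
joined with commas, in order. The function name and the list-of-pairs header
representation below are assumptions made for this sketch.

```python
def combine_field_values(header_lines, name):
    """Join all values of the named header with ", ", preserving order.

    header_lines is a hypothetical list of (field-name, field-value) pairs;
    field names are compared case-insensitively.
    """
    values = [v.strip() for n, v in header_lines if n.lower() == name.lower()]
    return ", ".join(values)

headers = [
    ("Cache-Control", "no-cache"),
    ("Content-Type", "text/html"),
    ("cache-control", "max-age=0"),
]
combined = combine_field_values(headers, "Cache-Control")
```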

- Brian
Received on Wednesday, 27 May 2009 02:42:54 UTC