Location, location, location

I just wanted to jot down my impressions of the short discussion we
had about the Location header, along with a few subsequent thoughts,
before I forget them.  (There are other subjects I also promised to
write about.  One thing at a time.)

The Location header in 2xx responses identifies a URI at which the
client could request another copy of the resource being included as
the body of the response.  We discussed two issues in connection with
this: "invalidation" (i.e. "making stale") of other objects in a cache
due to the receipt of a response containing a Location header, and
spoofing.

I'm going to use the word "invalidation" to mean "making stale", since
it sounds better.

If a cache were to use the Location URI to cause the response entity
to be cached in such a way that a subsequent GET could retrieve the
entity from the cache using that URI, then a response containing a
Location URI not under the same authority as the request-URI could
accidentally or intentionally confuse a cache, and cause subsequent
requests to receive erroneous data in response.  If such replacement
is allowed, there seems to be no obvious way to keep this spoofing
from happening.  (Any ideas?)
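To make the spoofing problem concrete, here is a minimal sketch of a cache that does allow replacement by Location URI.  The class name `NaiveCache`, its methods, and the example hosts are all hypothetical, chosen only for illustration.  It shows how a response from one authority can overwrite the cached entry for another authority's URI:

```python
from urllib.parse import urljoin


class NaiveCache:
    """Hypothetical cache that (unsafely) also files a 2xx response under its Location URI."""

    def __init__(self):
        self.entries = {}  # URI -> cached body

    def store(self, request_uri, headers, body):
        self.entries[request_uri] = body
        location = headers.get("Location")
        if location is not None:
            # Replacement keyed by Location: nothing ties this URI to the
            # authority that actually sent the response, so this line is
            # exactly where spoofing becomes possible.
            self.entries[urljoin(request_uri, location)] = body

    def get(self, uri):
        return self.entries.get(uri)


cache = NaiveCache()
cache.store("http://bank.example/balance", {}, "genuine page")
# A response from a different authority names the bank's URI in Location.
cache.store("http://evil.example/x", {"Location": "http://bank.example/balance"}, "forged page")
print(cache.get("http://bank.example/balance"))  # forged page
```

A later GET for the bank's URI served from this cache would receive the forged body.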

Paul Leach suggested that it might be "necessary but not sufficient"
to at least require the Location URI to share the same hostname as
the request URI.  It is not sufficient because there may be multiple
authorities operating on the same host.  It is also not necessary
because a given authority may span multiple hosts, and a given host
may have multiple names.  A legitimate Location URI might not match
the request URI's hostname, whereas an illegitimate one might sometimes
match it.  Paul also seemed to have some other objections to worrying
about this spoofing problem, which I didn't understand.
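The hostname test and both of its failure modes can be sketched in a few lines.  The function name `same_host` and the example URIs are hypothetical.  The first call shows a case where the test passes even though two different authorities are involved, and the second shows a case where it fails even though one authority is involved:

```python
from urllib.parse import urlsplit


def same_host(request_uri, location_uri):
    """Hypothetical check: do the request URI and the Location URI name the same host?"""
    return urlsplit(request_uri).hostname == urlsplit(location_uri).hostname


# Not sufficient: two authorities (e.g. two users' pages) share one host and pass.
print(same_host("http://www.example.edu/~alice/page", "http://www.example.edu/~bob/page"))  # True
# Not necessary: one authority reachable under two host names fails.
print(same_host("http://example.com/doc", "http://www.example.com/doc"))  # False
```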

Jeff Mogul suggested that invalidating objects in the cache whose
request-URI matches the Location URI is not sufficient to solve the
cache coherency problem.  This is of course true -- if there are
multiple caches containing copies of some object, invalidating objects
in one cache has no effect on other caches.  Therefore, Jeff suggested
that if it is critical for an up-to-date object to be served for every
request, its expiration should be set so that it is validated on each
request.  I agree with that.
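Jeff's suggestion amounts to a simple freshness rule: a cache may serve an entry without contacting the origin server only while the entry has not yet expired, so an origin server that sets the expiration time equal to the response date forces validation on every request.  A minimal sketch follows; the `Entry` type, the helper names, and the timestamps are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Entry:
    date: float     # when the response was generated (seconds since the epoch)
    expires: float  # expiration time taken from the Expires header


def is_fresh(entry, now):
    # A cached entry may be served without validation only before it expires.
    return now < entry.expires


def must_validate(entry, now):
    return not is_fresh(entry, now)


# Critical resource: Expires equals Date, so every request must be validated.
critical = Entry(date=1000.0, expires=1000.0)
# Non-critical resource: fresh for an hour, so a stale copy may be served meanwhile.
relaxed = Entry(date=1000.0, expires=1000.0 + 3600)

print(must_validate(critical, now=1000.0))  # True
print(must_validate(relaxed, now=1500.0))   # False
```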

However, I believe there is a class of resources for which it is
desirable but not critical for an up-to-date version to be seen, and
for these objects it seems reasonable that a single cache should try
to do its best (forgive the anthropomorphizing) to provide the
up-to-date version.  Perhaps this can be viewed as an optional
optimization not required by the protocol, since it is always possible
that a series of requests may be served by different caches anyway,
and so the protocol doesn't guarantee that a cacheable ("fresh") object
will ever be invalidated.

A context in which this invalidation could be very useful is content
negotiation.  If a Location header in a negotiated response ever
matches an explicit request URI for some object already in a cache,
but the negotiated response contains a different version that has come
into existence before the expiration date of the previously cached
version, a subsequent request on that URI in this cache will retrieve
the previous version.  This seems rather annoying, and can't be fixed
by replacement using the Location URI because of the spoofing problem
mentioned above.  The best that can be done is to invalidate other
objects in the cache that were keyed by a request-URI matching this
Location URI, so that future requests will require validation.  Once
this mechanism exists for the content negotiation case, it may also
help with other uses of the Location header -- for instance, a response
to a POST may be a new version of a GETtable resource that may already
have been cached.  Again, as Jeff points out, if freshness is critical,
that GETtable resource should be set to require validation on each
request, but it might be merely desirable rather than critical.
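The invalidate-instead-of-replace rule can be sketched as follows.  The `Cache` class, its methods, and the example URIs are hypothetical.  The key point is that the new body is never stored under the Location URI; an existing entry keyed by that URI is only marked stale, so the next request for it goes back to the origin server:

```python
class Cache:
    """Hypothetical cache keyed by request-URI whose entries can be marked stale."""

    def __init__(self):
        self.entries = {}  # request-URI -> {"body": ..., "stale": bool}

    def store(self, request_uri, headers, body):
        location = headers.get("Location")
        if location is not None and location != request_uri:
            # Never file the new body under the Location URI (spoofing risk);
            # just mark any entry already keyed by that URI stale.
            existing = self.entries.get(location)
            if existing is not None:
                existing["stale"] = True
        self.entries[request_uri] = {"body": body, "stale": False}

    def needs_validation(self, uri):
        # Unknown or stale entries must be checked with the origin server.
        entry = self.entries.get(uri)
        return entry is None or entry["stale"]


cache = Cache()
cache.store("http://host.example/doc.html", {}, "version 1")  # explicit GET
cache.store("http://host.example/doc",                        # negotiated GET
            {"Location": "http://host.example/doc.html"}, "version 2")
print(cache.needs_validation("http://host.example/doc.html"))  # True
print(cache.needs_validation("http://host.example/doc"))       # False
```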

Another minor point:  in the content negotiation example above, if the
explicit GET occurred after the negotiated GET, then invalidation
would still be desired, but this time the request-URI of the GET
(assuming no Location header was returned) would be used to invalidate
not only other occurrences of the request-URI in the cache, but also
entries for which that URI was returned as a Location URI.  So, for this
purpose the Location URI would serve as a secondary key to the cache.
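Using the Location URI as a secondary key can be sketched with a second index from Location URI to the request-URIs whose responses carried it.  As before, the `Cache` class, its methods, and the URIs are hypothetical, and for simplicity this sketch invalidates on every store rather than only when the version actually differs:

```python
from collections import defaultdict


class Cache:
    """Hypothetical cache with the Location URI as a secondary key."""

    def __init__(self):
        self.entries = {}                    # request-URI -> {"body", "location", "stale"}
        self.by_location = defaultdict(set)  # Location URI -> request-URIs that returned it

    def _invalidate(self, uri):
        # Primary key: an entry stored under this request-URI.
        if uri in self.entries:
            self.entries[uri]["stale"] = True
        # Secondary key: entries whose response named this URI in Location.
        for request_uri in self.by_location.get(uri, ()):
            self.entries[request_uri]["stale"] = True

    def store(self, request_uri, headers, body):
        location = headers.get("Location")
        # The URI naming this version: the Location URI if present, else the request-URI.
        self._invalidate(location if location is not None else request_uri)
        # If this request-URI is being re-stored, drop its old secondary-key link.
        old = self.entries.get(request_uri)
        if old is not None and old["location"] is not None:
            self.by_location[old["location"]].discard(request_uri)
        self.entries[request_uri] = {"body": body, "location": location, "stale": False}
        if location is not None:
            self.by_location[location].add(request_uri)

    def needs_validation(self, uri):
        entry = self.entries.get(uri)
        return entry is None or entry["stale"]


cache = Cache()
# Negotiated GET first: its response names doc.html in Location.
cache.store("http://host.example/doc", {"Location": "http://host.example/doc.html"}, "version 1")
# Explicit GET later, with no Location: the secondary key finds the negotiated entry.
cache.store("http://host.example/doc.html", {}, "version 2")
print(cache.needs_validation("http://host.example/doc"))       # True
print(cache.needs_validation("http://host.example/doc.html"))  # False
```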

--Shel

Received on Monday, 5 February 1996 00:29:32 UTC