Re: Cache validators

This discussion started on the caching subgroup list.  However, I see no
point in continuing discussion in subgroups when we need to do the rest
of our work within the HTTP WG.  Besides, Jeff won't be able to respond
until the LA meeting and I hate writing the same message twice.

Jeff writes:
> When a cache (either in a user agent or in a proxy) has a cache entry
> and wants to find out if the origin server (or an inbound cache) still
> considers that entry to be a semantically appropriate response for a
> given request, it makes a conditional request for the resource.

Yes, but we have to keep in mind that caches are not the only
applications that may make a conditional request.  That may seem
outside our caching group's scope, but it was not outside my
scope when I came up with both IMS and Unless (now IF).

> In HTTP/1.0, the only way to do a conditional request is to include
> an If-Modified-Since: header in the request.
> 
> For HTTP/1.1, I have proposed allowing the server to send an
> opaque validator header in a response, and requiring HTTP/1.1
> clients to provide this validator in any conditional request.

That makes sense for cache updates, but not for all uses of conditionals.

> I have also proposed that these two validation methods (opaque
> validators, and If-Modified-Since:) are the only mechanisms for
> cache validation in conditional requests.
> 
> Roy has proposed a more elaborate mechanism, consisting of
> (as he puts it)
> 
>     IF, If-ID, Unless-Modified-Since, Cache-control: max-age, etc.

Nope.  The only thing I have proposed is IMS and Unless (now IF).
I consider max-age to be a cache directive, not a conditional request,
because it never changes the status of the response.  If-ID is a suggestion
(see below) for a conditional based on Content-ID.  The others are
proposals made by other people, which I believe include three more
besides the ones listed above.

> In other words, Roy wants the client to be able to make the request
> conditional on a wide range of predicates.

Yep.

> There may be no actual conflict in our positions, if we can agree
> on some conceptual points.  In particular, I think we are again
> having problems with terminology, specifically "conditional request".
> 
> We really have two different kinds of conditionals to consider:
> 
> 	(1) cache-validation conditions: "does what I currently
> 	have in my cache match what you would currently send
> 	me in response to this request sufficiently well to
> 	maintain semantic transparency from your point of view?"
> 
> 	(2) utility conditions: "will what you send me be useful
> 	to me according to some criteria (because if it is not,
> 	I don't want you to send it)?"
> 
> An example of (2) might be "is what I have asked for more than
> 1000000 bytes, because if it is, I don't want to wait for it,
> so you shouldn't send it".  I think this is an orthogonal concept
> to cache validation, and we ought to be thinking about it separately.

I don't -- coming up with separate solutions for the same problem is
less efficient than a single solution for all such problems.

> I strongly believe that cache validation ought to be done with
> a single, *server-specified* condition.  This means opaque validators,
> period.  Since we need to interoperate with existing systems, we
> must also support If-modified-since.
> 
> Conservative cache validation is necessary to preserve semantic
> transparency.  That is, if we don't interoperate on this aspect
> of the protocol, then we lose semantic transparency.
> 
> I also believe that there is a good case to be made for supporting
> utility conditions, especially in view of the limited bandwidths
> and high latencies of many networks.  This is sort of like the
> (bogus) story of the $250 cookie recipe that keeps popping up:
> a user ought to be able to avoid being astonished by the cost of
> something she or he requests.  Roy's Unless: header seems like
> a good way to express these utility conditions, since they are
> client-centric, not server-centric.  I assume that when Roy
> mentions an "If:" header, he means that
> 	If: <predicate>
> is identical in meaning to
> 	Unless: {not <predicate>}
> and so that header can also express utility conditions.

Yep.

> Roy might argue at this point that I have it all wrong.  This
> would reflect our basic disagreement about server-centric
> vs. client-centric control of cache transparency.  Or Roy
> might say, "aha, now we are getting somewhere" because our
> prior disagreement was based on a confusion between controlling
> cache validation and controlling orthogonal utility preferences.
> We'll see. :-)

None of the above.  I know that both are useful and I know that
both can be achieved with a single syntax and a single algorithm,
which means a simpler implementation and a better design.

> Roy also believes that Content-ID should be used as a cache
> validator.  At first I was mystified about this, because Content-ID
> doesn't appear anywhere in any HTTP specification draft, but
> I did find draft-levinson-cid-02.txt, "Content-ID and Message-ID
> Uniform Resource Locators", and I suppose this is what he must mean.
> If not, I would appreciate a clarification.

That is not what I mean.  First, I don't think that opaque validators
are necessary -- they may be useful, but not necessary.  However, I am
willing to give in to that notion IF the opaque validator is
sufficiently useful to cover the cost of sending it.  That is, the opaque
validator must be generally interoperable with existing systems and
carry sufficient semantics to be useful for things other than cache updates.

In order to provide that additional usefulness, we need three things:

   1) A guarantee that the validator will change if the content changes
      and should not change if the content remains the same;
   2) A guarantee that the validator is byte-comparable (i.e., equal
      validators mean equal content);
   3) A guarantee that the validator is world-unique.

(1) is obvious.  (2) is necessary for comparison without a request to
the origin.  (3) is necessary for it to be used as a cache key.

Not too surprisingly, this also happens to be the definition of Content-ID
in MIME.  Therefore, for maximum interoperability with existing systems,
we should use Content-ID if we are to have an opaque validator.
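
As a rough illustration of guarantees (2) and (3), here is a minimal
Python sketch of a cache that uses the Content-ID as its key.  The class
name, the URLs, and the Content-ID values are all made up for the
example; nothing here comes from a draft.

```python
class ContentIdCache:
    """Toy cache that stores each body once, keyed by its Content-ID."""

    def __init__(self):
        self.cid_for_url = {}   # request URL -> Content-ID last seen
        self.body_for_cid = {}  # Content-ID  -> stored body

    def store(self, url, cid, body):
        # Guarantee 3 (world-unique) is what makes the cid safe as a key.
        self.cid_for_url[url] = cid
        self.body_for_cid[cid] = body

    def same_content(self, url_a, url_b):
        # Guarantee 2: equal validators mean equal content, so this
        # comparison needs no request to the origin.
        cid_a = self.cid_for_url.get(url_a)
        return cid_a is not None and cid_a == self.cid_for_url.get(url_b)


# Hypothetical URLs and Content-ID values, for illustration only.
cache = ContentIdCache()
cache.store("http://a.example/logo", "<logo.1@example.org>", b"GIF89a...")
cache.store("http://b.example/logo", "<logo.1@example.org>", b"GIF89a...")
print(cache.same_content("http://a.example/logo", "http://b.example/logo"))  # True
print(len(cache.body_for_cid))  # 1
```

The same entity fetched under two URLs ends up stored once, which only
works if all three guarantees hold.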

> If Roy means this Content-ID proposal, which defines
> 	cidurl     = "cid" ":" addr-spec
>    where "addr-spec" is defined in [RFC822].
> I.e.,
> 	addr-spec   =  local-part "@" domain
> 
> If so, I fail to see how this stuff has any value at all as
> a cache validator.
> 
> "Cache-control: max-age" (in a request) as well as other cache-control
> directives that I have proposed (fresh-min, stale-max) are not
> exactly cache validation conditions.  Rather, they effectively
> allow the user to specify whether or not a cache entry should be
> validated.  So these are not really conditions on the request
> between the client and the cache; rather, they are instructions
> from the client to the cache saying when conditional requests
> should be done from the cache to some inbound cache or origin
> server.

Ummm, the way you phrased that makes them the same thing.  What makes
them different is not the fact that they are both parameters to the
request; they are different because a conditional may result in a
different response status based on the condition.

> How do the various conditions fit together?  To me, it makes
> sense to treat them conjunctively.  That is, if a request
> includes both a cache validator (using either If-Modified-Since:
> or If-Invalid:) and a utility condition (such as
> "Unless: {gt {Content-Length 10000}}"), then the server should
> apply both conditions with an implied "AND".
> 
> For example, this request:
> 	GET /home.html HTTP/1.1
> 	If-Invalid: xyzzy
> 	Unless: {gt {Content-Length 10000}}
> means
> 	if (current-validator == xyzzy)
> 	    return "304 Not Modified"
> 	else if (content-length > 10000)
> 	    return "412 Unless true"
> 	else
> 	    return "200 OK" + entire body

Whoa! You just added an implied ordering as well.

> It makes sense to evaluate the cache-validation condition
> before any utility conditions, because why bother evaluating
> the utility conditions if the client already has the response
> in its cache?

But that is assuming the utility condition is being made for the
purpose of a cache update.  You can't make that assumption.
Even if you require that assumption, such a decision makes the protocol
less extensible because you have created an invisible ordering
based only upon what you think is important for your particular
application today.

More importantly, I would never want both If-Invalid and Unless (IF).
The reason for a single extensible syntax for all preconditions is
to avoid such confusion and mistakes in applying ordering among
the conditions, particularly when more conditions are added later.
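
To make the ordering concern concrete, here is a minimal Python sketch
of the quoted example request evaluated in both orders.  The function
name and the validator_first flag are hypothetical; the status strings
and the 10000-byte limit are taken from the quoted example.

```python
def respond(current_validator, content_length, if_invalid, max_length,
            validator_first=True):
    """Return the status line for a request carrying both conditions.

    Hypothetical helper: if_invalid is the validator the client holds,
    max_length is the limit from Unless: {gt {Content-Length N}}.
    """
    validator_matches = (current_validator == if_invalid)
    too_large = (content_length > max_length)

    if validator_first:
        checks = [(validator_matches, "304 Not Modified"),
                  (too_large, "412 Unless true")]
    else:
        checks = [(too_large, "412 Unless true"),
                  (validator_matches, "304 Not Modified")]

    # The first condition that holds decides the response.
    for condition_holds, status in checks:
        if condition_holds:
            return status
    return "200 OK"


# Same request, same resource state, different implied ordering:
a = respond("xyzzy", 50000, "xyzzy", 10000, validator_first=True)
b = respond("xyzzy", 50000, "xyzzy", 10000, validator_first=False)
print(a)  # 304 Not Modified
print(b)  # 412 Unless true
```

Two servers that each apply an implied "AND" but pick a different order
give different answers to the same request.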

It is my understanding that many people have complained that the
current Unless syntax and semantics will be too much of a burden
to implement.  I would have a lot more faith in their opinion if they
were to actually attempt to implement it FIRST, but I can't force
people to do that.  However, I won't allow six additional precondition
syntaxes to be added to the protocol willy-nilly -- people will have to
prove that they are more efficient in total than a single extensible
syntax.  My knowledge of HTTP applications allows me enough foresight
to know that a single syntax is more efficient than any two additional
preconditions, so proving it against six will be trivial.

On the other hand, I would also like to see HTTP/1.1 complete sometime
this century.

So, if people would like a simple precondition syntax that is useful
for all of the currently identified protocol needs, including cache
validation, byte ranges, and content negotiation, then I have the following
suggestion:

   1) Require Content-ID in HTTP/1.1 responses

     Content-ID  =  "Content-ID" ":" cid
            cid  =  <a content-id as defined in RFC 1521>

   2) Implement the following precondition syntax:

     If-ID  =  "If-ID" ":" 1#cid

      wherein the condition evaluates to true if the response to the
      request would have had a Content-ID equal to one of the ones
      given in the If-ID header field value.  Like the current definition
      of Unless in draft 01, the response to a "false" evaluation
      depends on whether or not Range or IMS is also present.
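
A minimal Python sketch of how the If-ID condition might be evaluated,
assuming the cids contain no commas so that a plain split is enough.
The function names and the Content-ID values are hypothetical.  What the
server sends on a "false" result is left out on purpose, since that
depends on Range and IMS as described above.

```python
def parse_if_id(field_value):
    """Split an If-ID field value (1#cid) into a list of cids.

    Assumption: no cid contains a comma; a real parser would follow
    the full grammar instead of splitting naively.
    """
    cids = [item.strip() for item in field_value.split(",")]
    return [cid for cid in cids if cid]


def if_id_holds(field_value, current_cid):
    """True if the response would carry one of the listed Content-IDs."""
    return current_cid in parse_if_id(field_value)


# Hypothetical Content-ID values held by a client.
held = "<v1.home@example.org>, <v2.home@example.org>"
print(if_id_holds(held, "<v2.home@example.org>"))  # True
print(if_id_holds(held, "<v3.home@example.org>"))  # False
```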

That should make a sufficient number of people happy to make the
overhead of doing it worthwhile.  If not, then the only reasonable
solution is to use an IF header field with a generic syntax.


 ...Roy T. Fielding
    Department of Information & Computer Science    (fielding@ics.uci.edu)
    University of California, Irvine, CA 92717-3425    fax:+1(714)824-4056
    http://www.ics.uci.edu/~fielding/

Received on Thursday, 22 February 1996 01:26:58 UTC