Re: Confusion over caching (was Re: Logic Bag concerns)

Roy T. Fielding writes:
 >  >> A separate validator field would have
 >  >> to be generated by all servers for all cachable resources, consisting
 >  >> of an opaque value which is only usable for metadata comparison (i.e.,
 >  >> it does nothing to ensure that the entity received is the same as that
 >  >> sent by the origin server).  It requires that the server be capable and
 >  >> willing to generate this opaque validator even when the entity is
 >  >> not directly controlled by the server.
 >  >> 
 > > 
 > > I think it would be helpful if you would explain these claims rather
 > > than just claiming them.  Yes, the header would need to be present for
 > > any cachable resource (except for backwards compatibility with 1.0).
 > 
 > Which means that no 1.0 resource (or script designed for 1.0) can
 > generate something useful for cache validation.  Given the presence of
 > hierarchical caching, this is sufficient to reject the special-purpose
 > case as not fulfilling the requirements for HTTP/1.1.
 > 

Whether any new mechanism for validation is handled as a special-case
header or as part of a general-case expression is orthogonal to whether
backward compatibility has to be supported.

If more than one kind of header (e.g. Content-MD5, Content-Length,
Last-Modified) is supported for validation, either for backward
compatibility or as a continuing design feature, it seems to me that
the set of possible logical expressions is highly constrained.  It
seems improbable (and would be incorrect) for clients to do anything
other than strict equality tests on Content-MD5 or Content-Length, or
greater-than tests on Last-Modified, for instance.  And a client can
determine the best header to include in a request simply by noting
which header it previously received in association with the requested
resource.
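
A rough sketch, in Python, of the constrained client behavior I have
in mind.  (Illustrative only: If-Modified-Since is real, but the other
request header names and the dispatch table are invented here, not
taken from any draft.)

    # The comparison a client may perform is fixed per header type;
    # no general expression language is needed.
    VALIDATION_RULES = [
        # (header seen in earlier response, request header to send, test)
        ("Content-MD5",    "If-MD5-Matches",    "strict equality"),
        ("Content-Length", "If-Length-Matches", "strict equality"),
        ("Last-Modified",  "If-Modified-Since", "greater-than on dates"),
    ]

    def choose_validator(cached_response_headers):
        # Prefer the strongest validator the origin server supplied.
        for seen, send, _test in VALIDATION_RULES:
            if seen in cached_response_headers:
                return send, cached_response_headers[seen]
        return None  # nothing usable; fall back to an unconditional GET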

 > > But why do you say it is only usable for metadata comparison?  If a
 > > part of a server is configured to use algorithm X to determine its own
 > > stated content-validator, then that part of the server must be able to
 > > respond to requests that use content-validators as generated by
 > > algorithm X, no?  And isn't it only the origin server that has to
 > > worry about generating these headers?  
 > 
 > No and no.  Only the recipient can test for message integrity of the
 > message received, and to do so they need to know the algorithm used
 > to generate the validator.

Aha!  Now we're exposing the differences among various people's models.
In Jeff's "opaque validator" model, the client never uses the
validator except as a token to pass back to the server.  You're using
the term in a different way.  Of all the possible headers that the
client even *could* use to support your meaning of the term, only
Content-Length and Content-MD5 would be meaningful (or maybe the
others you mention below, which don't seem to be spec'ed yet).  Clearly your
usage is also useful (don't get me wrong), but it is a completely
different meaning of the term.  Whether we should allow "punning" --
using the same header for content-verification and for
cache-validation -- is a reasonable question to ask.  Clearly the
opaque validator model disallows that.
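
To caricature the two meanings in Python (the function names are mine;
per RFC 1864, Content-MD5 carries the base64 of the MD5 digest):

    import base64
    import hashlib

    # Meaning 1 (Jeff's "opaque validator"): the client merely echoes
    # a token back; only the origin server ever compares it.
    def server_revalidate(echoed_token, current_token):
        return echoed_token == current_token   # byte-for-byte equality

    # Meaning 2 (message integrity): the client recomputes a checksum
    # over the entity it actually received and compares it to the
    # Content-MD5 value the server sent.
    def client_verify(entity_body, content_md5_header):
        digest = base64.b64encode(hashlib.md5(entity_body).digest())
        return digest == content_md5_header.encode()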

 > If the validator is something useful, like
 > Content-MD5 or Content-SHA or Content-Checksum or even Content-Length,
 > then it can be used for both message integrity checks AND validation,
 > which means you don't duplicate information supplied for the special case.
 > 

Right -- using the opaque validator model, the server would have to
send a separate header if it wanted to let clients checksum the data
they receive.  (But is that *bad*?)

 >  >> In contrast, IF does not make any assumptions or special requirements
 >  >> on the information being compared.  If an opaque value is available,
 >  >> then it can be compared.  If an MD5 is available, then it can be
 >  >> used as both an MD5 checksum and for cache validation.  If any
 >  >> useful metainformation (as judged by the client) is available, then
 >  >> it can be used within a comparison.
 >  >> 

This is orthogonal to whether an IF expression is used or
whether separate request headers are used.  In either case,
an *opaque* token can't be interpreted by the client.

I think it is an open issue whether, in HTTP/1.1, we should also allow
non-opaque headers for cache validation, except for backward
compatibility.  I continue to have the feeling that that is what the
discussion about IF is actually about, and that separating these
issues is important.

 > > 
 > > The point of the opaque validator is to remove the smarts from the
 > > client side.
 > 
 > No.  The point is to provide reliable validation.

Either you already know what I meant, in which case it is pointless to argue,
or you don't, in which case it is also pointless to argue.

 > There is no reason
 > why this cannot be done just as easily and just as reliably within an
 > extensible syntax,

I agree -- the syntax is not the most important issue.

 > and with whatever validation-capable metainfo is
 > present in any given cachable entity,

If we're talking about pure 1.1 <=> 1.1 communication, I disagree.  We
have the opportunity to define validation cleanly -- we don't have to
support multiple mechanisms with widely different (and antagonistic)
underlying philosophies, and if we do, then (I claim) it will be a
strong sign of design by committee.

 > as it would be to do so for just
 > a special case.  Therefore, the special case loses.
 > 

I disagree.  Again, we're confusing discussion of validation
algorithms with discussion of syntax.  The fact that the IF syntax
allows more general manipulations does not address the brass-tacks
question of whether the basic validation algorithm is correct.  If this
discussion is actually about a lack of consensus on what the basic
validation algorithm should be, let's work on those issues, and not
try to push them off to the indefinite future by providing a general
mechanism whose primary function seems to be to let us avoid
working through the hard issues of how caching should actually work.


 > > It really seems like there are multiple issues being
 > > discussed at once, which should be being discussed separately:
 > > 
 > > 1. What are the foreseeable "high-level" reasons for doing conditional
 > > requests, and how should those conditional requests be encoded in the
 > > protocol?  We have yet to see a plausible scenario that demonstrates
 > > this need.  Without stated requirements this seems like an exercise in
 > > futility.
 > 
 > I have already provided several.  As far as I am concerned, you must
 > prove that they are not plausible, since the solution provided does
 > satisfy the needs of opaque validation.  Your requirements are fulfilled
 > by a general syntax, my requirements are not fulfilled by a special-case
 > syntax, and therefore the only reasonable design is the general case.
 > 

I looked back at previous posts, and the closest thing I could find to
a requirement statement was that future extensions for preconditions
should not change the protocol.  I agree that this is a highly desirable
goal.  What I find questionable is whether this extension mechanism
needs to be the same mechanism as is used for cache validation.

 > > 2. Is there a requirement or benefit of having a general case solution
 > > to this that outweighs its complexity and the difficulty of specifying
 > > the semantics exactly?  General case solutions are nice, where there
 > > is a general case problem to be solved, but the added hair of having
 > > to put an expression parser in at this level seems quite questionable
 > > without a definite need.
 > 
 > I have already answered this question twice.  There is no semantic
 > ambiguity and no additional complexity if reasonable constraints are
 > placed on the set of required expressions.
 > 
See Koen's response to your "price" example.


 > > 3. Should we mix the high level mechanism with the low-level
 > > cache-integrity mechanisms?  What are the benefits/costs of that?
 > 
 > Irrelevant -- both represent the same semantics for interpreting
 > the request, and therefore are at the same level within HTTP.
 > 

Relevant -- I'd rather not have it be possible for goofy logical
expressions to mess up caches, while I'm perfectly happy for them to
be used for arbitrarily complex conditional GETs based on things
nobody has thought of yet.  As such, I'd rather view IF as an optional
extension mechanism.
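
In Python-flavored pseudo-code, the separation I'd prefer looks
something like this (the header names, the classification, and the
function are all invented here for illustration):

    def cache_action(request_headers, stored_validator):
        # The ONE comparison a shared cache is allowed to make: exact
        # match of the echoed validator against the validator stored
        # with the cached entry.
        if "Cache-Validator" in request_headers:
            if request_headers["Cache-Validator"] == stored_validator:
                return "reply 304 Not Modified from the cache"
            return "forward to the origin server"
        # A general logical expression is opaque to the cache; it must
        # not try to evaluate it, only pass it through.
        if "If" in request_headers:
            return "forward to the origin server"
        return "serve the cached entity"   # plain unconditional GET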

 >  >> Most importantly, we don't have to specify the interaction between
 >  >> N types of preconditions if we only use one precondition field.
 > > 
 > > Doesn't backwards compatibility already imply that this is required?
 > 
 > No, it doesn't -- allowing additional expressions does not change
 > the semantics of IF.

But IF already must interact with other headers, which are still in
the spec -- that was my only point above.

 > Using separate header fields for every precondition
 > does change the semantics of interpreting the request for each additional
 > field.  I KNOW THIS to be true because I've written and rewritten the HTTP
 > specification over 60 times now and can see this effect every time a new
 > request header field is added.
 > 

Yes, I cannot argue with that.  But I'm coming to my own conclusion
that IF, if it stays in the spec, should be optional, and that a
Cache-Validator header should become the fundamental means of cache
validation.
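
Concretely, I'm imagining an exchange along these lines (the header
names are invented for illustration, not taken from any draft):

    Origin server, in an ordinary response:

        HTTP/1.1 200 OK
        Cache-Validator: "a9x04z"
        ...

    Client or cache, revalidating later:

        GET /foo HTTP/1.1
        If-Validator-Match: "a9x04z"

    Origin server, if the validator is still current:

        HTTP/1.1 304 Not Modified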


 >  ...Roy T. Fielding
 >     Department of Information & Computer Science    (fielding@ics.uci.edu)
 >     University of California, Irvine, CA 92717-3425    fax:+1(714)824-4056
 >     http://www.ics.uci.edu/~fielding/


And now, back to the salt mines.

--Shel Kaphan
