Data Integrity from Shel Kaphan on 1996-02-15 (ietf-http-wg@w3.org from January to March 1996)

From: Shel Kaphan <sjk@amazon.com>
Date: Thu, 15 Feb 1996 14:13:08 -0800
To: hardie@nasa.gov
Cc: Larry Masinter <masinter@parc.xerox.com>, http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Message-Id: <199602152213.OAA02111@bert.amazon.com>

Ted Hardie writes:
 > In the Working group issues list, one of the comments on data
 > integrity confused me a bit.  The second of the two positions asserts
 > that it is perfectly reasonable for a cache to do integrity checks, to
 > avoid the cache serving nonsense to the end user.  So far, I
 > understand and agree.  It goes on to say, however, that if the cache
 > serves nonsense to the end user, the end user may re-request the
 > document, thus correcting the problem in the cache.
 > 

These were my comments, so let me try to clarify.  If a user agent
receives bad data, i.e. a local integrity check fails, it can retry.
If it retries, it should do so in a manner that forces the request to
go all the way to the point of origin (by using one of the
cache-control directives), in order to make sure that the data wasn't
stored in a garbled form in a cache somewhere in between. If a proxy
cache between the origin and the end user "sees" the ungarbled
version going by, it may store that in its cache.  If it does, that
will repair the garbledness.  If it doesn't, it won't.  There's
nothing especially magical about any of this.
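
A minimal sketch of the retry described above. The message does not say
how the local integrity check works, so this assumes the response carries
a SHA-256 digest of the body. The names fetch_with_integrity_retry and
fetch are hypothetical; fetch stands in for whatever transport the user
agent uses.

```python
import hashlib


def digest_of(body: bytes) -> str:
    """Hex digest used for the local integrity check (SHA-256 is an assumption)."""
    return hashlib.sha256(body).hexdigest()


def fetch_with_integrity_retry(fetch, url):
    """Fetch url, retrying once end to end if the integrity check fails.

    fetch(url, headers) is a hypothetical callable returning
    (body, advertised_digest).
    """
    body, advertised = fetch(url, {})
    if digest_of(body) == advertised:
        return body
    # The check failed, so the copy may be garbled in some intermediate
    # cache. Retry with a directive that forces the request to the origin.
    body, advertised = fetch(url, {"Cache-Control": "no-cache"})
    if digest_of(body) != advertised:
        raise ValueError("integrity check failed even after end-to-end retry")
    return body
```

The retry is sent only once here; a real user agent might choose a
different policy.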

 > I'm not clear from the text whether this is meant to apply to
 > the condition where integrity checks exist between proxy and server
 > or whether it is meant to apply to the condition where they do
 > not.  
 > 
It's meant to apply to the condition where they do not, i.e., when a
cache does not perform an integrity check, and ends up with bad data.

 > If the former, then the integrity check would presumably be applied
 > the first time the cache stored the item;

yes.

 > if it passed somehow or was
 > corrupted later, re-requesting the item won't clear the problem unless
 > the proxy goes beyond current practice in re-checking the resource.

Yes. If the retry request contains cache-control: no-cache
(or, in Jeff's version, cache-control: reload) then the proxy will
have to forward the request.

 > Any time-based check, for example, won't fail until the resource has
 > expired or changed.  
 > 
yes.

 > A really aware client might still get the correct resource despite the
 > cache by reloading with a Pragma: request, but I can't see why the
 > cache would take that as a signal to update its copy (rather than
 > simply pipelining the request on to the origin server).  
 > 
Because it can.  The protocol doesn't force it to, but it would seem
to be reasonable practice.

 > Do we wish to suggest additions to current practice for the case where
 > integrity checks are applied between proxy and server?  If so, should
 > these include:  a method for the client to explicitly request a cache
 > update, and/or recommendations for additional checks by the proxy when
 > integrity checks are available? 
 > 
 > 					Ted Hardie
 > 					NASA Science Internet

I am not convinced anything extra really needs to be added.
The only thing that seems open to question is what a cache (should/must/may)
do with the response to a request with cache-control: no-cache (or reload).
My assumption was that the most reasonable thing for the cache to do
would be to store the most recently received version of any document
it receives, especially since one of the reasons for a forced reload
would be to fix garbled data.  Maybe words to this effect belong in
the spec somewhere.
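
A minimal sketch of the cache behavior assumed above: a request carrying
cache-control: no-cache is forwarded to the origin, and the cache keeps
whatever comes back, replacing any garbled copy. The ProxyCache class and
its origin callable are hypothetical, and header lookup is simplified to
an exact "Cache-Control" key. Storing the response is this sketch's
choice; as noted, the protocol does not force it.

```python
class ProxyCache:
    """Toy proxy cache keyed by URL (illustration only)."""

    def __init__(self, origin):
        self.origin = origin  # hypothetical callable: url -> bytes
        self.store = {}       # url -> cached body

    def get(self, url, headers=None):
        headers = headers or {}
        directives = {
            d.strip().lower()
            for d in headers.get("Cache-Control", "").split(",")
        }
        # Serve from the cache unless the client forced an end-to-end reload.
        if "no-cache" not in directives and url in self.store:
            return self.store[url]
        body = self.origin(url)
        # Keep the most recently received version; this is what repairs
        # a garbled entry after a forced reload.
        self.store[url] = body
        return body
```

With a garbled entry already stored, an ordinary request still returns
the bad copy, a no-cache request returns the origin's copy, and later
ordinary requests are served the repaired entry.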

--Shel

Received on Thursday, 15 February 1996 14:19:49 UTC