Re: weak validators (resend) from Koen Holtman on 1996-04-12 (http-caching-historical@w3.org from April 1996)

From: Koen Holtman <koen@win.tue.nl>
Date: Sat, 13 Apr 1996 00:23:10 +0200 (MET DST)
To: fielding@avron.ICS.UCI.EDU (Roy T. Fielding)
Cc: http-caching@pa.dec.com, mogul@pa.dec.com, jg@w3.org, koen@win.tue.nl
Message-Id: <199604122223.AAA00657@wsooti04.win.tue.nl>

Roy T. Fielding:
>[Koen Holtman:]
>> On Last-Modified: It seems we agreed that the use of last-modified for
>> cache validation should be phased out.  
[...]
>No, that isn't what we agreed to

I guess I misinterpreted some part of the discussion then.  Since Jim Gettys
also remembers that we did not agree on this, I guess you are right and I
was wrong.

[...]
>If there is no other available information, Last-Modified is sufficient
>and can be assumed to BE sufficient for all caching purposes.  If it
>isn't sufficient, the provider of that information MUST supply something
>more reliable than Last-Modified (which itself is reliable 99.9999% of
>the time).

My point against allowing Last-Modified values (generated by 1.0 servers)
to be used for combining ranges is not so much that they are unreliable in
0.0001% of the cases, but that there is nothing in the 1.0 spec that
_forbids_ a 1.0 server from providing resources for which combining ranges
using Last-Modified is 99.9% unreliable.  You can require the above MUST
for 1.1 servers, but not for 1.0 servers.

A 1.0 server can quite legitimately serve, for each subsequent request on a
resource, an HTML document randomly picked from a pool of 1000 HTML
documents which all have the same semantic content without being byte
equal, and such a server can legitimately tag all these 1000 HTML documents
with the same Last-Modified date.  This means that we cannot logically
claim 1.1 to be backward compatible with 1.0 servers if we allow 1.1
clients to combine ranges with the same Last-Modified header.  The
incompatibility would maybe not be a practical problem, but it would be
there, and its mere theoretical existence would contradict any claims made
in the 1.1 document about compatibility between HTTP versions with the same
major version number.
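
The scenario can be made concrete with a small Python sketch.  Everything
in it is invented for illustration: the pool has 3 documents instead of
1000, and the document bodies, the date, and the function names are
assumptions.  It models a server that answers each range request from a
randomly picked pool member while always sending the same Last-Modified
value, and a client that splices ranges whenever the dates match.

```python
import random

# Hypothetical pool: same semantic content, different bytes, one shared date.
POOL = [
    b"<html><body><p>Hello world</p></body></html>",
    b"<HTML><BODY><P>Hello world</P></BODY></HTML>",
    b"<html>\n<body><p>Hello world</p></body></html>",
]
LAST_MODIFIED = "Fri, 12 Apr 1996 00:00:00 GMT"


def serve_range(start, end, rng):
    """Return (Last-Modified, bytes[start:end]) of a randomly picked document."""
    doc = rng.choice(POOL)
    return LAST_MODIFIED, doc[start:end]


def combine_by_last_modified(first, second):
    """Naively splice two ranges when their Last-Modified values match."""
    if first[0] != second[0]:
        raise ValueError("validators differ; cannot combine")
    return first[1] + second[1]


def corrupted_fraction(trials=1000, seed=0):
    """Fraction of combined bodies that equal no document in the pool."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        head = serve_range(0, 10, rng)
        tail = serve_range(10, None, rng)
        if combine_by_last_modified(head, tail) not in POOL:
            bad += 1
    return bad / trials
```

Whenever the two ranges come from different pool members, the spliced body
matches none of the documents the server could have sent, even though the
Last-Modified comparison succeeded.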

[...]

>Here is a more straightforward
>syntax that avoids some of the pitfalls of basing a protocol element
>on a conceptual understanding of validity.

I agree with this syntax, though there are a few things I strongly disagree
with below.

>
>    EID          = "EID" ":" entity-id
>    If-EID       = "If-EID"     ":" ( "*" | 1#entity-id )
>    Unless-EID   = "Unless-EID" ":" ( "*" | 1#entity-id )
>
>    entity-id    = change-indicator [ ";" variant-id ]
>
>    change-indicator = [ "W/" ] token
>    variant-id       = token
>
>The entire entity-id is case-sensitive.  I put the weakness indicator up
>front because caches should not be required to look for it AND have to
>extract it from the middle of the field.
>
>I am not using double-quotes around the change-token (what was being
>called the validator) because it is generally better to avoid quoted
>values when they can be avoided (due to problems of charsets and the
>possibility of embedded quotes and the problems of whitespace lossage
>by gateways to non-HTTP environments).  In any case, I see no advantage
>in giving people extra rope to hang themselves when the thing must be
>a computed function anyway in order to be reliable.

Side remark: I think the chance of CGI authors hanging themselves writing
functions that make tokens is greater than the chance of CGI authors
hanging themselves writing functions that make quoted strings.  When making
tokens, you have to know all about which characters cannot be in tokens
because they are in `tspecials'.  I certainly don't know all tspecials by
heart.  How many CGI authors will not bother to look it up?
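
As an illustration of what such a check involves, here is a minimal Python
sketch of a token test.  The tspecials set is the one listed in the
HTTP/1.x specifications; the function name is_token is an assumption made
for this example and does not come from any spec or library.

```python
# tspecials as listed in the HTTP/1.x specs:
# ( ) < > @ , ; : \ " / [ ] ? = { } SP HT
TSPECIALS = set('()<>@,;:\\"/[]?={} \t')


def is_token(value):
    """True if value is a non-empty run of US-ASCII chars, no CTLs, no tspecials."""
    if not value:
        return False
    for ch in value:
        code = ord(ch)
        if code <= 31 or code >= 127:  # CTLs (0-31, 127) and non-ASCII
            return False
        if ch in TSPECIALS:
            return False
    return True
```

A CGI author generating change-indicators from, say, a file path or a
base64 string would trip over "/" and "=" unless a check like this is
applied first.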

>
>Examples:
>
>    EID: W/lkjsdrhfjh;5
>    EID: afgef5647fed;iso-8859-7
>
>    Unless-EID: W/lkjsdrhfjh;5, afgef5647fed;iso-8859-7
>
>This syntax is combined with the following semantics:
>
>   An entity-id SHOULD be supplied by the origin server for any entity
>   which is cachable; however, its absence does not imply that the
>   entity cannot be cached -- it only implies that the origin server is
>   incapable or unwilling to provide this enhanced functionality.
>
>   If the Request-URI corresponds to two or more variant representations,
>   then a variant-id MUST be included in the entity-id to distinguish
>   between those representations.

I disagree with this MUST: the alternates in transparent content negotiation
do not have variant-ids in draft-holtman, they have alternate URIs.  The
above MUST does not allow transparent content negotiation according to
draft-holtman on top of 1.1, and we have consensus that this must be
allowed. The above text should read:

   If the Request-URI identifies a varying resource *which uses the Vary
   header* to indicate variance, then a variant-id MUST be included in the
   entity-id to distinguish between different variant entities. 

The 1.1 spec can, but does not need to, add

   If the Request-URI identifies a varying resource *which uses the
   Alternates header* to indicate variance, then a variant-id should not be
   included, but the change-indicators by themselves SHOULD be different
   for all different variant entities.

I strongly object to a requirement that varying resources which use the
Alternates header to indicate variance MUST include variant-ids.  Such a
requirement would be easy to satisfy for preemptive negotiation, but very
painful for reactive negotiation, where a second request is done on the
actual URI of the alternate resource (which may even live on a different
server).  If the second response has to include a variant-id, then the
alternate resource must `know' that it is in fact being used as an
alternate resource by some transparently negotiated resource.  This
requirement to know would cause immense, and completely unnecessary,
logistics problems for the authors of transparently negotiated resources.

>   The combination of
>        Request-URI + variant-id
>   must uniquely identify each variant representation of that resource.

No, that is

   The combination of
         Request-URI + variant-id (if present) + 
                   Content-Location header (if present)
   must uniquely identify each variant representation of that resource.
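
A cache following this rule could key its stored variants on all three
components.  The Python sketch below is only an illustration: VariantKey
and variant_key are hypothetical names, and the way the variant-id is
extracted assumes the entity-id syntax quoted earlier in this message.

```python
from typing import NamedTuple, Optional


class VariantKey(NamedTuple):
    request_uri: str
    variant_id: Optional[str]        # from the EID header, if present
    content_location: Optional[str]  # from Content-Location, if present


def variant_key(request_uri, eid=None, content_location=None):
    """Build a cache key from the three components named above."""
    variant_id = None
    if eid is not None and ";" in eid:
        # entity-id = change-indicator [ ";" variant-id ]
        variant_id = eid.split(";", 1)[1].strip()
    return VariantKey(request_uri, variant_id, content_location)
```

With this key, two alternates served without variant-ids but with
different Content-Location values still land in different cache slots.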

>   A cache may use the variant-id to distinguish between cached variant
>   representations of the Request-URI if the EID header field is present
>   in the cached entities.  A cache may also use Content-Location for that
>   purpose. [assuming we get it defined in time.]

We need to define Content-Location in time, because my text about cache
replacement for varying resources uses it.

>   The change-indicator is used to indicate changes to the content of
>   the resource uniquely identified by the Request-URI and variant-id.
>   The change-indicator value SHOULD change when the content of an entity
>   changes and SHOULD NOT change when the content remains the same.
>   When the value changes, it MUST change to a value not already used for
>   that entity within a timeframe for which there may still exist
>   legitimately cached entities with the same change-indicator value.

As Jeff has pointed out in his more theoretical caching models, one can
legitimately store in cache memory a stale response forever.  So your
requirement above is better expressed as:

    When the value changes, it MUST change to a value not already used.
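
One simple way a server might satisfy this stricter rule is to draw
change-indicators from a counter that only moves forward.  The Python
sketch below is an assumption for illustration, not something proposed in
this thread; the class name and the "v<n>" format are invented, and a real
server would have to persist the counter across restarts.

```python
import itertools


class ChangeIndicatorSource:
    """Hands out a fresh, never-reused change-indicator on each content change."""

    def __init__(self):
        self._counter = itertools.count(1)
        self._last_content = None
        self._current = None

    def indicator_for(self, content):
        # Same content as last time: keep the value (the SHOULD NOT change rule).
        if content != self._last_content:
            self._current = "v%d" % next(self._counter)
            self._last_content = content
        return self._current
```

Note that when the content reverts to an earlier state, this sketch still
issues a new value instead of reviving the old one.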

>   A change-indicator is called "strong" if the origin server guarantees
>   that the value MUST change when the entity's content changes.  The
>   origin server MUST prefix the change-indicator with "W/" if it is
>   not generated by a strong function (i.e., is known to be "weak").
>
>   Origin servers SHOULD use a strong generator function if any is
>   available for that entity.
>
>      Note: The "entity's content" refers to both the Entity-Body and all
>      Entity-Header fields except Expires and Transfer-Encoding [the latter
>      may be better described as a General-Header field anyway].
>
>   Two entity-id's can be compared for equality by byte-comparison,
>   excluding whitespace between the components.
>
>   An entity-id may be used as a precondition for the partial GET method
>   using the If-EID or Unless-EID header fields.  If the change-indicator
>   is strong, the partial GET request may be completed by any cache with
>   a cached entity having the same entity-id, unless a cache-control
>   directive indicates otherwise.  If the change-indicator is weak, the
>   partial GET request MUST NOT be completed by a public cache.

We can actually require something weaker here.  We need only require that:

 If two or more partial responses are merged by a client (proxy or user
 agent or user agent helper application) into a complete response, or a
 bigger partial response, these partial responses MUST all have the same
 strong change-indicator.
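
The check a client would make before merging can be sketched in Python as
follows.  The function names are hypothetical.  As a conservative
assumption, the sketch compares whole entity-ids (change-indicator plus
any variant-id) and ignores whitespace between the components, following
the comparison rule quoted above.

```python
def normalize(entity_id):
    """Drop whitespace between components, per the byte-comparison rule."""
    return ";".join(part.strip() for part in entity_id.split(";", 1))


def is_strong(entity_id):
    """A change-indicator is weak when it carries the "W/" prefix."""
    return not normalize(entity_id).startswith("W/")


def may_merge(entity_ids):
    """True only if every partial response carries the same strong entity-id."""
    ids = [normalize(e) for e in entity_ids]
    if not ids:
        return False
    return is_strong(ids[0]) and all(e == ids[0] for e in ids)
```

Under this rule a weak entity-id never permits merging, whichever party
(public cache, private cache, or user agent) does the combining.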

This would make it OK for partial GETs to be completed by caches if the
change-indicator is weak, which I believe is something you wanted.

>As far as I can tell, there is no reason that a private cache should
>be prevented from using a weak validator comparison, even for byte range
>insertion.

As far as I can tell, the privacy requirements of a response are orthogonal
to the server's ability to supply a strong validator.  So a private cache
should also be prevented from using a weak validator comparison for
combining ranges.

>There is one addition to If-EID and Unless-EID to specifically handle
>(without an interpretation hack) the cases of "any" and "none", which
>allows the prevention of overwriting existing resources on a PUT.
>
>    Unless-EID: *
>
>means "unless any entity-id already exists for this Request-URI"
>      (a.k.a., if no entity already exists), and
>
>    If-EID: *
>
>means "if any entity-id already exists for this Request-URI"
>      (a.k.a., unless no entity already exists).
>

This addition would be OK with me.
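
For reference, the "*" semantics described above can be sketched in
Python.  The function name and the convention that current_eid is None
when no entity exists are assumptions made for this example.

```python
def precondition_holds(header, value, current_eid):
    """Evaluate an If-EID or Unless-EID header against the stored entity.

    current_eid is None when no entity exists for the Request-URI.
    """
    if value.strip() == "*":
        # "*" matches any existing entity-id.
        matched = current_eid is not None
    else:
        wanted = [v.strip() for v in value.split(",")]
        matched = current_eid is not None and current_eid in wanted
    if header == "If-EID":
        return matched
    if header == "Unless-EID":
        return not matched
    raise ValueError("unknown header: %r" % header)
```

A PUT sent with "Unless-EID: *" would then proceed only when nothing is
stored yet, which is the overwrite protection mentioned above.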

>I think that represents enough compromise from me for one week and I am
>not in the mood for playing any more name games.

Note that I carefully avoided playing name games above :)

>  If we can't settle
>this within the next 24 hours then I think 1.1 should go forward without
>any validators at all.

I do not agree to removing validators if we can't settle all issues
connected to them in 24 hours.  Removing validators would make accessing
varying resources way too expensive.  We have consensus that the 1.1 Vary
header should be good enough to support multi-lingual servers.  Removing
If-Invalid would make use of Vary so expensive that it is hardly usable.

> ...Roy T. Fielding

Koen.
Received on Friday, 12 April 1996 22:54:23 UTC