Re: weak validators (resend) from Roy T. Fielding on 1996-04-12 (http-caching-historical@w3.org from April 1996)

From: Roy T. Fielding <fielding@avron.ICS.UCI.EDU>
Date: Fri, 12 Apr 1996 06:58:26 -0700
To: http-caching@pa.dec.com
Cc: Jeffrey Mogul <mogul@pa.dec.com>, jg@w3.org, Koen Holtman <koen@win.tue.nl>
Message-Id: <9604120658.aa04533@paris.ics.uci.edu>

> On Last-Modified: It seems we agreed that the use of last-modified for
> cache validation should be phased out.  Most people would not like
> proxies to rely on the Last-Modified value when combining ranges,
> because of the 1.0 servers around.  There did not seem to be much
> support for Roy's idea to require that 1.1 servers make the
> last-modified date a `strong' validator that is guaranteed to be
> different for different entities, even if the entity bound to the
> resource is updated twice in one second.

No, that isn't what we agreed to [it sure as hell isn't what I agreed to,
and there is no way I'd let that be said in the spec without a fight].
And the latter sentence is a mutation -- origin servers are already
capable of defining what it means to be "modified" and when Last-Modified
is changed.

If there is no other available information, Last-Modified is sufficient
and can be assumed to BE sufficient for all caching purposes.  If it
isn't sufficient, the provider of that information MUST supply something
more reliable than Last-Modified (which itself is reliable 99.9999% of
the time).  You can make it more reliable by not caching entities that
have a Date within one second of the Last-Modified, but that is an
optimization which should be placed in the caching heuristics section.
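
As a rough illustration of that heuristic, here is a minimal Python sketch.
The function name and the one-second threshold handling are assumptions made
for the example (it takes the conservative reading that a gap of exactly one
second still counts as "within one second"); nothing here is proposed spec
text.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime


def safe_to_rely_on_last_modified(date_header: str, last_modified_header: str) -> bool:
    """Return True if Date is more than one second after Last-Modified.

    Hypothetical helper: a cache could decline to store an entity when this
    returns False, since a second change within the same clock tick would
    not be visible in the Last-Modified value.
    """
    date = parsedate_to_datetime(date_header)
    last_modified = parsedate_to_datetime(last_modified_header)
    return date - last_modified > timedelta(seconds=1)
```

A cache would call this with the two header values from the response and
skip storing the entity when it returns False.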

The only thing I agreed to was that weak validators may exist WITHIN
the new syntax of VID (I still hate CVal).  This must have no impact on
the interpretation of Last-Modified given the lack of any additional
information.

I also don't agree about including the syntax and then saying that
servers must not use it -- that is a waste of time.  If the syntax is
there, we must also provide sufficient description of how and why it
is there -- otherwise, implementors will use it for the wrong reasons.

The problem with the name is not with "weak" (they are indeed weak);
it would make a great deal more sense to people if we'd just stop
referring to them as "validators" (they aren't).  As I mentioned,
I refer to them as entity identifiers because that is what they do --
identify entities of the Request-URI.  Here is a more straightforward
syntax that avoids some of the pitfalls of basing a protocol element
on a conceptual understanding of validity.

    EID          = "EID" ":" entity-id
    If-EID       = "If-EID"     ":" ( "*" | 1#entity-id )
    Unless-EID   = "Unless-EID" ":" ( "*" | 1#entity-id )

    entity-id    = change-indicator [ ";" variant-id ]

    change-indicator = [ "W/" ] token
    variant-id       = token

The entire entity-id is case-sensitive.  I put the weakness indicator up
front because caches should not be required to look for it AND have to
extract it from the middle of the field.
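
To show how little work the leading "W/" costs a parser, here is a small
Python sketch of the grammar above. The names EntityId and parse_entity_id
are invented for the example, and the token character set is my reading of
the usual HTTP token rule (no CTLs, no tspecials), so treat it as an
assumption rather than a definition.

```python
import re
from typing import NamedTuple, Optional

# Characters allowed in an HTTP token: printable ASCII minus the tspecials.
TOKEN = r"[!#$%&'*+\-.0-9A-Z^_`a-z|~]+"

# entity-id = [ "W/" ] token [ ";" token ], tolerating blanks around ";".
ENTITY_ID = re.compile(r"(W/)?(" + TOKEN + r")[ \t]*(?:;[ \t]*(" + TOKEN + r"))?")


class EntityId(NamedTuple):
    weak: bool
    change_indicator: str
    variant_id: Optional[str]


def parse_entity_id(text: str) -> EntityId:
    """Parse one entity-id, raising ValueError if it does not fit the grammar."""
    match = ENTITY_ID.fullmatch(text.strip())
    if match is None:
        raise ValueError("not a valid entity-id: %r" % text)
    return EntityId(match.group(1) is not None, match.group(2), match.group(3))
```

Because "/" is not a token character, the "W/" prefix cannot be confused
with the start of a change-indicator.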

I am not using double-quotes around the change-indicator (what was being
called the validator) because it is generally better to avoid quoted
values when they can be avoided (due to problems of charsets and the
possibility of embedded quotes and the problems of whitespace lossage
by gateways to non-HTTP environments).  In any case, I see no advantage
in giving people extra rope to hang themselves when the thing must be
a computed function anyway in order to be reliable.

Examples:

    EID: W/lkjsdrhfjh;5
    EID: afgef5647fed;iso-8859-7

    Unless-EID: W/lkjsdrhfjh;5, afgef5647fed;iso-8859-7
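
Splitting a list-valued field like the last example is straightforward,
since neither "," nor whitespace can occur inside a token. A minimal Python
sketch follows; the function name is made up for illustration.

```python
def split_entity_id_list(field_value: str) -> list:
    """Split an If-EID / Unless-EID field value into its entity-ids.

    Returns ["*"] for the wildcard form. Empty list elements, which the
    "#" list rule tolerates, are dropped.
    """
    value = field_value.strip()
    if value == "*":
        return ["*"]
    return [item.strip() for item in value.split(",") if item.strip()]
```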

This syntax is combined with the following semantics:

   An entity-id SHOULD be supplied by the origin server for any entity
   which is cachable; however, its absence does not imply that the
   entity cannot be cached -- it only implies that the origin server is
   incapable or unwilling to provide this enhanced functionality.

   If the Request-URI corresponds to two or more variant representations,
   then a variant-id MUST be included in the entity-id to distinguish
   between those representations.  The combination of
        Request-URI + variant-id
   must uniquely identify each variant representation of that resource.

   A cache may use the variant-id to distinguish between cached variant
   representations of the Request-URI if the EID header field is present
   in the cached entities.  A cache may also use Content-Location for that
   purpose. [assuming we get it defined in time.]

   The change-indicator is used to indicate changes to the content of
   the resource uniquely identified by the Request-URI and variant-id.
   The change-indicator value SHOULD change when the content of an entity
   changes and SHOULD NOT change when the content remains the same.
   When the value changes, it MUST change to a value not already used for
   that entity within a timeframe for which there may still exist
   legitimately cached entities with the same change-indicator value.
   A change-indicator is called "strong" if the origin server guarantees
   that the value MUST change when the entity's content changes.  The
   origin server MUST prefix the change-indicator with "W/" if it is
   not generated by a strong function (i.e., is known to be "weak").

   Origin servers SHOULD use a strong generator function if any is
   available for that entity.

      Note: The "entity's content" refers to both the Entity-Body and all
      Entity-Header fields except Expires and Transfer-Encoding [the latter
      may be better described as a General-Header field anyway].
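
   One way an origin server might generate a strong change-indicator is
   to hash the entity's content as defined in the note. The Python sketch
   below is only an illustration under assumptions: the function name is
   invented, the caller is assumed to pass in only Entity-Header fields,
   and SHA-256 is my choice of digest (strong in practice, barring hash
   collisions). The hex output contains only token characters.

```python
import hashlib
from typing import Mapping

# Fields the note excludes from the "entity's content".
EXCLUDED = {"expires", "transfer-encoding"}


def strong_change_indicator(body: bytes, entity_headers: Mapping[str, str]) -> str:
    """Digest the Entity-Body plus all Entity-Header fields except the excluded ones."""
    digest = hashlib.sha256()
    # Sort by lowercased name so header order and case do not affect the result.
    for name in sorted(entity_headers, key=str.lower):
        if name.lower() in EXCLUDED:
            continue
        value = entity_headers[name].strip()
        digest.update(name.lower().encode("ascii") + b":" + value.encode("latin-1") + b"\r\n")
    digest.update(b"\r\n")
    digest.update(body)
    return digest.hexdigest()
```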

   Two entity-ids can be compared for equality by byte-comparison,
   excluding whitespace between the components.
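
   A sketch of that comparison in Python, assuming that "whitespace
   between the components" means blanks or tabs on either side of the
   ";" separator (the function name is hypothetical):

```python
def entity_ids_equal(a: bytes, b: bytes) -> bool:
    """Byte-compare two entity-ids, ignoring blanks and tabs around ";"."""

    def normalize(raw: bytes) -> bytes:
        return b";".join(part.strip(b" \t") for part in raw.split(b";"))

    return normalize(a) == normalize(b)
```

   Note that under a plain byte comparison a weak and a strong entity-id
   with the same token compare as different, and so do values that
   differ only in case.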

   An entity-id may be used as a precondition for the partial GET method
   using the If-EID or Unless-EID header fields.  If the change-indicator
   is strong, the partial GET request may be completed by any cache with
   a cached entity having the same entity-id, unless a cache-control
   directive indicates otherwise.  If the change-indicator is weak, the
   partial GET request MUST NOT be completed by a public cache.

As far as I can tell, there is no reason that a private cache should
be prevented from using a weak validator comparison, even for byte range
insertion.
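
Put together, the decision a cache would make looks roughly like the
following Python sketch. The function and parameter names are invented for
the example, entity-ids are compared as already-normalized strings, and
letting a cache-control directive override the weak case as well as the
strong one is my assumption.

```python
def may_complete_partial_get(
    requested_eid: str,
    cached_eid: str,
    *,
    public_cache: bool,
    cache_control_forbids: bool = False,
) -> bool:
    """Decide whether a cache may complete a partial GET from its stored entity."""
    if requested_eid != cached_eid:
        return False
    if cache_control_forbids:
        return False
    if requested_eid.startswith("W/"):
        # Weak change-indicator: never for a public cache, allowed for a private one.
        return not public_cache
    return True
```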
   
There is one addition to If-EID and Unless-EID to specifically handle
(without an interpretation hack) the cases of "any" and "none", which
allows the prevention of overwriting existing resources on a PUT.

    Unless-EID: *

means "unless any entity-id already exists for this Request-URI"
      (a.k.a., if no entity already exists), and

    If-EID: *

means "if any entity-id already exists for this Request-URI"
      (a.k.a., unless no entity already exists).
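
A server-side check for these two fields could be sketched in Python as
below. The "*" handling follows the definitions above; the handling of an
explicit list (the condition holds when the current entity-id is one of
those listed) is my assumption about the obvious reading, and the function
names are made up.

```python
from typing import Optional


def if_eid_holds(field_value: str, current_eid: Optional[str]) -> bool:
    """Evaluate If-EID; current_eid is None when no entity exists for the Request-URI."""
    value = field_value.strip()
    if value == "*":
        return current_eid is not None
    listed = [item.strip() for item in value.split(",") if item.strip()]
    return current_eid in listed


def unless_eid_holds(field_value: str, current_eid: Optional[str]) -> bool:
    """Evaluate Unless-EID as the negation of If-EID."""
    return not if_eid_holds(field_value, current_eid)
```

A PUT carrying "Unless-EID: *" would then be applied only when
unless_eid_holds("*", current_eid) is true, that is, when nothing exists
yet to overwrite.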


I think that represents enough compromise from me for one week and I am
not in the mood for playing any more name games.  If we can't settle
this within the next 24 hours then I think 1.1 should go forward without
any validators at all.


 ...Roy T. Fielding
    Department of Information & Computer Science    (fielding@ics.uci.edu)
    University of California, Irvine, CA 92717-3425    fax:+1(714)824-4056
    http://www.ics.uci.edu/~fielding/
Received on Friday, 12 April 1996 14:38:52 UTC