Re: Status report: caching stuff

Jeff,

I finally finished reading and commenting on your cache text.  I hope
you can do something with my comments.

I don't know exactly which parts you plan to put in the 1.1 text (I
assume you do not want to put all of it in there!), so I wrote
comments on the whole text.

I deleted sections I did not comment on, but left plenty of context
otherwise, so this message ended up a bit long.

Koen.

>
>Caching subgroup of HTTP working group              Jeffrey Mogul/DECWRL
>Internet-Draft                                              3 April 1996
>Expires: 1 October 1996

>1 Caching in HTTP

>1.1 Semantic transparency

Reading your informal definition of the term, I see that we have been
using the term `semantically transparent' in a number of different
ways:

 1) transparent means: you get exactly what you would have gotten from
    the origin server

 2) transparent means: you only get responses that are fresh according
    to the Expires: and Cache-control: max-age instructions of the
    origin server.

 3) transparent means: the cache follows all the rules in the 1.1
    spec.

I don't know if you can fix this problem, or if it needs fixing in
order to write good text for the 1.1 draft.  Maybe the best way to
cope with the three different meanings is to _not_ use the word in the
text for the 1.1 spec.  I just want you to be aware of the problem
here.  Maybe we need a new term, `opaque cache' or something, to
denote 3) above.


>   Ideally, an HTTP/1.1 cache would be ``semantically transparent.''
>   That is, use of the cache would not affect either the clients or the
>   servers in any way except to improve performance.  When a client
>   makes a request via a semantically transparent cache, it receives
>   exactly the same entity headers and entity body it would have
>   received if it had made the same request to the origin server, at the
>   same time.
>
>   In the real world, requirements for performance, availability, and
>   disconnected operation require us to relax the goal of semantic
>   transparency in many cases.  The HTTP/1.1 protocol allows origin
>   servers, caches, and clients to explicitly reduce transparency when
>   necessary.  However, because non-transparent operation may confuse
>   non-expert users, and may be incompatible with certain server
>   applications (such as those for ordering merchandise), the protocol
>   requires that transparency may not be relaxed
>
>      - without an explicit protocol-level request (when done by
>        client or origin server)
>
>      - without a means for warning the end user (when done by
>        cache or client)

Add:

       - without a means for warning the origin server (when done by
         cache or client)

>
>   Therefore, the HTTP/1.1 protocol provides these important elements:
>
>      1. Protocol features that provide full semantic transparency
>         when this is desired by all parties.
>
>      2. Protocol features that allow an origin server or end-user
>         client to explicitly request and control non-transparent
>         operation.
>
>      3. Protocol features that allow a cache to attach warnings to
>         responses that do not preserve semantic transparency.

Add:

       4. Protocol features that allow a cache to attach warnings to
          requests if the responses will not be cached in a
          semantically transparent way.

>
>   A basic principle is that it must be possible for the clients to
>   detect any potential breakdown of semantic transparency.

Change to:

   A basic principle is that it must be possible for the user agents
   and origin servers to detect any potential breakdown of semantic
   transparency.

>
>
>   Caching would be useless if it did not significantly improve
>   performance in many cases.  The goal of caching in HTTP/1.1 is to
>   eliminate the need to send requests in many cases, and to eliminate
>   the need to send full responses in many other cases.  The former
>   reduces the number of network round-trips required for many
>   operations; we use an ``expiration'' mechanism for this purpose (see
>   section 1.2).  The latter reduces network bandwidth requirements; we
>   use a ``validation'' mechanism for this purpose (see section 1.3).
>
>   The server, cache, or client implementor may be faced with design
>   decisions not explicitly discussed in this specification.  If
>   decision may affect semantic transparency, the implementor ought to
>   err on the side of maintaining transparency unless a careful and
>   complete analysis shows significant benefits in doing otherwise.

Add:

    If transparency is not maintained by the cache, it should always
    add the appropriate request and response headers to signal this
    breakdown in transparency to the end points in the communication
    chain.

>
>   A note on terminology: we say that a resource is ``cachable'' if a
>   cache is allowed to store a copy of this resource, when it arrives in
>   a response message, and then later use that copy to respond to a
>   subsequent request.  Even if a resource is cachable, there may be
>   additional constraints on when and whether a cache can use a cached
>   copy of it.

This is a very good terminology explanation! I suggest you try to
re-work it so that it can become part of the terminology section of
the main 1.1 document.

>
>1.2 Expiration model

[...]


>1.2.1 Server-specified expiration
>   HTTP caching works best when caches can entirely avoid making
>   requests to the origin server.  The primary mechanism for avoiding
>   requests is for an origin server to provide an explicit expiration
>   time in the future, indicating that a response may be used to satisfy
>   subsequent requests.  In other words, a cache can return a fresh
>   response without first contacting the server.
>
>   Our expectation is that servers will assign future explicit
>   expiration times to responses in the belief that the entity is not
>   likely to change, in a semantically significant way, before the
>   expiration time is reached.  This normally preserves semantic
>   transparency, as long as the server's expiration times are carefully
>   chosen.
>
>   If an origin server wishes to force a cache to validate every
>   request, it may assign an explicit expiration time in the past.  This
>   means that the response is always stale, and so the cache SHOULD
>   validate it before using it for subsequent requests.  

Gack!  Please change this to

    This
    means that the response is always stale, and so the cache MUST
    validate it before using it for subsequent requests, unless it is
    configured to act non-transparently (opaquely?).
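
Just so we are talking about the same behaviour, this is the decision
logic I have in mind, as a rough Python-ish sketch (all names are
mine; is_fresh() stands for the freshness check of section 1.2.7):

    # May the cache answer from its stored entry without revalidating?
    def can_use_without_validation(entry, acts_transparently=True):
        if entry.is_fresh():
            return True
        # Stale entry: revalidate, unless the operator has configured
        # the cache to act non-transparently -- and in that case the
        # response has to carry a Warning (section 1.2.5).
        return not acts_transparently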

>   (Note that a
>   firsthand response SHOULD always be returned to the requesting
>   client, independent of its expiration time.)

Why the SHOULD?  I'd say make this a MUST.  Even for non-transparent
caches.

>   Servers specify explicit expiration times using either the Expires:
>   header, or the max-age directive of the Cache-control: header.
>
>1.2.2 Limitations on the effect of expiration times

>1.2.3 Heuristic expiration
>
>1.2.4 Client-controlled behavior

>1.2.5 Exceptions to the rules and warnings
>   In some cases, the operator of a cache may choose to configure it to
>   return stale responses even when not requested by clients.  This
>   should not be done lightly, but may be necessary for reasons of
>   availability or performance, especially when the cache is poorly
>   connected to the origin server.  Whenever a cache returns a stale
>   response, it must mark it as such (using a Warning: header).  This
>   allows the client software to alert the user that there may be a
>   potential problem.

Maybe you should add something here about the need to disobey the
Expires: <yesterday> instructions from (advertising) sites which add
them for frivolous reasons, like getting higher hit counts.  This is
the real reason cache operators would want to configure their caches
non-transparently for some servers.

>   It also allows the user to take steps to obtain a firsthand or fresh
>   response, if the user so desires.  For this reason, a cache MUST NOT
>   return a stale response if the client explicitly requests a
>   first-hand or fresh one, unless it is impossible to comply.
>


>1.2.6 Age calculations

[...]

>      resident_time = now - response_time;
>      current_age = corrected_initial_age + resident_time;
>
>   When a cache sends a response, it must add to the
                                      ^^^^
>   corrected_initial_age the amount of time that the response was
>   resident locally.  It must then transmit this total age, using the
>   Age: header, to the next recipient cache.

I don't agree with this MUST for a pure 1.1 chain; we have been
through this before: it is unnecessary, and even harmful, to require
that the apparent_age correction is always applied.  I don't really
want to discuss this again now, I just want to announce that I will
make noise if this MUST ends up in the 1.1 document.
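
For the record, my reading of the arithmetic itself (which I do not
dispute), as a Python-ish sketch; corrected_initial_age is computed as
in the earlier part of 1.2.6 that I did not quote, and the names are
only illustrative:

    # Age bookkeeping when a cache serves a stored response.
    def current_age(corrected_initial_age, response_time, now):
        resident_time = now - response_time   # time spent in this cache
        return corrected_initial_age + resident_time

    # The total, in seconds, is what would go out in the Age: header.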


>1.2.7 Expiration calculations

[...]

>   If neither Expires: nor Cache-control: max-age appears in the
>   response, and the response does not include other restrictions on
>   caching, the cache MAY compute a freshness lifetime using a
>   heuristic.  This heuristic is subject to certain limitations; the
>   minimum value may be zero, and the maximum value MUST be no more than
>   24 hours.

I would agree to 24 hours, but others may not.  A rule like

 max(24 hours, (date - last-modified(if present)) / 10)

may meet with less opposition.
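
In sketch form (Python-ish, names mine, date and last-modified taken
as times in seconds):

    DAY = 24 * 3600

    # Upper bound on a heuristically computed freshness lifetime.
    def max_heuristic_lifetime(date, last_modified=None):
        if last_modified is None:
            return DAY
        return max(DAY, (date - last_modified) / 10)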

>   TBS: other recommendations re: heuristics?
>
>   The calculation to determine if a response has expired is quite
>   simple:
>
>      response_is_fresh = (freshness_lifetime > current_age)
>
>1.3 Validation model

>1.3.1 Last-modified dates

>1.3.2 Opaque validators

There should be something about global uniqueness requirements for
opaque validators in here.  See my previous mail on why we need this
for conditional requests on varying resources.


>1.3.3 Weak and strong validators

[...]

>      |Modification suggested by Koen Holtman regarding opaque      |
>      |validators:  Rather than a simple restriction on how opaque  |
>      |validators are generated, one can state a somewhat more      |
>      |liberal restriction:  a validator is strong if the origin    |
>      |server knows for sure that the same validator value is not   |
>      |associated with two different copies of a resource.  The     |
>      |server's knowledge may change over time, even if the resource|
>      |is unchanged.  This allows the server to mark the value as   |
>      |``weak'' when it is not able to make this guarantee, and     |
>      |later mark the same value as ``strong''.  Since the strong   |
>      |comparison function will treat these two instances as        |
>      |unequal, no logical error results.                           |

I think it is best if you leave the above general discussion out of the
1.1 spec. The example below might be good to put in, however.

>      |                                                             |
>      |For example, if the value of validator is an encoding of an  |
>      |entity's last-modified time with a resolution of 1 second,   |
>      |until at least one second (measured with the same resolution)|
>      |has passed, an opaque validator based on this timestamp must |
>      |be tagged as ``weak'' (unless the server has additional      |
>      |information about updates).  However, after that time, the   |
>      |server knows that the entity has not yet changed during the  |
>      |current second, and so can encode the same value as a strong |
>      |opaque validator.  Even if the entity is then updated during |
>      |the same second, there is no problem because any subsequent  |
>      |validator generated during that second would have to be      |
>      |tagged as ``weak'', and could not compare equal to the strong|
>      |version.                                                     |
>      |                                                             |
>      |Note that in order to use this technique, the server must    |
>      |take into account possibly non-monotonic clocks or           |
>      |modification times, and should probably allow a period much  |
>      |larger than one second before using such a value as a strong |
>      |validator.                                                   |
>
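
To check that I read the example the way you meant it, a Python-ish
sketch (one-second Last-modified resolution, names mine; a real server
would of course use the much larger safety margin your note asks for):

    # Opaque validator derived from a last-modified timestamp.
    def validator_from_lm(last_modified, now):
        value = '"%d"' % last_modified
        if now - last_modified < 1:   # entity may still change this second
            return value + "/W"       # must be tagged weak
        return value                  # safe to use as a strong validator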

[....]

>   The phrase ``significantly'' older must take into account such
>   problems as clock and timestamp resolution, and some uncertainty
>   about whether the server obtains the Last-modified:  time before or
>   after it generates the Date: value.  We adopt the arbitrary
>   requirement that this be at least 60 seconds.  We also recommend that
>   an origin server SHOULD obtain an entity's Last-modified: time as
>   close as possible to the time that it generates the Date: value for a
>   response.

Maybe we need the opinion of an NFS specialist about the above 60
seconds.  NFS allows weak caching of last modified times of a file, I
believe.  I would feel happier if the 60 seconds above was 30 minutes,
to account for servers running on distributed filesystems worse than
(Sun's implementation of) NFS.



>1.3.4 Rules for when to use opaque validators and last-modified dates

>1.3.5 Non-validating conditionals

>1.3.6 Other issues
>   TBS: what if no validator present in response?

You must allow caches to use heuristics in this case.  Whatever you
do, don't forbid caching if validators are absent, else we will never
get CGI authors to start supplying caching-related headers.

>
>1.4 Cache-control mechanisms

>1.5 Warnings

>1.6 Explicit indications regarding user-specified overrides

>1.7 Values and responses
>   TBS
>
>      |The gist of this section is that a cache doesn't simply      |
>      |parrot the entire response it once received from a server    |
>      |when it responds to a new request, especially if any         |
>      |validations were done in the meantime.                       |
>      |                                                             |
>      |We need to make a clear distinction between headers that are |
>      |stored with a cache entry and those that aren't, and we have |
>      |to define carefully what headers are simply deleted when a   |
>      |cache entry is updated.  Section 1.7.1 already talks about   |
>      |combining headers, but doesn't provide a way to remove, say, |
>      |a "Response is stale" Warning after a fresh response is      |
>      |received.                                                    |
>      |                                                             |
>      |I suspect that there may need to be some examination of the  |
>      |categories of General-header, Response-Header, and           |
>      |Entity-Header in order to make this all clear.               |

Yes, I suspect this too.  The categories are now in a bit of a mess.

>
>1.7.1 Combining headers

>1.7.2 Scope of expiration
>   TBS:
>
>      |This is taken verbatim from the issues list, and needs       |
>      |clarification and resolution:                                |
>      |                                                             |
>      |It's a simplification to say that as soon as any one piece of|
>      |information associated with a URI becomes stale, all of the  |
>      |rest of the information should become stale too.             |
>      |                                                             |
>      |If we make that simplification, 'freshness' applies to "the  |
>      |URI's information" in general, rather than any particular    |
>      |piece of it. This means that dates and expires etc. apply to |
>      |any cached info that a proxy might have with a URI and not   |
>      |just the one particular piece of data.                       |

I already did _not_ make that simplification when writing the Vary
header text.  From the caching rules given there, it can be deduced
that freshness applies only to a response, not to the whole resource.

>
>1.8 Caching and content negotiation

>1.9 Caching and ranges

>1.10 Shared and non-shared caches
>
>1.11 Stuff that needs to be said somewhere

>1.11.1 Detecting firsthand responses
>   Note that a client can usually tell if a response is firsthand by
>   comparing the Date: to its local request-time, and hoping that the
>   clocks are not badly skewed.

Isn't a response firsthand if an Age: header is absent?  I believe Age
headers are only added if a response comes from cache memory.  Proxies
just acting as a relay of a firsthand response should not add an Age
header, I believe.  (At least that was the design I remember from the
Age header discussions some time ago; it could be that you changed
your mind since.)
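
If that design still stands, firsthand detection in a pure 1.1 chain
reduces to something as simple as (sketch):

    # response_headers: the headers of the received response, as a dict.
    def is_firsthand(response_headers):
        return "Age" not in response_headers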

>
>1.11.2 Disambiguating expiration values

>1.11.3 Disambiguating multiple responses

>1.12 Cache keys
>   TBS
>
>      |This section will discuss how caches construct and use cache |
>      |lookup keys.  Specific issues include:                       |
>      |                                                             |
>      |  - Canonicalization of URLs                                 |
>      |                                                             |

>      |  - Use of variant IDs                                       |
>      |                                                             |
>      |  - Use of Vary: header                                      |

I can say something about these: Caches do _not_ key on variant IDs.
For varying resources, they key either on combinations of request
headers (second case in my Vary text) or world-unique opaque cache
validators (first case in my Vary text).  I think it is unwise to try
to give a storage-by-cache-key algorithm for varying resources,
because the best storage structure is not

   (URL,other key stuff) --> response

but

   URL  -->  Searchable Set of responses
              (search operations include 
                    - walk through set
                    - search on cache validator
                    - (maybe) search on request header profile
              )

If you try to formulate a working (URL, other key stuff) key, you'll
encounter huge difficulties in addressing subtleties about variants
that change through time and variant-choosing algorithms that change
through time.  Trust me on this one, I spent a few weeks on this in
December.  I think it is best _not_ to specify a model for how cache
memory works in the 1.1 spec.  If we want to help proxy authors, it
would be better to talk about such things in a non-standards-track
document.
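
To make the above concrete, the storage structure I have in mind looks
roughly like this (Python-ish sketch, every name in it is mine):

    class VariantSet:
        """All cached responses stored for one URL."""
        def __init__(self):
            self.responses = []                   # walk through set

        def find_by_validator(self, validator):  # search on cache validator
            return [r for r in self.responses
                    if r.headers.get("CVal") == validator]

        def find_by_request_profile(self, profile):  # (maybe) header profile
            return [r for r in self.responses if r.matches(profile)]

    cache = {}    # URL --> VariantSet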

>      |                                                             |
>1.13 Cache-related problems not addressed in HTTP/1.1

>1.14 Cache operation when receiving errors or incomplete responses

[...]

>   Responses received with a status code other than 2xx or 3xx should
>   not be stored in a cache.
>
>      |This requires more consideration.  Are all 2xx and 3xx       |
>      |responses cachable?

In general, they are only cachable if indicated as such, except for
200 and 206 responses.
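
In other words (sketch, using the status code names that appear
elsewhere in your draft):

    # Default cachability in the absence of explicit cache directives.
    def cachable_by_default(status_code):
        return status_code in (200, 206)   # 200 (OK), 206 (partial body)
    # Any other status: cachable only if the response explicitly says so.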

>       Are all others not cachable?  Perhaps a |
>      |better way of saying this is that Entity-bodies received with|
>      |other status codes may not be used in a response to a        |
>      |subsequent request.                                          |
>
>1.15 Compatibility with earlier versions of HTTP

>1.16 Side effects of GET and HEAD

I agree to your text about side effects!  Yay!

[...]
>   We note one exception to this rule: since some applications have
>   traditionally used GETs and HEADs with query URLs (those containing a
>   ``?'' in the rel_path part) to perform operations with significant
>   side effects, caches MUST NOT treat responses to such URLs as fresh
>   unless the server provides an explicit expiration time.

The above is a heuristic used by many 1.0 proxy caches (not user agent
caches).  Most people would not like to cast this heuristic in stone.

>   This specifically means that responses from HTTP/1.0 servers for such
>   URIs should not be taken from a cache.
>
>1.17 Invalidation after updates or deletions

>1.18 Write-through mandatory

>2 HTTP protocol parameters related to caching
>
>   This section augments or updates parts of section 3 of the existing
>   HTTP/1.1 draft specification.
>
>2.1 Full date values

>2.2 Opaque validators
>   Opaque validators are quoted strings whose internal structure is not
>   visible to clients or caches.
>
>      opaque-validator = strong-opaque-validator | weak-opaque-validator
>                              | null-validator
>
>      strong-opaque-validator = quoted-string
>
>      weak-opaque-validator = quoted-string "/W"
>
>      null-validator = <"">
>
>   Note that the ``/W'' tag is considered part of a weak opaque
>   validator; it MUST not be removed by any cache or client.
>
>   There are two comparison functions on opaque validators:
>
>      - The strong comparison function: in order to be considered
>        equal, both validators must be identical in every way, and
>        neither may be weak.
>
>      - The weak comparison function: in order to be considered
>        equal, both validators must be identical in every way,
>        except for the presence or absence of a ``weak'' tag.
>
>   The weak comparison function MAY be used for simple (non-subrange)
>   GET requests.  The strong comparison function MUST be used in all
>   other cases.

Add stuff here about global uniqueness requirements for opaque
validators.
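
While I am at it, my reading of the two comparison functions, as a
Python-ish sketch (names mine):

    def _drop_weak_tag(v):
        return v[:-2] if v.endswith("/W") else v

    # Strong comparison: byte-identical, and neither validator is weak.
    def strong_compare(v1, v2):
        return v1 == v2 and not v1.endswith("/W")

    # Weak comparison: identical apart from the presence of the /W tag.
    def weak_compare(v1, v2):
        return _drop_weak_tag(v1) == _drop_weak_tag(v2)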

>
>
>   The null validator is a special value, defined as never matching the
>   current validator of an existing resource, and always matching the
>   ``current'' validator of a resource that does not exist.

What is the difference between supplying the null validator and
supplying no Cval header at all?  If there is none, why do we need to
talk about the null validator?

>
>2.3 Variant IDs
>   TBS
>
>      |This section will discuss the semantics and syntax of variant|
>      |IDs, including the appropriate comparison function(s).       |
>
>      variant-id = token

The appropriate comparison function is equality, and it does not have
to be specified in your text.  It is already specified in my text
about replacement keys.

>
>
>3 HTTP headers related to caching

>3.1 Age
>   Caches transmit age values using:
>
>       Age = "Age" ":" age-value
>
>       age-value = delta-seconds
>
>   Age values are non-negative decimal integers, representing time in
>   seconds.
>
>   If a cache receives a value larger than the largest positive integer
>   it can represent, or if any of its age calculations overflows, it
>   MUST not transmit an Age: header.

EEK!!!  This way of resolving the failure is totally unacceptable.  You
get responses older than 2^31 seconds looking fresh!  A cache should
not be allowed to delete an Age header with a number it can't
represent.

>  Otherwise, HTTP/1.1 caches MUST
>   send an Age: header in every response.  Caches SHOULD use a
>   representation with at least 31 bits of range.

This is misleading: I would make this: `in every response which
includes components taken from cache memory', and add that proxy
caches must not add or modify age headers when relaying a response
(when acting as a tunnel).
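
A sketch of what I mean (Python-ish, names mine):

    # Only a response that involves cache memory gets an Age: header;
    # a proxy purely relaying a firsthand response leaves Age alone.
    def outgoing_headers(headers, served_from_cache, age_seconds):
        out = dict(headers)
        if served_from_cache:
            out["Age"] = str(int(age_seconds))
        return out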

>
>3.2 Authorization
>   The draft HTTP/1.1 specification says (section 10.6):
>
>      Responses to requests containing an Authorization field are not
>      cachable.
>
>   We replace that with:
>
>      Responses to requests containing an Authorization: field MUST
>      not be returned from a shared cache to any user other than the
>      user that made the initial request.  If the cache cannot
>      securely determine the identity of the user, then it must act
>      as if the identity does not match.

WARNING: This replacement will be highly controversial.  Your text
allows shared caches to store authenticated content (like a page from
an on-line newspaper someone paid for).  Your change will not be
acceptable to quite a number of people who sell information over the
net.  This is similar to the medical records problem.  Also, many
servers using authentication rely on the side effect that it disables
all caching.  Such servers will break because of this.

If you want to make this replacement, discuss it on the list first.
Better yet, table it for 1.2.


>
>      |Protection against replay attacks seems to be a function of  |
>      |how the cache ``securely'' determines user identies.  That   |
>      |is, if the mechanism does not protect against replay attacks,|
>      |then it is by definition not secure.                         |
>
>3.3 Cache-control
>   The Cache-Control general-header field is used to specify directives
>   that must be obeyed by all caching mechanisms along the
>   request/response chain. The directives specify behavior intended to
>   prevent caches from adversely interfering with the request or
>   response.  These directives typically override the default caching
>   algorithms.  Cache directives are unidirectional in that the presence
>   of a directive in a request does not imply that the same directive
>   should be given in the response.
>
>   Cache directives must be passed through by a proxy or gateway
>   application, regardless of their significance to that application,
>   since the directives may be applicable to all recipients along the
>   request/response chain. It is not possible to specify a
>   cache-directive for a specific cache.
>
>          Cache-Control   = "Cache-Control" ":" 1#cache-directive
>
>          cache-directive = "public"
>                          | "private" [ "=" <"> 1#field-name <"> ]
>                          | "no-cache" [ "=" <"> 1#field-name <"> ]
>                          | "no-store"
>                          | "max-age" "=" delta-seconds
>                          | "max-stale" "=" delta-seconds
>                          | "min-fresh" "=" delta-seconds

Just an idea: I think it would be nice for readability if you do

           cache-directive = cache-request-directive 
                           | cache-response-directive 

           cache-request-directive =
                           | "no-cache"
                           | "no-store"
                           | "max-age" "=" delta-seconds
                           | "max-stale" "=" delta-seconds
                           | "min-fresh" "=" delta-seconds

           cache-response-directive = 
                             "public"
                           | "private" [ "=" <"> 1#field-name <"> ]
                           | "no-cache" [ "=" <"> 1#field-name <"> ]
                           | "no-store"
                           | "max-age" "=" delta-seconds

>
>      |and perhaps                                                  |
>      |                                                             |
>      |                    | "max-uses" "=" 1*DIGIT                 |
>      |                    | "use-count" "=" 1*DIGIT                |
>      |                                                             |
>   When a directive appears without any 1#field-name parameter, the
>   directive applies to the entire request or response.  When such a
>   directive appears with a 1#field-name parameter, it applies only to
>   the named field or fields, and not to the rest of the request or
>   response.  This mechanism supports extensibility; implementations of
>   future versions of the HTTP protocol may apply these directives to
>   header fields not defined in HTTP/1.1.
>
>   The cache-control directives can be broken down into these general
>   catagories:
>
>      - Restrictions on what is cachable; these may only be imposed
>        by the origin server.
>
>      - Restrictions on what may be stored by a cache; these may be
>        imposed by either the origin server or the end-user client.
>
>      - Modifications of the basic expiration mechanism; these may
>        be imposed by either the origin server or the end-user
>        client.
>
>      - Controls over cache revalidation and reload; these may only
>        be imposed by an end-user client.
>
>      - Restrictions on the number of times a cache entry may be
>        used, and related demographic reporting mechanisms.
>
>   Caches never add or remove Cache-control: directives to requests or
>   responses.
>
>      |Check: is this true?                                         |

I could imagine a situation in which a proxy wants to strengthen the
restrictions in the Cache-Control headers of a relayed request, for
example by adding a min-age or removing a max-age.  Such strengthening
would be semantically transparent, and I see no reason to disallow it.
Also, haven't you already required that proxies redo requests, adding
cache-control: no-cache, if some paradoxical response is encountered?

>
>3.3.1 Restrictions on what is cachable
>   Normally, a caching system may always store a response as a cache
>   entry, may return it without validation if it is fresh, and may
>   return it after validation in any case.

Please be more specific about 200 (OK) responses vs. other responses
and the like here, and about GET vs. POST requests.

[....]

>   no-cache        indicates that all or parts of the response message
>                   MUST NOT be cached.  This allows an origin server to
>                   prevent caching even by caches that have been
>                   configured to return stale responses to client
>                   requests.

Say something about no-cache="set-cookie" and the like!  The rule is
that a cache must delete any set-cookie response header before storing
the response in the cache.
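
For example, a response carrying (values purely illustrative)

     Cache-control: no-cache="Set-Cookie"
     Set-Cookie: <some per-user state>

could then be stored and reused, but only after the cache has deleted
the Set-Cookie header from its copy.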

>
>      Note: HTTP/1.0 caches will not recognize or obey this
>                      directive.
>
>   TBS: precedence relations between public, private, and no-cache.
>
>3.3.2 Restrictions on what may be stored by a cache
>   The "no-store" directive applies to the entire message, and may be
>   sent either in a response or in a request.

What, if anything, is the difference between 'no-cache' and
'no-store'?  I thought `no-cache' implied `no-store'.


>   If sent in a request, a
>   cache MUST NOT store store any part of either this request or any
>   response to it.  If sent in a response, a cache MUST NOT store any
>   part of either this response or the request that elicited it.  This
>   directive applies to both non-shared and shared caches.
>
>   Even when this directive is associated with a response, users may
>   explicitly store such a response outside of the caching system (e.g.,
>   with a ``Save As'' dialog).  History buffers may store such responses
>   as part of their normal operation.
>
>   The purpose of this directive is to meet the stated requirements of
>   certain users and service authors who are concerned about accidental
>   releases of information via unanticipated accesses to cache data
>   structures.  While the use of this directive may improve privacy in
>   some cases, we caution that it is NOT in any way a reliable or
>   sufficient mechanism for ensuring privacy.  In particular, HTTP/1.0
>   caches will not recognize or obey this directive, malicious or
>   compromised caches may not recognize or obey this directive, and all
>   communications networks may be vulnerable to eavesdropping.

Delete the `all' before `communications' above.  Remember quantum
crypto?


>3.3.3 Modifications of the basic expiration mechanism

>   max-stale       Indicates that the client is willing to accept a
>                   response that has exceeded its expiration time by no
>                   more than the specified number of seconds.  If a
>                   cache does return a stale response in response to
>                   such a request, it MUST mark it as stale using the
>                   Warning: header.

Add something about detectability here, see my recent message about
cookies and all.  You may want to go for the `intransparent'
directive.

>   Note that HTTP/1.0 caches will ignore these directives.
>
>3.3.4 Controls over cache revalidation and reload

[...]

>   Note that HTTP/1.0 caches will ignore these directives, except for
>   ``Pragma: no-cache''.

Some 1.0 caches ignore even ``Pragma: no-cache'', I am sure.

>
>      When an intermediate cache is forced, by means of a
>      ``max-age=0'' directive, to revalidate its own cache entry, and
>      the client has supplied its own validator in the request, the
>      supplied validator may differ from the validator currently
>      stored with the cache entry.  In this case, the cache may use
>      either validator in making its own request without affecting
>      semantic transparency.

It can also send both in the If-Valid header!  There is no need to
make tradeoffs beforehand.

 If-valid: "4358743", "23409823"/W

See my remarks on variant-IDs in your If-Valid header elsewhere.

>      However, the choice of validator may affect performance.  The
>      best approach is for the intermediate cache to use its own
>      validator when making its request.  If the server replies with
>      304 (Not Modified), then the cache should return its now
>      validated copy to the client with a 200 (OK) response.  If the
>      server replies with a new Entity-body and cache validator,
>      however, the intermediate cache should compare the returned
>      validator with the one provided in the client's request, using
>      the strong comparison function.  If the client's validator is
>      equal to the origin server's, then the intermediate cache
>      simply returns 304 (Not Modified).  Otherwise, it returns the
>      new Entity-body with a 200 (OK) response.
>
>
>   If a request includes the "no-cache" directive, it should not include
>   "fresh-min", "max-stale", or "max-age".

I think this is overly restrictive.


>
>3.3.5 Restrictions on use count and demographic reporting
>   TBS
>
>3.4 CVal

[....]

>   TBS: does the protocol allow the combination of a null validator and
>   a variant-ID?

There is little value in allowing it, but why shouldn't the protocol
allow it?

>
>3.5 Date

>3.6 Expires

>3.7 If-Invalid
>   The If-Invalid request-header field is used with a GET method to make
>   it conditional: if the server's current validator for the resource is
>   the same as the supplied validator, the server should return a 304
>   (not modified) response without any Entity-Body.

Oops!  I think that means that I must change an If-Valid to If-Invalid
somewhere in my text.

>   If the validators
>   do not match, and no Range: request-header is present, the server
>   should return a 200 (OK) response and the full Entity-body.  If the
>   validators do not match, and a Range: request-header is present, the
>   server should return a 206 (partial body) response and the specified
>   subrange as the Entity-body.
>
>
>   See section 1.9 for full details about handling the Range:
>   request-header.
>
>   The If-Invalid request-header may also be used to pass a set of
>   validators, each associated with a specific variant-id. 

No. This is not what we agreed on on the list. We agreed on

      If-Invalid = "If-Invalid" ":" 1#opaque-validator

and I wrote my text to match this.  With your version, which is just
the old Variant-Set mechanism, I cannot provide smooth upwards
compatibility with a future content negotiation mechanism without
adding a lot of cruft to 1.1. Including variant-ids is unnecessary and
undesirable.
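
Under that grammar, a conditional request on a varying resource simply
looks like, for example:

     If-Invalid: "xyzzy", "r2d2xxxx"/W, "c3piozzzz"

that is, just the validators of the fresh cache entries the cache
holds, with no variant-ids attached.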

>   This is
>   known as a variant-set.  This is used when a cache holds one or more
>   cache entries for variants of a multi-entity resource; it allows the

:) It seems you decided to use `multi-entity' instead of `vary' at the
same time that I decided to use `vary' instead of `multi-entity'.  My
scheme now is:


             -- non-varying resources
           /
          /
 resources
          \                       -- opaquely varying resources
           \                     /
             -- varying resources
                                 \
                                  --  transparently varying resources
                                        = negotiable resources


>   server to avoid sending a full entity body without requiring the
>   cache to understand the server's variant selection mechanism.
>
>   Once the server has decided (from other request-header fields) which
>   variant matches the request, it can then use the variant-set
>   information to decide if the cache holds a valid copy of the correct
>   variant.  If exactly one of elements of the variant set is the
>   appropriate response, the server should return a 304 (not modified)
>   response without any Entity-Body, but containing a CVal:
>   response-header field that indicates the proper variant-id.

No, the Cval: indicates the proper cache validator.  Inclusion of the
variant-ID is optional.

>  If the
>   other request-header fields do not allow the server to select a
>   specific variant then the operation is not performed, and the server
>   returns (300) Multiple Choices.

The above sentence is wrong: 1.1 cannot require that a 300 is returned
in this case, as 1.1 does not even define a 300 other than as a
placeholder for future extension.  You can delete the sentence above;
I cover this in my Vary header text.

>
>   TBS: use with other methods?

I would not talk about other methods than GET and HEAD.

>
>      If-Invalid = "If-Invalid" ":" if-invalid-rhs
>
>      if-invalid-rhs = opaque-validator | variant-set
>
>      variant-set = 1#variant-set-item
>
>      variant-set-item = opaque-validator ";" variant-id

No.  See above.

>   The correct form (opaque-validator or variant-set) depends on whether
>   or not the request is being made to a multi-entity resource which
>   uses the variant-id mechanism.

Not all varying resources use variant-IDs, so the above is wrong.  
My Vary header text covers this, you can delete the above sentence.

    If the server has returned a
>   variant-id in the CVal: header of a prior response, a conditional GET
>   on this resource MUST use the variant-set form, and SHOULD include
>   the variant-set-items for all of the fresh cache entries associated
>   with the resource.

This MUST means that caches MUST remember whether prior responses used
variant-ids.  This is unacceptable; a cache must always be allowed to
act as a tunnel.  Anyway, my Vary header text covers this, you can
delete the above sentence.

>      Note: although the protocol does not prohibit the use of the
>      Range: header with a multi-item variant-set, it is not clear
>      that this will always yield a useful result.

Use of the Range: header can certainly be useful under some
conditions.  It is best to delete the above note.

>
>   Examples of single-entity form:
>
>      - If-Invalid: "xyzzy"
>
>      - If-Invalid: "xyzzy"/W
>
>   Examples of multiple-entity form:
>
>
>      - If-Invalid: "xyzzy";4
>
>      - If-Invalid: "xyzzy";3, "r2d2xxxx";5, "c3piozzzz";7
>
>      - If-Invalid: "xyzzy"/W;3, "r2d2xxxx"/W;5, "c3piozzzz"/W;7

No, See above.  Also, you left out GONK, my favorite droid.

>
>   If the request would, without the If-Invalid: header, result in
>   anything other than a 2xx status, then the If-Invalid:  header is
>   ignored.
>
>   The purpose of this feature is to allow efficient updates of cached
>   information with a minimum amount of transaction overhead.

If you are completely confused at this point about what I want
If-Invalid to look like, tell me, and I'll pull Roy's designs from the
archive.  I would be prepared to write complete text for the
If-Invalid header.  As I said, there is no easy way for me to let my
Vary+Alternates text work with the If-Invalid header you specify,
because your if-invalid is basically the old variant-set.

>
>3.8 If-Modified-Since


>      A GET method with an If-Modified-Since: header and no Range:
>      header requests that the identified resource be transferred
>      only if it has been modified since the date given by the
>      If-Modified-Since header. The algorithm for determining this
>      includes the following cases:

Note that varying resources cannot do anything useful with IMS
requests other than ignoring the IMS header.


>3.9 If-Valid

See If-invalid, of course.


>3.10 Pragma

>3.11 Unless-Modified-Since

Wouldn't that be `if-unmodified-since'?  Else if-invalid has to be
unless-valid.  I think unless-valid is more clear, by the way.


>3.12 Warning

See Roy's and my comments about the name `warning' vs. the 99 code.


>4 Cache replacement algorithms
>
>   TBS

I already covered cache replacement for varying resources.  For
non-varying resources, the spec does not need to say anything special,
except maybe the thing about the response with the later Date header
invalidating the response with the earlier Date.

There are lots of w3 conference papers about replacement heuristics,
the 1.1 spec does not need to educate cache authors about this.


>
>
>5 Other issues

>5.1 Authentication
>   TBS
>
>5.2 State
>   From Shel Kaphan, may need editing and references to other documents:

Note that Shel wrote this _before_ Roy objected to the consensus of
the Feb. cache meeting that caches must always be transparent.

>5.3 Caching and methods with side effects

>5.4 Network partitions

>5.5 Caching of negative responses

>5.6 History Lists

>5.7 Bypassing in caching hierarchies

>6 Security Considerations
>
>6.1 Location: headers and spoofing
>   TBS

The spoofing issue is not relevant for the 1.1 spec.  It will rear its
ugly head again when we discuss transparent content negotiation in
May.  The loss of performance due to incorrect invalidation by
malicious parties may be an issue, if we have an invalidation header
in 1.1.


>7 Acknowledgements

>8 References

>9 Author's address
