Network Working Group                                        R. Fielding
INTERNET-DRAFT                                               U.C. Irvine
<draft-fielding-http-age-00>
Expires six months after publication date.                 26 March 1997


                    Age Header Field in HTTP/1.1


Status of this Memo

   This document is an Internet-Draft.  Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its
   areas, and its working groups.  Note that other groups may also
   distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time.  It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as
   ``work in progress.''

   To learn the current status of any Internet-Draft, please check
   the ``1id-abstracts.txt'' listing contained in the Internet-Drafts
   Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
   munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast),
   or ftp.isi.edu (US West Coast).

   Discussion of this memo should take place within the HTTP working
   group (http-wg@cuckoo.hpl.hp.com).


Abstract

   The "Age" response-header field in HTTP/1.1 [RFC 2068] is intended
   to provide a lower bound for the estimate of a response message's
   age (time since generation) by explicitly indicating the amount of
   time that is known to have passed since the response message was
   retrieved or revalidated.  However, there has been considerable
   controversy over when the Age header field should be added to a
   response.  This document explains the issues and provides a set of
   proposed changes for the revision of RFC 2068.


1. Problem Statement

   HTTP/1.1 [1] defines the Age header field in section 14.6:

      The Age response-header field conveys the sender's estimate of the
      amount of time since the response (or its revalidation) was generated
      at the origin server. A cached response is "fresh" if its age does
      not exceed its freshness lifetime. Age values are calculated as
      specified in section 13.2.3.

           Age = "Age" ":" age-value

           age-value = delta-seconds

      Age values are non-negative decimal integers, representing time in
      seconds.

      If a cache receives a value larger than the largest positive integer
      it can represent, or if any of its age calculations overflows, it
      MUST transmit an Age header with a value of 2147483648 (2^31).
      HTTP/1.1 caches MUST send an Age header in every response. Caches
      SHOULD use an arithmetic type of at least 31 bits of range.
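   The overflow rule quoted above can be met by saturating instead of
   wrapping.  The following C sketch assumes a 64-bit unsigned type for
   intermediate arithmetic; the names parse_age_value, age_add, and
   AGE_MAX are illustrative and do not come from RFC 2068.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* 2^31: the value RFC 2068 section 14.6 says to transmit on overflow. */
#define AGE_MAX UINT64_C(2147483648)

/* Parse an age-value (delta-seconds: one or more decimal digits).
 * Returns 0 on success and -1 on a syntax error.  Values larger than
 * 2^31 saturate to 2^31 instead of overflowing. */
int parse_age_value(const char *s, uint64_t *out)
{
    uint64_t v = 0;

    assert(s != NULL && out != NULL);
    if (*s == '\0')
        return -1;
    for (; *s != '\0'; s++) {
        if (!isdigit((unsigned char)*s))
            return -1;
        if (v < AGE_MAX) /* once saturated, only keep validating digits */
            v = v * 10 + (uint64_t)(*s - '0');
    }
    *out = v > AGE_MAX ? AGE_MAX : v;
    return 0;
}

/* Saturating addition for age arithmetic (e.g., adding resident time).
 * Any result of 2^31 or more is reported as exactly 2^31. */
uint64_t age_add(uint64_t a, uint64_t b)
{
    if (a >= AGE_MAX || b >= AGE_MAX)
        return AGE_MAX;
    /* Both operands are below 2^31, so the sum cannot wrap. */
    return a + b > AGE_MAX ? AGE_MAX : a + b;
}
```

   For example, parse_age_value("99999999999999999999", &v) succeeds
   and stores 2147483648 in v rather than failing or wrapping.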

   This document focuses on the ambiguous use of the term "caches" in
   the second-to-last sentence above.  The ambiguity arises because a
   cache never sends responses; only a server application (proxy,
   gateway, or origin server), which may or may not include a cache, is
   capable of sending a response.  HTTP/1.1 defines a "cache" as

      A program's local store of response messages and the subsystem
      that controls its message storage, retrieval, and deletion. A
      cache stores cachable responses in order to reduce the response
      time and network bandwidth consumption on future, equivalent
      requests. Any client or server may include a cache, though a cache
      cannot be used by a server that is acting as a tunnel.

   There are two possible interpretations of

      HTTP/1.1 caches MUST send an Age header in every response.

   Either

      a) An HTTP/1.1 server that includes a cache MUST send an Age
         header field in every response.
   or
      b) An HTTP/1.1 server that includes a cache MUST include an Age
         header field in every response generated from its own cache.
      
   The remainder of this document discusses the relative merits of these
   two options, referred to as "Option A" and "Option B", and concludes
   with a set of proposed changes to remove the ambiguity from future
   editions of the HTTP/1.1 specification.
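   The difference between the two readings can be stated as a predicate
   over two facts about an outgoing response: whether the sending server
   includes a cache, and whether this particular response is being
   served from that cache.  The C sketch below is illustrative only; the
   enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* The two readings of "HTTP/1.1 caches MUST send an Age header in every
 * response" (names are hypothetical). */
enum age_reading { OPTION_A, OPTION_B };

/* Returns true if the given reading requires an Age header field.
 *   server_has_cache: the sending proxy, gateway, or origin server
 *                     includes a cache.
 *   from_own_cache:   this response is served from that cache rather
 *                     than forwarded first-hand. */
bool must_add_age(enum age_reading reading, bool server_has_cache,
                  bool from_own_cache)
{
    /* A server without a cache cannot serve a response from one. */
    assert(server_has_cache || !from_own_cache);
    if (!server_has_cache)
        return false;      /* neither reading applies */
    if (reading == OPTION_A)
        return true;       /* every response it sends */
    return from_own_cache; /* Option B: cached responses only */
}
```

   The readings disagree only for a first-hand response forwarded by a
   caching proxy: must_add_age(OPTION_A, true, false) is true, while
   must_add_age(OPTION_B, true, false) is false.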

2. Review of HTTP/1.1 Response Age Calculation

   HTTP/1.1 defines an algorithm for calculating the age of a response
   message upon receipt by a cache.  This document does not propose any
   modification of this algorithm; we describe it here only to provide
   the background needed to understand the later analyses.  The summary
   below is brief; for a full explanation, see section 13.2.3 (Age
   Calculations) of RFC 2068 [1].

   Summary of age calculation algorithm, when a cache receives a
   response:

      /*
       * age_value
       *      is the value of Age: header received by the cache with
       *              this response.
       * date_value
       *      is the value of the origin server's Date: header
       * request_time
       *      is the (local) time when the cache made the request
       *              that resulted in this cached response
       * response_time
       *      is the (local) time when the cache received the
       *              response
       * now
       *      is the current (local) time
       */
      apparent_age           = max(0, response_time - date_value);
      corrected_received_age = max(apparent_age, age_value);
      response_delay         = response_time - request_time;
      corrected_initial_age  = corrected_received_age + response_delay;
      resident_time          = now - response_time;
      current_age            = corrected_initial_age + resident_time;
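   For experimentation, the summary above can be transcribed into C.
   This is a sketch, not reference code: struct age_inputs and
   current_age are hypothetical names, and all times are assumed to be
   whole seconds held in int64_t.

```c
#include <assert.h>
#include <stdint.h>

/* Inputs named after the variables in the summary above (seconds). */
struct age_inputs {
    int64_t age_value;     /* received Age header value; 0 if absent */
    int64_t date_value;    /* origin server's Date header */
    int64_t request_time;  /* local time the request was made */
    int64_t response_time; /* local time the response was received */
};

static int64_t max64(int64_t x, int64_t y) { return x > y ? x : y; }

/* Returns the current_age of the response at local time `now`. */
int64_t current_age(const struct age_inputs *in, int64_t now)
{
    assert(in->request_time <= in->response_time);
    assert(in->response_time <= now);

    int64_t apparent_age = max64(0, in->response_time - in->date_value);
    int64_t corrected_received_age = max64(apparent_age, in->age_value);
    int64_t response_delay = in->response_time - in->request_time;
    int64_t corrected_initial_age = corrected_received_age + response_delay;
    int64_t resident_time = now - in->response_time;
    return corrected_initial_age + resident_time;
}
```

   For example, with age_value = 60, date_value = 1000, request_time =
   1002, response_time = 1005, and now = 1035, the apparent age is 5,
   the received Age of 60 dominates, and current_age returns
   60 + 3 + 30 = 93.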

3. Analysis of Option A

   If we were to assume that

      An HTTP/1.1 server that includes a cache MUST send an Age
      header field in every response.

   is true, then an HTTP/1.1 proxy containing a cache would be required
   to add an Age header field to every response that it forwards,
   including those obtained first-hand from the origin server that never
   touched the caching mechanism.  This would directly contradict the
   paragraph in section 13.2.1 of RFC 2068 that states:

      The expiration mechanism applies only to responses taken from a cache
      and not to first-hand responses forwarded immediately to the
      requesting client.

   and would also directly contradict the last paragraph of section
   13.2.3 of RFC 2068, which states:

      Note that a client cannot reliably tell that a response is first-
      hand, but the presence of an Age header indicates that a response
      is definitely not first-hand.

   If we further assume that the above two paragraphs are in error, then
   the following example illustrates the effect of the age calculation
   when a first-hand response passes through a hierarchical system of
   proxy caches (A, B, C), where a, b, c, and d are the amounts of time
   each network segment takes to carry the request and its response:

     UA  ------->  A  ------->  B  --------->  C  ------->  OS
            a            b             c             d

   Since the age calculation includes each recipient's estimate of clock
   skew (apparent_age), we also define the variables

      skewC  = max(0, response_time(C) - date_value(OS));
      skewB  = max(0, response_time(B) - date_value(OS));
      skewA  = max(0, response_time(A) - date_value(OS));
      skewUA = max(0, response_time(UA) - date_value(OS));

   The received age is then calculated at each recipient as follows:

     At  C:  age=max(skewC,0)+d
         B:  age=max(skewB,max(skewC,0)+d)+(c+d)
         A:  age=max(skewA,max(skewB,max(skewC,0)+d)+(c+d))+(b+c+d)
        UA:  age=max(skewUA,max(skewA,max(skewB,max(skewC,0)+d)+(c+d))+
                                                    (b+c+d))+(a+b+c+d)

   Because the response is first-hand, we know that the real age at UA
   must be less than (a+b+c+d).  Note that (a+b+c+d) will always be
   added by UA, so the cumulative overestimation of the age will be
   at least

      max(skewUA,max(skewA,max(skewB,max(skewC,0)+d)+(c+d))+(b+c+d))

   If we further assume that all clocks are synchronized (the minimum
   case), then the age at UA will be estimated as

      d+(c+d)+(b+c+d)+(a+b+c+d)

   Note that the above is the smallest possible estimate; since the
   variables skewC, skewB, skewA, and skewUA are all unbounded, the
   clock skew of each host on the request path adds to the perceived
   response age of all downstream recipients.  Furthermore, an origin
   server clock that runs behind the recipients' clocks will add to the
   overestimated age at each hop.
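   The nested expressions above follow from applying the algorithm once
   per hop, with each proxy forwarding the age it computed as the next
   hop's age_value.  The C sketch below assumes that recipients are
   listed from the origin server outward (C, B, A, UA), that segment[i]
   holds the time of the segment between recipient i and the next hop
   toward the origin (d, c, b, a), and that a first-hand response is
   forwarded with zero resident time; option_a_ages is a hypothetical
   name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static int64_t max64(int64_t x, int64_t y) { return x > y ? x : y; }

/* Option A: estimated age of a first-hand response at each of n
 * recipients, ordered from the origin server outward.
 *   segment[i]: time of the segment between recipient i and the next
 *               hop toward the origin (d, c, b, a in the example)
 *   skew[i]:    max(0, response_time(i) - date_value(OS))
 * Each proxy forwards the age it computed as the next Age header. */
void option_a_ages(size_t n, const int64_t segment[], const int64_t skew[],
                   int64_t age[])
{
    int64_t received = 0;       /* the origin server sends no Age */
    int64_t response_delay = 0; /* d, then c+d, then b+c+d, ... */

    for (size_t i = 0; i < n; i++) {
        assert(segment[i] >= 0 && skew[i] >= 0);
        response_delay += segment[i];
        age[i] = max64(skew[i], received) + response_delay;
        received = age[i];      /* forwarded with zero resident time */
    }
}
```

   With segment = {1, 2, 3, 4} and all skews 0, the ages are
   {1, 4, 10, 20}.  The value at UA, 20, equals
   d+(c+d)+(b+c+d)+(a+b+c+d), even though the real age cannot exceed
   a+b+c+d = 10.  A skew of 5 at C alone raises every downstream
   estimate by 5.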

   However, in section 13.2.3 of RFC 2068, we also find

      In essence, the Age value is the sum of the time that the response
      has been resident in each of the caches along the path from the
      origin server, plus the amount of time it has been in transit along
      network paths.

   which in our example would imply an age value of (a+b+c+d).  Thus,
   Option A would miscalculate the age value, overestimating it in all
   cases; the amount of error is bounded only by the clock
   synchronization of each and every recipient along the request chain,
   plus the cumulative overestimation of the network transit time by
   each recipient.

4. Analysis of Option B

   If we were to assume that

      An HTTP/1.1 server that includes a cache MUST include an Age
      header field in every response generated from its own cache.

   then an Age header field would not be added to a response that is
   received first-hand, and thus we would not contradict the sections of
   RFC 2068 that were quoted above.

   Using the same example as in the analysis of Option A, the
   calculation of age with Option B would be as follows:

     At  C:  age=max(skewC,0)+d
         B:  age=max(skewB,0)+(c+d)
         A:  age=max(skewA,0)+(b+c+d)
        UA:  age=max(skewUA,0)+(a+b+c+d)

   Note that there is no cumulative overestimation of the age.  The
   estimated age value at each recipient depends only on the skew
   between the recipient's clock and that of the origin server, plus the
   total amount of time the request and response have been in transit
   along the network path.  The minimum estimated age at UA is

      (a+b+c+d)

   which matches the description provided in section 13.2.3 of RFC 2068.
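   For comparison, the same hypothetical model under Option B differs
   only in that no proxy adds an Age header field to a first-hand
   response, so each recipient's age_value is 0.  As before,
   option_b_ages is an illustrative name and the array layout (origin
   outward, segment[i] = d, c, b, a) is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static int64_t max64(int64_t x, int64_t y) { return x > y ? x : y; }

/* Option B: estimated age of a first-hand response at each of n
 * recipients, ordered from the origin server outward.
 *   segment[i]: time of the segment between recipient i and the next
 *               hop toward the origin (d, c, b, a in the example)
 *   skew[i]:    max(0, response_time(i) - date_value(OS)) */
void option_b_ages(size_t n, const int64_t segment[], const int64_t skew[],
                   int64_t age[])
{
    int64_t response_delay = 0; /* d, then c+d, then b+c+d, ... */

    for (size_t i = 0; i < n; i++) {
        assert(segment[i] >= 0 && skew[i] >= 0);
        response_delay += segment[i];
        /* No Age header arrives with a first-hand response: age_value 0. */
        age[i] = max64(skew[i], 0) + response_delay;
    }
}
```

   With segment = {1, 2, 3, 4} and all skews 0, the ages are
   {1, 3, 6, 10}, so UA estimates exactly a+b+c+d.  A skew of 5 at C
   now raises only C's own estimate.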

5. Counter-arguments

   The only argument voiced against Option B is that the calculation is
   "less conservative" than Option A, and that being "conservative" is
   better in order to "reduce as much as possible the probability of
   inadvertently delivering a stale response to a user."
   
   If "conservative" means "always overestimates more than the other
   option", then the argument is certainly true.  However, if the
   purpose of Age were to provide an overestimate, then why stop there?
   Why not add arbitrary amounts of age to forwarded responses, just in
   case?  Why not disable caching entirely?

   The reason is that HTTP caching is good for the Internet as a whole,
   and in particular for the owners of the network bandwidth that would
   otherwise be used to satisfy a request that has already been cached.
   Overestimating response age reduces the effectiveness of caching, and
   thus results in increased network congestion, added bandwidth
   requirements, and in some cases additional per-packet charges.

   Age was created to compensate for the possibility that clock skew
   between the origin server (represented by the Date header field) and
   the user agent (represented by the request time) might cause the age
   of a response to be underestimated.  It allows HTTP/1.1 caches to
   communicate the actual observed age, thus providing a lower bound
   for the age calculation that is more reliable than simply
   calculating the difference between the date stamps.

   If Age is to be useful, it must be trusted by cache implementers.
   To earn that trust, the value of the Age header field must match its
   definition: the age of the response as observed by the application
   that generated the response message.

   Furthermore, Option B is guaranteed to be conservative if all of the
   applications involved are HTTP/1.1-compliant or if the recipient's
   clock is equal to or ahead of the origin server clock.  The only case
   in which Option A *might* result in a better estimation than Option B
   is where one or more HTTP/1.0 caches are in the request chain AND the
   response came from one of those HTTP/1.0 caches in which it resided
   for some time AND the user agent's system clock is running behind the
   origin server's clock.  In this one case, Option A would compensate
   for the clock skew if there were an HTTP/1.1 cache between the user
   agent and the HTTP/1.0 cache generating the response AND that
   HTTP/1.1 cache's clock were better synchronized with the origin
   server clock.
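   The scenario can be made concrete with hypothetical numbers.  In the
   C sketch below (all names and values are assumptions for
   illustration), the origin server's clock is correct and its Date is
   time 0; an HTTP/1.0 cache later serves its stored copy without an
   Age header field; the HTTP/1.1 proxy's clock matches the origin's;
   and the user agent's clock is ua_lag seconds behind.  Under Option A
   the proxy is assumed to forward the response immediately, so the Age
   it adds equals its corrected_initial_age.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static int64_t max64(int64_t x, int64_t y) { return x > y ? x : y; }

/* corrected_initial_age from RFC 2068 section 13.2.3; request_time and
 * response_time are in the recipient's local clock. */
static int64_t initial_age(int64_t age_value, int64_t date_value,
                           int64_t request_time, int64_t response_time)
{
    int64_t apparent_age = max64(0, response_time - date_value);
    return max64(apparent_age, age_value) + (response_time - request_time);
}

/* Hypothetical chain: OS -> HTTP/1.0 cache -> HTTP/1.1 proxy -> UA.
 * All times are in the origin server's (correct) clock. */
struct scenario {
    int64_t date_value;     /* Date header set by the origin server */
    int64_t proxy_request;  /* when the HTTP/1.1 proxy sent its request */
    int64_t proxy_response; /* when it received the HTTP/1.0 cache's copy */
    int64_t ua_request;     /* when the user agent sent its request */
    int64_t ua_response;    /* when the user agent received the response */
    int64_t ua_lag;         /* seconds the UA clock runs behind */
};

/* Age estimated by the user agent under Option A or Option B. */
int64_t ua_age_estimate(const struct scenario *s, bool option_a)
{
    assert(s->ua_lag >= 0);
    int64_t age_value = 0; /* the HTTP/1.0 cache never adds Age */
    if (option_a)          /* Option A: the proxy adds Age when forwarding */
        age_value = initial_age(0, s->date_value,
                                s->proxy_request, s->proxy_response);
    /* The UA sees its request/response times through a lagging clock. */
    return initial_age(age_value, s->date_value,
                       s->ua_request - s->ua_lag, s->ua_response - s->ua_lag);
}
```

   With the proxy requesting at time 1000 and receiving at 1002, the
   user agent requesting at 999 and receiving at 1003, and ua_lag =
   600, the user agent estimates an age of 407 under Option B and 1008
   under Option A, against a true age of about 1003 at receipt.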

   The above scenario would require a minimum of two proxies in the
   chain, with at least one outer proxy being an old HTTP/1.0 cache and
   at least one inner proxy using HTTP/1.1.  Given that, for many other
   reasons (described in RFC 2068), an HTTP/1.0 proxy is incapable of
   reliably caching HTTP messages in a proxy hierarchy, this scenario
   is not compelling.

   In contrast, Option A would overestimate the age of all HTTP/1.1
   responses, even when there are no longer any HTTP/1.0 proxies.  It
   would also make the age calculation dependent on the clock
   synchronization of every recipient along the request chain, with the
   possibility of drastic overestimation if any of the recipients has a
   bad clock.  Option A would therefore make the Age header field value
   consistently less reliable than a simple comparison of date stamps.

6. Conclusion and Proposed Changes

   Option B is the correct interpretation of when the Age header field
   should be added to an HTTP/1.1 response.  The following changes to
   RFC 2068 will remove the ambiguity.

   In section 14.6 (Age), replace the sentence

      HTTP/1.1 caches MUST send an Age header in every response.

   with

      An HTTP/1.1 server that includes a cache MUST include an Age
      header field in every response generated from its own cache.

   In section 13.2.3 (Age Calculations), replace the paragraph

      HTTP/1.1 uses the Age response-header to help convey age information
      between caches. The Age header value is the sender's estimate of the
      amount of time since the response was generated at the origin server.
      In the case of a cached response that has been revalidated with the
      origin server, the Age value is based on the time of revalidation,
      not of the original response.

   with

      HTTP/1.1 uses the Age response-header to convey the estimated age
      of the response message when obtained from a cache.  The Age field
      value is the cache's estimate of the amount of time since the
      response was generated or revalidated by the origin server.

   Delete the following paragraph from section 13.2.3:

      Note that this correction is applied at each HTTP/1.1 cache along the
      path, so that if there is an HTTP/1.0 cache in the path, the correct
      received age is computed as long as the receiving cache's clock is
      nearly in sync. We don't need end-to-end clock synchronization
      (although it is good to have), and there is no explicit clock
      synchronization step.

   Replace the following two paragraphs from section 13.2.3:

      When a cache sends a response, it must add to the
      corrected_initial_age the amount of time that the response was
      resident locally. It must then transmit this total age, using the Age
      header, to the next recipient cache.

        Note that a client cannot reliably tell that a response is first-
        hand, but the presence of an Age header indicates that a response
        is definitely not first-hand. Also, if the Date in a response is
        earlier than the client's local request time, the response is
        probably not first-hand (in the absence of serious clock skew).

   with

      The current_age of a cache entry is calculated by adding the amount
      of time (in seconds) since the cache entry was last validated by
      the origin server to the corrected_initial_age.  When a response
      is generated from a cache entry, the server must include a single
      Age header field in the response with a value equal to the cache
      entry's current_age.

      The presence of an Age header field in a response implies that the
      response is not first-hand.  However, the converse is not true:
      the lack of an Age header field in a response does not imply that
      the response is first-hand unless all caches along the request
      path are compliant with HTTP/1.1, since older HTTP caches did not
      implement the Age header field.
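   One way a server might apply the proposed rule when sending a
   response is sketched below in C.  The cache_entry structure, the
   age_header_for function, and the saturation at 2^31 (taken from
   section 14.6) are illustrative assumptions; the sketch shows only
   that a response served from the cache gets a single Age header field
   equal to the entry's current_age, while a response forwarded
   first-hand gets none.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define AGE_MAX INT64_C(2147483648) /* 2^31, sent on overflow (14.6) */

/* Hypothetical cache entry holding only what the proposed text needs. */
struct cache_entry {
    int64_t corrected_initial_age; /* computed at the last validation */
    int64_t response_time;         /* local time of that validation */
};

/* Writes the Age header line for an outgoing response into buf, or an
 * empty string when none is required.  Returns snprintf's result, or 0
 * when no header is written. */
int age_header_for(char *buf, size_t len, const struct cache_entry *e,
                   int64_t now, bool served_from_cache)
{
    assert(buf != NULL && len > 0);
    if (!served_from_cache) {
        buf[0] = '\0'; /* first-hand response: add no Age header */
        return 0;
    }
    assert(e != NULL && now >= e->response_time);
    /* current_age = corrected_initial_age + time since last validation */
    int64_t current_age = e->corrected_initial_age + (now - e->response_time);
    if (current_age > AGE_MAX)
        current_age = AGE_MAX;
    return snprintf(buf, len, "Age: %" PRId64 "\r\n", current_age);
}
```

   For an entry with corrected_initial_age 63 and response_time 1005, a
   cache hit at now = 1035 produces the line "Age: 93"; forwarding a
   first-hand response produces no Age line at all.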

7. Security Considerations

   The proposed changes close a potential security problem in HTTP/1.1.
   Under Option A, a proxy whose clock is set ahead of the origin
   server's clock (due to a hardware malfunction, failure to set it
   properly, or deliberate resetting by some malevolent agent) would add
   an inflated Age header field to every response it forwards, instead
   of only to those retrieved from its own cache, and thus eliminate the
   ability of compliant downstream caches to reduce bandwidth usage on a
   congested network.  Although this is not a serious concern with
   today's use of HTTP caching, it would affect future hierarchical
   cache networks.

8. Acknowledgements

   This document was derived from discussions by the author within the
   HTTP working group, particularly with Jeffrey C. Mogul.

9. References

   [1] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, and T. Berners-Lee.
       "Hypertext Transfer Protocol -- HTTP/1.1." RFC 2068, U.C. Irvine,
       DEC, MIT/LCS, January 1997.

10. Author's Address

   Roy T. Fielding
   Department of Information and Computer Science
   University of California, Irvine
   Irvine, CA  92697-3425

   Fax: +1(714)824-1715
   EMail: fielding@ics.uci.edu

Received on Wednesday, 26 March 1997 00:45:35 UTC