Re: Age Header Field in HTTP/1.1 from Roy T. Fielding on 1997-04-01 (ietf-http-wg@w3.org from January to March 1997)

From: Roy T. Fielding <fielding@kiwi.ICS.UCI.EDU>
Date: Mon, 31 Mar 1997 20:38:33 -0800
To: Jeffrey Mogul <mogul@pa.dec.com>
Cc: http-wg@cuckoo.hpl.hp.com
Message-Id: <9703312038.aa08552@paris.ics.uci.edu>

>For the record, when I wrote this paragraph (and I did write it),
>I should have written "HTTP/1.1 proxy caches", not "HTTP/1.1 caches".
>My error; I was being sloppy.

The issue applies equally to server-side (gateway) caches; the wording
I supplied is correct.

>      The expiration mechanism applies only to responses taken from a cache
>      and not to first-hand responses forwarded immediately to the
>      requesting client.
>
>No, this is not a contradiction.  That paragraph from section 13.2.1
>pertains to the decision about whether a response is stale or fresh.
>This is quite distinct from any rules about how the inputs to that
>decision are provided.

The expiration mechanism includes the Age header field, and that is how
I read the section.

>   and also directly contradicts the last paragraph of section 13.2.3 of
>   RFC 2068 that states:
>
>      Note that a client cannot reliably tell that a response is first-
>      hand, but the presence of an Age header indicates that a response
>      is definitely not first-hand.
>
>This is indeed a contradiction; the Note in section 13.2.3 is erroneous.

On what basis do you make that claim?  I know that doesn't follow my
intentions, and I know it doesn't follow Koen's intention when he asked
for the note to be added.  Your interpretation of Age is just wrong.

>   However, in section 13.2.3 of RFC 2068, we also find
>
>      In essence, the Age value is the sum of the time that the response
>      has been resident in each of the caches along the path from the
>      origin server, plus the amount of time it has been in transit along
>      network paths.
>
>   which in our example would imply an age value of (a+b+c+d).
>
>You are reading this without paying attention to the clear statement
>in the previous paragraph that "the Age header value is the sender's
>estimate of the amount of time since the response was generated at
>the origin server."  The word "estimate" is there on purpose.  The
>phrase "In essence" was meant to reinforce that, but I guess it
>didn't help enough.  If you prefer, we can rewrite that statement
>as 
>      In essence, the Age value is the sum of the time that the response
>      has been resident in each of the caches along the path from the
>      origin server, plus the amount of time it has been in transit along
>      network paths, plus a bound on the estimation error.

Oh, come on Jeff.  "estimate" is not a synonym for "inaccurate", and
"in essence" is not an invitation to ignore the rest of the sentence.
Furthermore, the equations I give for Option A conclusively demonstrate
that there is no bound on its estimation error.
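For reference, the age calculation in RFC 2068 section 13.2.3, which is what a received Age value ultimately feeds into, can be transcribed as a short Python sketch. Variable names follow that section. The sample timestamps are invented, and the example assumes the cache's clock agrees with the origin's.

```python
def current_age(age_value, date_value, request_time, response_time, now):
    """Age calculation as written in RFC 2068, section 13.2.3.

    All values are in seconds. age_value is the received Age header
    (0 if absent), date_value is the origin server's Date header, and
    request_time, response_time and now come from the cache's own clock.
    """
    apparent_age = max(0, response_time - date_value)
    corrected_received_age = max(apparent_age, age_value)
    response_delay = response_time - request_time
    corrected_initial_age = corrected_received_age + response_delay
    resident_time = now - response_time
    return corrected_initial_age + resident_time


# Hypothetical example: upstream sent "Age: 30", the request took 2 s,
# and the entry has been resident in this cache for 10 s.
print(current_age(age_value=30, date_value=1000,
                  request_time=1000, response_time=1002, now=1012))  # 42
```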

>All of the points I make above are minor details; the real question
>comes up later in Roy's draft.
>
>In section 4, "Analysis of Option B", Roy carefully analyzes the
>possibility that Option B might overestimate the Age value.  He
>finds that there is no chance for an overestimate.  But he doesn't
>include here any analysis of possible *underestimates*. 

I put that in Section 5, as you know.

>In the
>following section, he does concede that these are possible.

Of course, since underestimation is also possible with Option A.
The case I included is the only one relevant to the
discussion: the only condition under which Option B will
underestimate and Option A will not.
In all other cases, Option B will be more accurate than
Option A and either not underestimate the age, or underestimate it
no worse than does Option A.

>He
>then dismisses the scenario as "uncompelling" because of the number
>of caches involved.

AND the fact that all other cases, which are the only ones likely to
occur in practice, are not helped whatsoever by Option A.

>(I noticed that he doesn't point out that
>in order to get a significant overestimate from the Option A
>analysis, one also needs a long chain of caches.)

Long chain?  Try one proxy cache and one user agent cache.  A longer
chain increases the scope of the error, but even that minimal chain is
enough for the error to affect existing proxy configurations.
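The one-proxy, one-user-agent case can be made concrete with a small model. It rests on assumptions that are a reading of the two options, not text from either draft: under Option A the proxy sends its own corrected_initial_age as Age even when forwarding a first-hand response, under Option B it sends no Age, all clocks agree, and the proxy relays the request with no processing delay. The timeline is invented.

```python
def corrected_initial_age(age_value, date_value, request_time, response_time):
    # RFC 2068 section 13.2.3, up to (but not including) resident time.
    apparent_age = max(0, response_time - date_value)
    corrected_received_age = max(apparent_age, age_value)
    return corrected_received_age + (response_time - request_time)


# Hypothetical timeline in seconds, on one shared clock:
#   t=0    user agent sends the request; the proxy forwards it at once
#   t=2.5  origin generates the response (Date: 2.5)
#   t=5    proxy receives the response
#   t=6    user agent receives the response
DATE, UA_REQ, PROXY_REQ, PROXY_RESP, UA_RESP = 2.5, 0.0, 0.0, 5.0, 6.0


def ua_estimate(proxy_adds_age):
    # Assumed Option A: the proxy inserts Age on a first-hand response.
    # Assumed Option B: it inserts nothing, so the user agent sees no Age.
    age_from_proxy = (corrected_initial_age(0, DATE, PROXY_REQ, PROXY_RESP)
                      if proxy_adds_age else 0)
    return corrected_initial_age(age_from_proxy, DATE, UA_REQ, UA_RESP)


true_age = UA_RESP - DATE
print(true_age, ua_estimate(True), ua_estimate(False))  # 3.5 13.5 9.5
```

In this model the proxy's 5 s round trip is counted once inside the Age it sends and again inside the user agent's own 6 s round trip; without the proxy's Age, the estimate in this synchronized-clock example still does not fall below the true age.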

>In section 5, Roy gets confused about what "conservative" means.

I used the exact words you gave in a response to http-wg.

>He states:
>
>   The only argument voiced against Option B is that the calculation is
>   "less conservative" than Option A, and that being "conservative" is
>   better in order to "reduce as much as possible the probability of
>   inadvertently delivering a stale response to a user."
>   
>   If "conservative" means "always overestimates more than the other
>   option", then the argument is certainly true.  However, if the
>   purpose of Age was to provide an overestimate, then why stop there?
>   Why not add arbitrary amounts of age to forwarded responses, just in
>   case?  Why not disable caching entirely?
>
>But apparently fails to notice that the current language does
>not do any of these foolish things, nor has anyone advocated
>that the HTTP/1.1 spec do any of these (as far as I know).

Of course not -- the point is that the purpose of Age is not to
provide an overestimate.  It is to provide an estimate.

>It shouldn't be necessary to state, but perhaps it has to be
>made clear, that the rules in section 14.6 are not there to
>make the Age estimate arbitrarily high.   They are there to
>make the Age estimate as accurate as possible WITHOUT UNDERESTIMATING
>the value.

That is what Option B does as well, and as effectively as Option A.

>We can still disagree whether an underestimate or an overestimate
>is preferable.  I'll simply quote from what I wrote in August,
>in http://www.ics.uci.edu/pub/ietf/http/hypermail/1996q3/0439.html :
>
>    Underestimation is SERIOUSLY BAD, because it will lead to a cache
>    believing that a response is fresh when it is, in fact, stale.
>
>    Overestimation of the Age can lead to a cache treating a fresh
>    response as stale, which can cause extra revalidation messages.
>    This is somewhat inefficient, but will never lead to a client
>    inadvertently seeing an expired cache entry.  Underestimation is
>    thus a much worse error than overestimation, and so the spec is
>    designed to avoid underestimation as assiduously as possible.
>
>All arguments about whether caching is good or bad for the Internet
>are moot, if origin servers disable caching because their clients
>are unwittingly seeing stale responses.
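The asymmetry described in the quoted August message comes from the freshness test in RFC 2068 section 13.2.4, response_is_fresh = (freshness_lifetime > current_age). A minimal sketch of what each direction of age error does to that test; the helper `outcome` and the numbers are hypothetical.

```python
def response_is_fresh(freshness_lifetime, current_age):
    # Freshness test from RFC 2068 section 13.2.4 (both values in seconds).
    return freshness_lifetime > current_age


def outcome(freshness_lifetime, true_age, estimated_age):
    """Hypothetical helper: what a cache does when it uses estimated_age
    in place of the (unknowable) true_age."""
    truly_fresh = response_is_fresh(freshness_lifetime, true_age)
    if response_is_fresh(freshness_lifetime, estimated_age):
        return "serves fresh entry" if truly_fresh else "serves stale entry"
    return "extra revalidation" if truly_fresh else "revalidates stale entry"


# A 60 s lifetime, with the age estimate 20 s off in each direction.
print(outcome(60, true_age=70, estimated_age=50))  # serves stale entry
print(outcome(60, true_age=40, estimated_age=60))  # extra revalidation
```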

I personally would ignore Age if it does not represent a lower bound.
The clock skew dependencies introduced by Option A are impossible to
work around, whereas the clock skew dependencies introduced by Option B
can be fixed by the recipient.  Option A makes the reliability of the
age calculation dependent on the clock of every recipient that touches
a message, which is an unacceptable level of error in a system that
depends on accurate caching to reduce network costs.
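A simplified model of the clock-skew argument follows. It assumes, as one reading of Option A, that every hop recomputes the age from the Date header with its own clock and forwards the larger of that and any received Age; transit and resident times are set to zero so that only skew shows. The function names and skew values are hypothetical.

```python
def apparent_age(local_response_time, date_value):
    # RFC 2068 section 13.2.3: apparent_age = max(0, response_time - date_value)
    return max(0, local_response_time - date_value)


def final_age_every_hop(date_value, true_arrival, hop_skews):
    """Assumed Option A: each hop reads its own skewed clock and forwards
    max(apparent_age, received Age) as the new Age."""
    age = 0
    for skew in hop_skews:
        age = max(apparent_age(true_arrival + skew, date_value), age)
    return age


def final_age_recipient_only(date_value, true_arrival, recipient_skew):
    """No intermediate Age: only the recipient's own clock enters."""
    return apparent_age(true_arrival + recipient_skew, date_value)


# Response generated at t=0 and delivered instantly, so its true age is 0.
# A proxy whose clock is 300 s fast, then a recipient whose clock is 50 s slow.
print(final_age_every_hop(0, 0, [300, -50]))   # 300
print(final_age_recipient_only(0, 0, -50))     # 0
```

In this model one fast clock anywhere on the path inflates every downstream Age, and no later hop can remove it because of the max(); without intermediate Age values, the only clock involved is the recipient's own.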

>Philosophical disagreements aside, Roy apparently ignores (or
>has forgotten about) an alternative that Koen proposed in August,
>and which I modified slightly; see
>	http://www.ics.uci.edu/pub/ietf/http/hypermail/1996q3/0456.html
>
>The proposal is that an HTTP/1.1 proxy cache:
>     (1) MUST add Age when serving from cache memory
>     (2) MUST add Age when relaying a response from a pre-1.1 source
>     (3) SHOULD NOT add Age when relaying a response from a 1.1 or
>	 higher source
>
>Rule #1 is apparently OK with Roy, since this corresponds exactly
>to his preferred option B.  Rule #3 is presumably also acceptable
>to Roy, since this corresponds to his Option B in an all-HTTP/1.1
>environment.  (We could quibble about whether this should be a SHOULD
>NOT or a MUST NOT, but my understanding is that our principle is
>"do not overconstrain the implementor", which leads to SHOULD NOT.)
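The three rules of the compromise can be read as a small decision function. The function name, its parameters, and the encoding of HTTP versions as (major, minor) tuples are hypothetical, chosen only to make the rules concrete.

```python
def age_requirement(served_from_cache: bool, source_version: tuple) -> str:
    """Requirement level for adding Age under the proposed compromise.

    Hypothetical helper: source_version is the (major, minor) HTTP
    version of the upstream response being relayed.
    """
    if served_from_cache:
        return "MUST add Age"          # rule 1: serving from cache memory
    if source_version < (1, 1):
        return "MUST add Age"          # rule 2: relaying from a pre-1.1 source
    return "SHOULD NOT add Age"        # rule 3: relaying from a 1.1 or higher source


print(age_requirement(True, (1, 1)))   # MUST add Age
print(age_requirement(False, (1, 0)))  # MUST add Age
print(age_requirement(False, (1, 1)))  # SHOULD NOT add Age
```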

I ignored it because there is no need for such a compromise.  There is
only one option that retains the definition of Age such that the age
calculation results in a reliable estimate, and that is Option B.
Whether or not the message has an HTTP-version of HTTP/1.1 has no
relevance to the clock skew between the recipient and the origin server,
and the only difference between Option A and Option B in terms of 
underestimating received age is that the former relies on everyone
else's clock being more accurate than the recipient's clock.
The compromise Age would still be unreliable in the presence of any
HTTP/1.0 sender, and therefore could not be relied upon by cache
implementers; it is thus not preferable to the original definition of
Age that is described by Option B.

.....Roy
Received on Monday, 31 March 1997 20:41:18 UTC