On transparency from Jeffrey Mogul on 1996-02-20 (http-caching-historical@w3.org from February 1996)

From: Jeffrey Mogul <mogul@pa.dec.com>
Date: Tue, 20 Feb 96 14:55:17 PST
To: "Roy T. Fielding" <fielding@avron.ICS.UCI.EDU>
Cc: http-caching@pa.dec.com
Message-Id: <9602202255.AA09730@acetes.pa.dec.com>

Regarding my summary of the Feb 2 1996 meeting, in which I wrote:
> Issue: transparency vs. performance
> 
> Since there have been numerous discussions of whether semantic
> transparency or performance is the more important issue for HTTP
> caching, we tried to come to a consensus on what we believed about
> this.
> 
> Here is a rough summary of our consensus:
> 
> 	Applications in which HTTP is used span a wide space
> 	of interaction styles.  For some of those applications,
> 	the origin server needs to impose strict controls on
> 	when and where values are cached, or else the application
> 	simply fails to work properly.  We referred to these
> 	as the "corner cases".  In (perhaps) most other cases,
> 	on the other hand, caching does not interfere with the
> 	application semantics.  We call this the "common case".
> 	
> 	Caching in HTTP should provide the best possible
> 	performance in the common case, but the HTTP protocol MUST
> 	entirely support the semantics of the corner cases, and in
> 	particular an origin server MUST be able to defeat caching
> 	in such a way that any attempt to override this decision
> 	cannot be made without an explicit understanding that in
> 	doing so the proxy or client is going to suffer from
> 	incorrect behavior.  In other words, if the origin server
> 	says "do not cache" and you decide to cache anyway, you
> 	have to do the equivalent of signing a waiver form.
> 
> 	We explicitly reject an approach in which the protocol
> 	is designed to maximize performance for the common case
> 	by making the corner cases fail to work correctly.

Roy writes:
    Let me again say that I adamantly oppose this decision.  It doesn't
    reflect any of the applications that currently use HTTP, it is a
    mythical invention of the subgroup that such a thing is even
    desirable in all cases, and does a poor job of satisfying the
    user's needs.

    The reason that user agents are not always semantically transparent
    is because the user does not always want them to be semantically
    transparent.  No matter what is in the protocol, no decision by the
    WG will ever change this fact of life.  It is therefore WRONG to
    require in the protocol what cannot be achieved by any application
    -- all you are doing is requiring applications to be
    non-compliant.

    What you want is to enable the protocol to say "this is what you
    have to do to remain semantically transparent" and then require
    that applications default to semantic transparency mode.  The
    former is what Cache-control does, and the latter can be added to
    the text.

    What we cannot do is control the user's application of HTTP
    technology; attempting to do so is foolish and contrary to the
    design of the Web.  Requiring a visible/noticeable warning be
    presented when semantic transparency is disabled is reasonable,
    provided that it does not actively interfere with people's work.

I'm in a tricky position here, since I am both the moderator of
this subgroup (and hence nominally responsible for obtaining
consensus), and also the primary proponent of the position that
Roy so adamantly opposes.  This is a no-win situation, because
I've failed to change Roy's mind, he has failed to change mine,
and there are explicit protocol specification decisions that
apparently depend on resolving this contradiction.

Therefore, this is something that we need to discuss at the
IETF meeting in Los Angeles (Larry, are you listening?).

Further, anyone who agrees with Roy on this issue ought to
step up NOW and support his position.  So far, by not
disagreeing with my summary, the people who were at the meeting
have implicitly approved it.  Much as I would hate to lose
this argument, it would be even worse if I won it because
the rest of you were too terrified of contradicting me. :-)

When Roy last raised this issue, I sent him a private response,
which I think is worth forwarding to the subgroup, and so it
follows below.

-Jeff
---------------------------------------------------------------

  > Jeff wrote:
  >        The proposed design uses opaque cache validators and
  >        explicit expiration values to allow the server to control
  >        the tradeoff between cache performance and staleness of the
  >        data presented to users.  The server may choose to ensure
  >        that a user never unwittingly sees stale data, or to
  >        minimize network traffic, or to compromise between these
  >        two extremes.  The proposed design also allows the server
  >        to control whether a client sees stale data after another
  >        client performs an update.
  
  Roy wrote:
  This is an incorrect design for HTTP caching.  The cache does not exist
  on behalf of the origin server, and therefore any requirements placed
  by the origin server will always be secondary to those of the user.
  
  > Jeff wrote:
  >   Server-based control is also important because HTTP may be used for a
  >   wide variety of ``applications.''  The design of a Web application
  >   (for example, a stock-trading system) may be peculiar to the server,
  >   while Web browsers are generic to all Web applications.  Because the
  >   precise behavior of an application cannot be known to the implementor
  >   of a browser, but can be controlled by the implementor of a server,
  >   servers need to have the option of direct control over the caching
  >   mechanism.  Because the world is not perfect, we also need to give
  >   users and browsers some control over caching, but this is at best a
  >   contingency plan.
  
  Roy wrote:
  This is an incorrect assumption.  The server is not capable of knowing
  the needs of the user, and it is the needs of the user that take precedence
  in the design of the WWW -- any other ordering results in systems that
  purposely defy the design in order to satisfy the user's needs.
  Therefore, the caching model MUST be defined according to the user's needs
  and only allow the server to provide input into the decisions made to
  satisfy those needs.  This allows the user to decide what is and is not
  correct behavior.

This is the main conceptual disagreement between us, and a number of
your other complaints derive from this.  I'll start by pointing out
that you are putting words into my mouth that I never wrote: of course
the cache does not exist "on behalf of" the origin server, nor does
it necessarily exist on behalf of the ultimate user.

Caches exist to improve performance, and it's not a zero-sum game.
Users, servers, and intermediaries (such as Netcom or similar) can all
benefit from caching, if it is done right.

However (and this is the point where you are manifestly wrong), caches
do not exist independently of semantics.  Otherwise, I could write
a cache that returns, say, a Dilbert cartoon, no matter what URL
was requested.  That's obviously an extreme breakdown in semantics,
but to say that the "user's needs" define the semantics of an HTTP
interaction is so ill-defined as to be entirely useless.

Users DO have needs for things such as performance, availability,
clarity of the UI, etc.  But these are entirely orthogonal to whether
the semantics of a request-response interaction are those intended
by the origin server or not.  In this respect, the user's primary
"need" is that when he or she makes a request, the response has some
semantically appropriate meaning.

If the Web were simply composed of static (or slowly changing) documents,
then the semantics of HTTP interactions would be trivial and one could
easily let the user decide exactly what to do.  But this is manifestly
not the only thing the Web is used for, and probably no longer even the
most prevalent.  At the meeting on Feb. 2, for example, Shel Kaphan made
it quite clear that the worst problem he faced in implementing his
book-ordering service was the plethora of user-agent and cache
implementations that blithely assumed they could decide when and
when not to use a cached copy of some response.

Simply put, the origin server MUST be able to control the semantics
that the user sees, or else many obviously useful services cannot
be implemented.  What service authors are doing today is to go
through extensive contortions to defeat caching, since they cannot
trust the caches to get the semantics right.  Only if we fix the
HTTP protocol to give the origin servers the necessary level of
control are we going to be able to get the full benefits of caching.

As far as I can tell, everyone at the meeting understood and agreed
on this point, and I have no evidence that anyone else in the caching
subgroup disagrees.  [Note added Feb. 20: not including Roy.]

Of course, in a real-world system we cannot insist on full semantic
transparency 100% of the time.  So who gets to control what happens?
Larry Masinter (in a private message to me) phrased the question as

    When Superman meets his evil twin, who can win, since they're both
    Equally Strong? When the Unstoppable Force meets the Immovable
    Object, who will win? These philosophical questions are pretty hard to
    answer in the abstract.

The only resolution of this question is to sidestep it, and recognize
that neither side can "win" at the expense of the other.  Rather, the
HTTP protocol should ensure that neither side loses when it comes
down to preserving semantics.

In other words, a cache does not relax the requirement of semantic
transparency unless BOTH the origin server and the user agree to it.
But because the ultimate semantics derive from the origin server,
and not from the browser, the situation cannot be symmetrical.  Only
the origin server knows where "transparency" actually begins and ends,
and so the origin server can be allowed to specify the "freshness
lifetime" without input from the user.
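
To make the asymmetry concrete, here is a minimal sketch of the rule in
Python.  It is only an illustration: the names CachedResponse,
server_permits_stale, user_accepts_stale, and
may_serve_without_revalidation are hypothetical and do not come from any
draft or implementation.  It assumes the origin server alone sets the
freshness lifetime, and that a stale entry may be served without
revalidation only when both the origin server and the user have agreed
to relax transparency.

```python
from dataclasses import dataclass


@dataclass
class CachedResponse:
    # All field names are hypothetical, chosen for this illustration only.
    stored_at: float            # time the cache stored the response (seconds)
    freshness_lifetime: float   # chosen by the origin server alone (seconds)
    server_permits_stale: bool  # origin server agrees to relaxed transparency


def may_serve_without_revalidation(entry: CachedResponse,
                                   now: float,
                                   user_accepts_stale: bool) -> bool:
    """Return True if the cache may answer without contacting the origin."""
    age = now - entry.stored_at
    if age <= entry.freshness_lifetime:
        # Within the server-specified lifetime the response is still
        # semantically transparent; the user's preference is not consulted.
        return True
    # Past the lifetime, transparency is relaxed only if BOTH sides agree.
    return entry.server_permits_stale and user_accepts_stale
```

Under these assumptions a user preference for stale data cannot, by
itself, override the origin server, and the server's permission does not
force stale data on a user who has not asked for it.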

In other words, you have it 100% backwards when you say
  [The] caching model MUST be defined according to the user's needs
  and only allow the server to provide input into the decisions made to
  satisfy those needs.  This allows the user to decide what is and is not
  correct behavior.

Rather, the caching model MUST be defined according to the semantics
of the service, and only allow the user to provide input about how
far to relax those semantics.  This allows the origin server to decide
what is and is not correct behavior.

I defy you to explain, for example, how Shel Kaphan can make his
book-ordering server work in your user-wins model.

Received on Tuesday, 20 February 1996 23:38:18 UTC