- From: Jeffrey Mogul <mogul@pa.dec.com>
- Date: Mon, 08 Jan 96 12:46:48 PST
- To: "Roy T. Fielding" <fielding@avron.ICS.UCI.EDU>
- Cc: http-caching@pa.dec.com
> The Web is designed to be used by *and created by* people. People are a
> necessary part of the evolving system which is the Web. The Web is a type
> of system commonly referred to by the term "groupware". There are certain
> design and implementation factors which are known to cause groupware
> systems (or projects creating such systems) to fail. Although not strictly
> WWW-related, the following paper:
>
>     Grudin, Jonathan. "Groupware and Social Dynamics: Eight Challenges
>     for Developers". Communications of the ACM, 37(1), January 1994.
>
> is instructive and presents more background cases. Web protocol designers
> need to be aware of these factors and design with them in mind. I know
> that TimBL is aware of them, and that I am, but for too long I have also
> been assuming that the HTTP WG members are aware of them as well [or, at
> the very least, would understand why I was making design decisions based
> upon those factors].

On your suggestion, I read that article (at least, I read a large part of it, but skimmed some sections). It's a good article, but I think it is not entirely relevant to our own efforts, for two reasons:

(1) The article primarily addresses the question of *adoption of groupware systems*, and points out that many such systems were never successfully adopted.

(2) The article is primarily about the user's view of such systems, including their semantics and perhaps performance, but not about the details of how they are implemented.

Thanks in large part to the vision and creativity of the pioneers, and to the simplicity of the original HTTP protocol, the Web has certainly been "successfully adopted." I think we are now in a different phase, one which some people call a "success disaster". The problem is not how to get people to use the Web; the problem is how to engineer the Web to continue to meet the needs and demands of its users. If the Web ends up as a failed system, it will not be because of barriers to adoption, but rather (in some sense) the converse: too many users for a design that didn't successfully scale.

Regarding the difference between the user's view and the implementation: arguments about what to optimize *in the HTTP protocol* are of course related to what the users see, but the users see these protocol features only indirectly. I.e., does the protocol support fast, efficient, reliable, and intuitive use of the Web? To argue that a *protocol* must be simple in order to meet these goals would be like arguing that CPU instruction sets must be simple, or that the code of a web browser must be simple. When I think about the browser code I've seen, and the CPU implementations that have been done in my group, it probably takes more complexity at these lower levels to make the upper level seem "simple", not less.

I am NOT arguing that the HTTP protocol *should* be complex. It is clear that the simple initial design led to widespread implementation, and I am sure that too much complexity in a revised design would lead to no implementations. I am arguing that some added complexity in the protocol is necessary if we are to address the scaling issues and some of the user-interface issues that have come up.

> Finally, at least one person has said that the current caching algorithm
> is "broken". However, I have yet to see any examples of brokenness
> illustrated with the current HTTP/1.1 caching mechanisms. I have also yet
> to see an example of cache usage/requirements that has not already been
> met by the HTTP/1.1 design.
> Since the purpose of the caching subgroup was to do that BEFORE
> redesigning the protocol, I am not at all happy about the current
> direction of the subgroup.

You are right that we (those of us arguing for a new cache design) need to make our complaints more explicit. I'll try to point out a few, but I'm reasonably sure that other people will have contributions, and I hope they will make them (politely, of course). At the same time, I want to stress that I don't think that your proposed caching design is completely "broken"; I think it is wrong in some of the details, and (as I've struggled to write up my own proposal) it's become clear to me that we haven't quite developed the right language for discussing or precisely describing any caching design.

Among the eight challenges that Grudin lists, one is described (box on p. 97) as "failure of intuition" (elsewhere he uses a somewhat different phrase). He seems to mean this in a variety of ways, but in particular he means that a user interface should not give nasty surprises. (Dave Mills calls this "The Principle of Least Astonishment".) Long-time readers of comp.risks may remember that one of the common phrases to find at the end of the flight voice recorder tape from an airplane crash is "why did it do THAT?", especially with highly-automated cockpits.

I believe that the current HTTP caching design quite often leads to failures of intuition, manifested as the frequent need to hit the "Reload" button. Caching ought to be transparent to users, except for rare occasions, and except (of course) for performance. But instead, we are required to maintain a mental model of what things might be cached, and when this might lead to stale displays, and how we can work around this. For someone with a degree in computer science, this is merely an annoyance, but have you tried explaining this to a lawyer or an artist?

Now, it may be impossible to maximize cache effectiveness (hit rate) and transparency at once, but I think the current design leaves something to be desired. In particular, when I want to get a new view of a page with some changing images and some immutable ones (e.g., http://www.unitedmedia.com/comics/dilbert/), I'm pretty much forced to reload all of the images, not just the ones that change often. One could probably make this almost work right using GET I-M-S and Expires:, but part of the problem (I think) is that there is currently no explicit definition of what caches are actually supposed to do with this. (E.g., if the client does a GET I-M-S and the cache believes its stored modification date, does it need to do a GET I-M-S to the server?) It may be that most of these problems have to do not with the protocol elements, but with a lack of explicit description of the algorithms.

I think it's also clear that we don't quite understand how to combine caching and content negotiation, and it certainly isn't covered in the draft 1.1 spec. This could also lead to intuition failures if we don't get it right.

Anyway, my hope is that we can arrive at a consensus on a protocol that provides both transparency and performance, and in a way that does not change the basic principles of HTTP (but may require changing some details).

-Jeff
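The ambiguity Mogul raises about GET I-M-S (If-Modified-Since) and Expires: can be made concrete with a short sketch. Assume a cache that stores, for each entry, the Last-Modified date and any Expires time it received from the origin server; the names, fields, and decision function below are hypothetical illustrations under HTTP/1.0-era semantics, not part of any specification or of this message. The open question is which branch a cache is supposed to take when a client sends a conditional GET:

    # Illustrative only: one possible cache decision for GET + If-Modified-Since,
    # assuming HTTP/1.0-era Last-Modified and Expires headers.
    from dataclasses import dataclass
    from datetime import datetime, timezone
    from typing import Optional

    @dataclass
    class CacheEntry:
        body: bytes
        last_modified: datetime      # Last-Modified stored from the origin
        expires: Optional[datetime]  # Expires, if the origin sent one

    def handle_conditional_get(entry: CacheEntry,
                               if_modified_since: datetime,
                               now: datetime) -> str:
        """Decide how a cache might answer a conditional GET.

        The point is that more than one behavior is defensible, and the
        draft text of the time did not say which one is required.
        """
        fresh = entry.expires is not None and now < entry.expires
        if not fresh:
            # Possibly stale: forward the conditional GET to the origin
            # server and let it answer 304 or send a fresh copy.
            return "revalidate-with-origin"
        if entry.last_modified <= if_modified_since:
            # The cache trusts its stored modification date and answers
            # 304 itself, without contacting the origin at all.
            return "304-from-cache"
        # The client's copy is older than the cached one.
        return "200-from-cache"

    if __name__ == "__main__":
        now = datetime(1996, 1, 8, 12, 0, tzinfo=timezone.utc)
        entry = CacheEntry(b"<html>...</html>",
                           last_modified=datetime(1996, 1, 1, tzinfo=timezone.utc),
                           expires=datetime(1996, 1, 9, tzinfo=timezone.utc))
        print(handle_conditional_get(entry, if_modified_since=now, now=now))

A page like the Dilbert example would presumably want a far-future Expires: on the immutable images and a near-term one on the changing strip, but that only yields the intended behavior if caches agree on which of the branches above applies, which is exactly the missing algorithmic description the message complains about.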
Received on Monday, 8 January 1996 20:54:45 UTC