- From: Jeffrey Mogul <mogul@pa.dec.com>
- Date: Mon, 08 Jan 96 12:46:48 PST
- To: "Roy T. Fielding" <fielding@avron.ICS.UCI.EDU>
- Cc: http-caching@pa.dec.com
> The Web is designed to be used by *and created by* people. People are a
> necessary part of the evolving system which is the Web. The Web is a type
> of system commonly referred to by the term "groupware". There are certain
> design and implementation factors which are known to cause groupware
> systems (or projects creating such systems) to fail. Although not strictly
> WWW-related, the following paper:
>
>     Grudin, Jonathan. "Groupware and Social Dynamics: Eight Challenges
>     for Developers". Communications of the ACM, 37(1), January 1994.
>
> is instructive and presents more background cases. Web protocol designers
> need to be aware of these factors and design with them in mind. I know
> that TimBL is aware of them, and that I am, but for too long I have also
> been assuming that the HTTP WG members are aware of them as well [or, at
> the very least, would understand why I was making design decisions based
> upon those factors].

On your suggestion, I read that article (at least, I read a large part of it, but skimmed some sections). It's a good article, but I think it is not entirely relevant to our own efforts, for two reasons:

(1) The article primarily addresses the question of *adoption of groupware systems*, and points out that many such systems were never successfully adopted.

(2) The article is primarily about the user's view of such systems, including their semantics and perhaps performance, but not about the details of how they are implemented.

Thanks in large part to the vision and creativity of the pioneers, and to the simplicity of the original HTTP protocol, the Web has certainly been "successfully adopted." I think we are now in a different phase, one which some people call a "success disaster". The problem is not how to get people to use the Web; the problem is how to engineer the Web to continue to meet the needs and demands of its users. If the Web ends up as a failed system, it will not be because of barriers to adoption, but rather (in some sense) the converse: too many users for a design that didn't successfully scale.

Regarding the difference between the user's view and the implementation: arguments about what to optimize *in the HTTP protocol* are of course related to what the users see, but the users see these protocol features only indirectly. I.e., does the protocol support fast, efficient, reliable, and intuitive use of the Web? To argue that a *protocol* must be simple in order to meet these goals would be like arguing that CPU instruction sets must be simple, or that the code of a web browser must be simple. When I think about the browser code I've seen, and the CPU implementations that have been done in my group, it probably takes more complexity at these lower levels to make the upper level seem "simple", not less.

I am NOT arguing that the HTTP protocol *should* be complex. It is clear that the simple initial design led to widespread implementation, and I am sure that too much complexity in a revised design would lead to no implementations. I am arguing that some added complexity in the protocol is necessary if we are to address the scaling issues and some of the user-interface issues that have come up.

> Finally, at least one person has said that the current caching algorithm
> is "broken". However, I have yet to see any examples of brokenness
> illustrated with the current HTTP/1.1 caching mechanisms. I have also yet
> to see an example of cache usage/requirements that has not already been
> met by the HTTP/1.1 design.
> Since the purpose of the caching subgroup was to do that BEFORE
> redesigning the protocol, I am not at all happy about the current
> direction of the subgroup.

You are right that we (those of us arguing for a new cache design) need to make our complaints more explicit. I'll try to point out a few, but I'm reasonably sure that other people will have contributions, and I hope they will make them (politely, of course). At the same time, I want to stress that I don't think that your proposed caching design is completely "broken"; I think it is wrong in some of the details, and (as I've struggled to write up my own proposal) it's become clear to me that we haven't quite developed the right language for discussing or precisely describing any caching design.

Among the eight challenges that Grudin lists, one is described (box on p. 97) as "failure of intuition" (elsewhere he uses a somewhat different phrase). He seems to mean this in a variety of ways, but in particular he means that a user interface should not give nasty surprises. (Dave Mills calls this "The Principle of Least Astonishment".) Long-time readers of comp.risks may remember that one of the common phrases to find at the end of the flight voice recorder tape from an airplane crash is "why did it do THAT?", especially with highly-automated cockpits.

I believe that the current HTTP caching design quite often leads to failures of intuition, manifested as the frequent need to hit the "Reload" button. Caching ought to be transparent to users, except for rare occasions, and except (of course) for performance. But instead, we are required to maintain a mental model of what things might be cached, and when this might lead to stale displays, and how we can work around this. For someone with a degree in computer science, this is merely an annoyance, but have you tried explaining this to a lawyer or an artist?

Now, it may be impossible to maximize cache effectiveness (hit rate) and transparency at once, but I think the current design leaves something to be desired. In particular, when I want to get a new view of a page with some changing images and some immutable ones (e.g., http://www.unitedmedia.com/comics/dilbert/), I'm pretty much forced to reload all of the images, not just the ones that change often. One could probably make this almost work right using GET I-M-S and Expires:, but part of the problem (I think) is that there is currently no explicit definition of what caches are actually supposed to do with this. (E.g., if the client does a GET I-M-S and the cache believes its stored modification date, does it need to do a GET I-M-S to the server?) It may be that most of these problems have to do not with the protocol elements, but with a lack of explicit description of the algorithms.

I think it's also clear that we don't quite understand how to combine caching and content negotiation, and it certainly isn't covered in the draft 1.1 spec. This could also lead to intuition failures if we don't get it right.

Anyway, my hope is that we can arrive at a consensus on a protocol that provides both transparency and performance, and in a way that does not change the basic principles of HTTP (but may require changing some details).

-Jeff
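The ambiguity Mogul raises about GET I-M-S (If-Modified-Since) and Expires: can be made concrete with a short sketch. Assume a cache that stores, for each entry, the Last-Modified date and any Expires time it received from the origin server; the names, fields, and decision function below are hypothetical illustrations under HTTP/1.0-era semantics, not part of any specification or of this message. The open question is which branch a cache is supposed to take when a client sends a conditional GET:

    # Illustrative only: one possible cache decision for GET + If-Modified-Since,
    # assuming HTTP/1.0-era Last-Modified and Expires headers.
    from dataclasses import dataclass
    from datetime import datetime, timezone
    from typing import Optional

    @dataclass
    class CacheEntry:
        body: bytes
        last_modified: datetime      # Last-Modified stored from the origin
        expires: Optional[datetime]  # Expires, if the origin sent one

    def handle_conditional_get(entry: CacheEntry,
                               if_modified_since: datetime,
                               now: datetime) -> str:
        """Decide how a cache might answer a conditional GET.

        The point is that more than one behavior is defensible, and the
        draft text of the time did not say which one is required.
        """
        fresh = entry.expires is not None and now < entry.expires
        if not fresh:
            # Possibly stale: forward the conditional GET to the origin
            # server and let it answer 304 or send a fresh copy.
            return "revalidate-with-origin"
        if entry.last_modified <= if_modified_since:
            # The cache trusts its stored modification date and answers
            # 304 itself, without contacting the origin at all.
            return "304-from-cache"
        # The client's copy is older than the cached one.
        return "200-from-cache"

    if __name__ == "__main__":
        now = datetime(1996, 1, 8, 12, 0, tzinfo=timezone.utc)
        entry = CacheEntry(b"<html>...</html>",
                           last_modified=datetime(1996, 1, 1, tzinfo=timezone.utc),
                           expires=datetime(1996, 1, 9, tzinfo=timezone.utc))
        print(handle_conditional_get(entry, if_modified_since=now, now=now))

A page like the Dilbert example would presumably want a far-future Expires: on the immutable images and a near-term one on the changing strip, but that only yields the intended behavior if caches agree on which of the branches above applies, which is exactly the missing algorithmic description the message complains about.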
Received on Monday, 8 January 1996 20:54:45 UTC