- From: Koen Holtman <koen@win.tue.nl>
- Date: Fri, 11 Aug 1995 14:19:05 +0200 (MET DST)
- To: dmk@allegra.att.com, http-wg@cuckoo.hpl.hp.com, www-talk@www10.w3.org
- Cc: koen@win.tue.nl (Koen Holtman)
As promised on www-talk, here is a session state header proposal that, among other things, tries to address some of the potential caching problems in the Session-ID (State-Info) proposal by Dave Kristol.

These caching problems are in the area of graceful degradation when stateful dialogs (for which Session-ID is a support mechanism) are used with caches that do not conform to the Expires header definition in the (draft) http spec. Some people, particularly people who want to use stateful dialogs in applications where real money is involved, are very concerned about such non-conforming caches: the fear is that customers will blame them if the service does something inappropriate because of a broken cache, while it is really the fault of the cache administrator.

For the record: I am not in the tele-shopping business myself. I don't know if the level of paranoia about non-conforming caches found there is justified, but I fear that it just might be. In my opinion, more information about current practice, and informed speculation about future practice, is needed to decide on this issue.

The material in the appendix about the Request-ID header at the end of this message is not new: I sent it to www-talk (but not http-wg) some weeks ago. I have decided to include it here again, as there has recently been some discussion on merging facilities for stateful dialogs and for clicktrail statistics, just as I thought that this topic was dead on www-talk (the consensus being that this merging would be a bad idea because it would generate too much confusion).

Koen.

----snip----

Non-persistent Cookie proposal
==============================

Koen Holtman, koen@win.tue.nl

This document can be seen as a personal summary of some parts of the session-id discussion on www-talk in July 1995. [2] contains a summary of privacy issues discussed.

I do not want to argue that the proposal below is the only way to go when putting support for stateful dialogs in http.
Simpler mechanisms are possible, for example [3].

The `solution space' for stateful dialogs has a number of dimensions:

 - simplicity of implementation
 - time of general availability when standardized
 - downward compatibility
 - simplicity of use
 - reliability
 - amount of privacy protection
 - maximum complexity of stateful dialogs supported
 - amount of cache control possible
 - risks when used with non-conforming caches
 - amount of confusion on www-talk generated

The proposal below occupies one point in this solution space. The main goal of the text below is to make all dimensions visible, not to give a personal opinion on what the best point in the space is. In fact, my personal opinion on this has changed a lot over the last month, and I have not reached a final conclusion yet.

** Background

This document assumes that you have read [1] and [2], and that you are familiar with [4].

[1] The Netscape Cookie proposal
    <URL:http://home.mcom.com/newsref/std/cookie_spec.html>

[2] The proposals for gathering consumer demographics
    <URL:http://www.w3.org/hypertext/WWW/Protocols/demographics.html>
    by Daniel W. Connolly

[3] Session-ID (State-Info) proposal by Dave Kristol
    <URL:http://www.research.att.com/~dmk/session.html>

[4] The draft HTTP/1.0 specification
    <URL:http://www.ics.uci.edu/pub/ietf/http/>

** Terminology

Most terminology in this document is as in [4]. New terminology:

Browser: user agent (used for interactive web sessions).

Stateful dialog: An information exchange between a web user and a web server that extends beyond the submission of one form. In a stateful dialog, the server changes its behavior as a result of previous actions by the user. Conceptually, the state is a property of the dialog (or session), not of the browser or server. On the implementation side, either the browser, the server, or both hold the state.

Cookie: a string to be sent by a browser to a server in requests, representing the state in a stateful dialog.
The string is kept at the browser side, but supplied and updated by the server using set-cookie response headers. Only the server needs to be able to decode the cookie. See [1] for an example cookie definition. A cookie can represent the entire dialog state directly, or be a key in a server-side dialog state database.

Request-id: a unique value sent by a browser to a server in requests to allow the server to generate better clicktrail statistics.

Session-id: a unique identifier sent by a browser to a server in requests to give the server a way of keeping it apart from other browsers for the purpose of engaging in stateful dialogs. Unlike a cookie, the value of a session-id can never change during a stateful dialog.

Persistent information: information that is remembered by the browser for future sessions.

Non-persistent information: information that is lost when the browser exits.

Proposal: Non-persistent cookies
================================

** 1. The set-cookie response header

A server may choose to send a set-cookie response header in a response to a browser. The header looks like

    Set-Cookie: <string>

where <string> does not contain whitespace. (This allows for future extensions).

Examples:

    Set-Cookie: color_screen;no_jpgs;small_graphics
    Set-Cookie: Joe%20User;joe@foo.com
    Set-Cookie: id324254
    Set-Cookie: joe:4234982343

If the browser receives a Set-Cookie response header, it can either honor it or ignore it.

** 2. Honoring a Set-Cookie header

To honor a set-cookie header received from a server, a browser will start including a header

    Cookie: <string>

in _direct_ requests to that server, and to that server only. For the purpose of this definition, individual servers are identified by the hostname+portnumber pair in the request URL. Directness of requests will be defined later.

The cookie <string> is taken from the Set-cookie header, and may be changed by the server by sending a new Set-cookie header with a new string.
A Set-cookie header with an empty string can be taken as a request to stop sending Cookie headers.

** 3. The decision to honor a Set-Cookie header

If a Set-Cookie response header is honored, this means that a server can get more accurate statistics about the behavior of the browser user if the browser user is behind a firewall or proxy cache. Thus, the user should have the option of deciding not to honor Set-cookie headers for privacy reasons.

It is suggested that browsers provide something like the following preferences box:

+-----------------------------------------------------------------+
 Honor set-cookie requests:

  ( ) Always honor request
  ( ) Start honoring requests if one is done in a response to a
      form submission (POST request).
  (*) Ask once for every site, use reply in later sessions
  ( ) Never honor requests
+-----------------------------------------------------------------+

where the (*) is the default setting. In real browsers, the terminology used in this box should probably be adapted to make more sense to the average user.

As a matter of etiquette, services should only send Set-Cookie headers if honoring them would bring some direct advantage to the user, like being able to use a stateful dialog. Sending Set-Cookie headers only to get better clicktrail statistics should be considered a breach of etiquette. To get clicktrail statistics, the Request-id header (see the appendix below or [2]) should be used.

If a browser decides to honor one Set-Cookie header from a server, the service author can expect that Set-Cookie headers sent in the near future will also be honored. A browser can contain a timeout mechanism to stop honoring Set-cookie headers when, say, 30 minutes have passed since the last contact with the server.

** 4. Direct and Indirect Requests

A request is indirect if it

 1) fetches the contents of an inline picture or other inlined object
 2) resolves a 3xx (redirection) response

All other requests are direct.
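
The honoring rule of section 2 and the direct/indirect distinction above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not part of the proposal: the names `server_key`, `is_direct` and `CookieJar` are hypothetical, port 80 is assumed when the URL gives no port, the caller is assumed to know whether a request is inline or follows a redirect, and the user-preference decision of section 3 is left out.

```python
from urllib.parse import urlsplit


def server_key(url):
    """Identify a server by its hostname+portnumber pair (section 2)."""
    parts = urlsplit(url)
    # Assumption: fall back to port 80 when the URL names no port.
    return (parts.hostname, parts.port or 80)


def is_direct(is_inline=False, follows_redirect=False):
    """Indirect requests fetch inlined objects or resolve 3xx responses."""
    return not (is_inline or follows_redirect)


class CookieJar:
    """Non-persistent store: one cookie string per server, lost on exit."""

    def __init__(self):
        self.cookies = {}

    def on_response(self, url, headers):
        """Honor a Set-Cookie header found in a response from `url`."""
        if "Set-Cookie" not in headers:
            return
        value = headers["Set-Cookie"]
        if value:
            # A new string replaces the old one for this server.
            self.cookies[server_key(url)] = value
        else:
            # An empty string is a request to stop sending Cookie headers.
            self.cookies.pop(server_key(url), None)

    def request_headers(self, url, is_inline=False, follows_redirect=False):
        """Return the Cookie header (if any) to add to a request for `url`."""
        value = self.cookies.get(server_key(url))
        if value is not None and is_direct(is_inline, follows_redirect):
            return {"Cookie": value}
        return {}
```

With this sketch, a request for a form page on the issuing server carries the Cookie header, while a request for an inline picture on the same server, or any request to a different host or port, does not.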

If a browser is to honor a Set-Cookie header, Cookie headers must be sent in _direct_ requests to the server. It is preferred that Cookie headers are never sent in indirect requests to a server.

The distinction between direct and indirect requests is made for two reasons:

 a) this allows for better caching of stateful dialogs, as discussed below
 b) this allows for better privacy: it makes impossible some `stealthy' cookie matching strategies that could be adopted by cooperating web service providers to allow matching clicktrails.

** 5. Cookie headers and caching: the default

By default, a cache, no matter whether it is in a browser or in a proxy, must never cache responses to requests with Cookie headers in them. Responses which contain Set-Cookie headers must also never be cached, no matter whether the request itself contained a Cookie header.

There are three reasons for this default:

 1) Responses in stateful dialogs are dynamic by nature. No big payoff can be expected from caching them. In fact, in a scheme where they can be cached, there is a danger of cache memories dropping useful `static' responses (like inline pictures from normal sites) to store relatively useless dynamic responses.

 2) It is vital that responses in stateful dialogs are never cached, else services using stateful dialogs would become unreliable. An Expires: <yesterday> header in a response can be used to disable caching (in both browsers and proxy caches), but some system operators may be tempted to `tune' proxy or browser caches under their control to keep expired responses around for, say, 5 minutes, even though this makes the cache non-conformant to the http spec. Such `tuning' is relatively harmless for normal, non-interactive services, but disastrous for stateful dialog reliability.
    As stateful dialogs are still relatively uncommon, it is a valid assumption that many system operators are not aware of the stateful dialog risks involved with `Expires tuning', so non-conforming caches may remain with us for some time to come. By making responses in stateful dialogs using the Cookie headers a special case in the cache algorithm, independent of `Expires tuning', this particular non-conformance risk to stateful dialogs is eliminated.

    (Note: the Session-ID (State-Info) proposal by Dave Kristol [3] assumes that system operators will never do such `tuning': this allows proposal [3] to be very much simpler than this cookie proposal.)

 3) The developer of a stateful dialog service will usually not have a cache between his browser and the CGI scripts under development. If caching were enabled by default on requests with Cookie headers, the author would have to remember to put Expires: <yesterday> headers in the responses generated by the scripts; if a forgetful or badly informed author of a stateful dialog service would not do this, the resulting unreliability would not show up on tests within the local environment.

This default behavior of _not_ caching responses to requests with Cookie headers is the main reason why indirect requests, which do not get Cookie headers, were introduced. An inline picture request is indirect, so inline pictures on stateful dialog pages will get cached by default. Of course, if caching of an inline picture is not desirable, the service author can always put an Expires header in the inline picture response.

If a page contains pictures that depend on the dialog state, the service author can implement these state dependent pictures by making the generation of the URLs in the <IMG SRC=...> tags depend on the state. Similar state-dependent URL generation can be done for redirection (3xx) codes.

** 6. Cookie headers and caching: overriding the default

If the response to a request on URL U with a Cookie header contains an Expires: <date> header, and no Set-Cookie header, a cache can interpret this to mean two things:

 1) the entity included may be cached, but not beyond the <date> given,
 2) the entity in the response does *not* depend on the dialog state. Thus, if the entity is cached and some browser does a request on URL U, it is OK to serve the cached entity, no matter what the cookie header value in the request is, even no matter whether a cookie header is present at all in the request.

Note that it makes no sense for servers to send both a Set-Cookie header and an Expires header in the same response. Servers may however choose to do so to get backwards compatibility with old proxy caches.

Also note that it makes no sense to put Expires: <yesterday> headers in responses to requests which contain Cookie headers. Servers may however choose to do so to get backwards compatibility with old proxy caches.

This way of defining Expires: semantics ensures that caches need never consider the cookie header when accessing their cache memory. The Cookie header is never part of the cache key, the header is only important when making the decision whether to cache or not.

==========================

APPENDIX: Proposal: The Request-ID: header field.
---------------------------------------------------

To write the text below, I took proposal I. from [2]: <URL:http://www.w3.org/hypertext/WWW/Protocols/demographics.html>, and changed things according to issues (mainly connected to privacy and caching) discussed in the Session-id threads on www-talk. The text below can be seen as a personal summary of the parts of these threads that pertain to Request-IDs and privacy.

The Request-ID: header field.

Adapted from the proposal in <URL:http://www.w3.org/hypertext/WWW/Protocols/demographics.html>.

An HTTP request may include a header field of the form:

    Request-ID: $session $request++

e.g.

    Request-ID: 342%33a4d443 12

The HTTP client chooses a random string as a "session identifier", and each request in that session is identified by a number that increases monotonically with time.

It is suggested that clients use a different random $session string for each server they talk to. This will make it more difficult for cooperating web service providers to match clicktrails in their logfiles, thereby getting user profiling information that is much more accurate than the user would want to give them without some form of compensation.

Note that it is illegal to match logfiles under the privacy laws in some countries. The suggestion to use different $session strings can be seen as supporting these laws by making the crime of matching logfiles pay off less.

A "session" is not formally defined (other than "a set of requests with the same $session id"), though I suggest that browsers begin a session when they are invoked and when they have been idle for 30 minutes or more, and allow some user interface to say "start a new session" (i.e. "choose a new random session ID").

Each user agent must provide a mechanism to turn the generation of Request-Ids off, especially for site security administrators that prohibit its use.

If no Request-ID headers are present, this should be interpreted by web service providers as a statement that the user does not wish to reveal his or her exact clicktrail for privacy reasons. An attempt by service providers to silently obtain the clicktrail by some other means (for example by using a session-id, cookie, or anonymous authentication mechanism that could be part of future versions of HTTP), should be considered to violate the privacy wishes of the user.

Whether HTTP clients use a global $request counter, or one counter for each server talked to, is up to the clients.

HTTP clients which are not traditional user agents (e.g. multi-threaded robots) may use several sessions in parallel.

A proxy must pass the Request-ID: header through unmodified. One might consider some sort of Proxy-Request-ID, though I doubt it would be valuable.

An HTTP cache can assume that the response to an HTTP request does _not_ vary as a function of the Request-ID. That is, an HTTP proxy need not include the Request-ID in its "cache key." If the response to a request can vary, an Expires header should be used in the response to reflect this dynamism.

It is preferred that the request-ID header is _not_ used to implement stateful dialogs, in which the content of pages is different for different sessions. For stateful dialog support, other mechanisms (for example a session-id, cookie, or anonymous authentication mechanism that could be part of future versions of HTTP) should be used.

Alternative proposal:

Instead of introducing a new Request-ID: header, include the $session $request++ information in the From: header. Examples:

    From: (#342%33a4d443 12)
    From: "Roy T. Fielding" <fielding@beach.w3.org> (#342%33a4d443 12)

======================
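
As a rough illustration of the Request-ID appendix above, a client-side generator might look like the Python sketch below. Everything named here is an assumption made for the example: the class `RequestIdGenerator`, the constant `IDLE_LIMIT`, the hexadecimal session string format, and the choice of a single global $request counter (the text leaves that choice to the client).

```python
import secrets
import time

IDLE_LIMIT = 30 * 60  # seconds; the 30-minute idle period suggested above


class RequestIdGenerator:
    """Per-server random $session strings plus an increasing $request counter."""

    def __init__(self):
        self.enabled = True    # user agents must be able to turn this off
        self.sessions = {}     # server -> random $session string
        self.counter = 0       # global $request counter (one permitted choice)
        self.last_used = None  # time of the previous request, if any

    def new_session(self):
        """'Start a new session': forget all session strings."""
        self.sessions.clear()

    def header(self, server, now=None):
        """Return the Request-ID header for a request to `server`."""
        if not self.enabled:
            return {}
        now = time.monotonic() if now is None else now
        # Begin a new session after 30 minutes or more of idleness.
        if self.last_used is not None and now - self.last_used >= IDLE_LIMIT:
            self.new_session()
        self.last_used = now
        # Use a different random string for each server talked to.
        if server not in self.sessions:
            self.sessions[server] = secrets.token_hex(6)
        self.counter += 1
        return {"Request-ID": f"{self.sessions[server]} {self.counter}"}
```

Two requests to the same server in quick succession share a $session string and carry increasing numbers; a request to another server, or one made after the idle limit, gets a fresh string.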
Received on Friday, 11 August 1995 08:24:17 UTC