- From: Jeffrey Mogul <mogul@pa.dec.com>
- Date: Mon, 05 Feb 96 11:52:55 PST
- To: Larry Masinter <masinter@parc.xerox.com>
- Cc: html-wg@w3.org, http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Let me try to summarize the situation as far as I can grasp it:

(1) GET with large URLs can fail (in part because there is neither a specified limit, nor a specification that the URL can be of any length).

(2) GET with certain character sets is a problem.

(3) Some (most?) existing servers do not support request bodies for GETs.

(4) Caches assume that the results of a GET are "cachable" unless the GET URL contains a "?".

(5) Servers could indicate explicitly whether the results of any method are cachable, using Expires: and Cache-control:. However, most servers do not currently do this.

(6) Some cache implementors would like to be able to tell from the *request* whether there is any chance of finding the response in a (hierarchically higher) cache. If not, the cache could bypass the hierarchy to improve latency.

It looks to me like there is a whole mess of problems tied together here, and we need to start by chipping away some of the easy ones to see what is left.

(A) We really need to specify either a maximum length for URLs, or that there is no such maximum. The current situation allows for non-interoperable assumptions.

(B) Except for issue #6, I see no problem with moving towards the use of POST for queries, especially those that are too large for encoding in GET URLs, or those with non-URL character sets.

(C) I see no reason not to allow the origin server to mark the response to a query (either by POST or by GET "?") as "cachable", i.e., by attaching an Expires: header to it. This is a fully compatible performance enhancement. We can continue to insist that the only method cachable without explicit permission from the origin server is a GET without a "?".

This leaves us with one remaining problem: is there a way to mark a request so as to inform a cache whether or not to bypass hierarchically higher caches? My understanding is that the current algorithm is:

    if method == GET and URL does not contain "?"
    then go up the cache hierarchy
    else bypass the cache hierarchy

This doesn't allow hierarchical caching for responses to non-GET methods. In particular, if people start using POST to transmit (say) Japanese-character resource names (i.e., what we English speakers do with most GETs), the ability of such caches to do bypassing will deteriorate.

So rather than proposing a GETQ method (which is really saying "this GET may have side effects that prohibit caching the response"), I think we should reexamine the proposal someone made for a "POST_WITH_NO_SIDE_EFFECTS" method, i.e., a POST whose responses are normally cachable. Perhaps POSTC is a more tractable name. (The first sketch below shows how a cache might fold POSTC into the routing algorithm above.)

From the point of view of the (HTTP/1.1) client and server, POSTC is equivalent to POST. Only the caches treat it differently. An origin server would be able to use POSTC only with 1.1 clients and proxies, and so would have to return different HTML forms depending on the protocol version in the request header. (This would also imply using the proposed Vary: header with some token that indicates "varies based on request version", since we don't want a cache returning one of these HTML responses to an HTTP/1.0 client; the second sketch below illustrates this.)

So my main concern is that even if we add this to the protocol, it probably won't get used very much. The costs accrue mostly to the server (it has to keep several versions of the HTML forms, and decide which version to return) while the benefits accrue mostly to the operators of hierarchical caches.
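For concreteness, here is a minimal sketch (in Python, with invented names such as `might_hit_parent_cache`; the message itself specifies nothing beyond the algorithm above) of how a cache's request-time routing decision might look once POSTC is folded in:

```python
# Minimal sketch (illustrative only): a cache's request-time routing
# decision, extended with the proposed POSTC method.  At request time
# the cache sees only the method and URL -- no response headers yet.

def might_hit_parent_cache(method, url):
    if method == "GET" and "?" not in url:
        return True            # cachable by default, worth asking parents
    if method == "POSTC":      # proposed: a POST with no side effects,
        return True            # whose responses are normally cachable
    return False               # plain POST, GET with "?", etc.

def route(method, url):
    if might_hit_parent_cache(method, url):
        return "forward up the cache hierarchy"
    return "bypass the hierarchy; contact the origin server directly"
```

Note that responses marked cachable only via an Expires: header (point (C)) cannot influence this decision, since the cache must choose a route before it has seen any response headers; that is exactly why the hint has to be carried in the method itself.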
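And a second sketch, of the server side: returning a different HTML form depending on the protocol version in the request, and labeling the response with the proposed Vary: header. The token name `request-version` is invented here (the message only calls for "some token"), and the sketch assumes, as the proposal does, that a form could name POSTC as its method:

```python
# Second sketch (illustrative only): an origin server returning
# version-dependent HTML forms.  The Vary token is hypothetical.

def form_response(request_version):
    # 1.1 clients and proxies could be sent a POSTC form;
    # HTTP/1.0 clients must fall back to plain POST.
    method = "POSTC" if request_version >= (1, 1) else "POST"
    headers = {"Vary": "request-version"}  # hypothetical token
    body = ('<FORM METHOD="{0}" ACTION="/query">\n'
            '<INPUT NAME="q"> <INPUT TYPE="submit">\n'
            '</FORM>').format(method)
    return headers, body
```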
On the other hand, adding it to the spec does not cause too much trouble, since servers aren't forced to use it, clients can treat it the same as a POST, and caches can either do likewise or treat it as a cache-policy hint.

-Jeff
Received on Monday, 5 February 1996 12:07:25 UTC