Re: don't use POST for big GET [was: Dictionaries in HTML ]

Let me try to summarize the situation as far as I can grasp it:

(1) GET with large URLs can fail (in part because there is neither
a specified limit, nor a specification that the URL can be of any
length).

(2) GET with certain character sets is a problem.

(3) Some (most?) existing servers do not support request bodies
for GETs.

(4) Caches assume that the results of a GET are "cachable" unless
the GET URL contains a "?".

(5) Servers could indicate explicitly whether the results of any
method are cachable, using Expires: and Cache-control:.  However,
most servers do not currently do this.

(6) Some cache implementors would like to be able to tell from
the *request* whether there is any chance of finding the response
in a (hierarchically higher) cache.  If not, the cache could bypass
the hierarchy to improve latency.

It looks to me like there is a whole mess of problems tied together
here, and we need to start by chipping away some of the easy ones
to see what is left.

(A) We really need to specify either a maximum length for URLs,
or that there is no such maximum.  The current situation allows
for non-interoperable assumptions.

(B) Except for issue #6, I see no problem with moving towards
the use of POST for queries, especially those that are too large
for encoding in GET URLs, or those with non-URL character sets.

(C) I see no reason not to allow the origin server to mark the
response to a query (either by POST or by GET "?") as "cachable",
i.e., by attaching an Expires: header to it.  This is a fully
compatible performance enhancement.  We can continue to insist
that the only method cachable without explicit permission from
the origin server is a GET without a "?".
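
A minimal sketch of the rule in (C), in Python. The function names and
the dictionary-of-headers representation are assumptions made for
illustration, not anything from the spec, and the sketch treats the mere
presence of an Expires: header as permission without checking whether
the date has already passed.

```python
def cachable_by_default(method, url):
    """Default rule: only a GET whose URL has no "?" is cachable."""
    return method == "GET" and "?" not in url


def is_cachable(method, url, response_headers):
    """Apply the default rule, or accept explicit origin-server permission.

    response_headers is a hypothetical dict of header name -> value.
    Assumption: an Expires: header counts as explicit permission.
    """
    has_expires = any(name.lower() == "expires" for name in response_headers)
    return has_expires or cachable_by_default(method, url)
```

With this rule, a POST response carrying "Expires: Sun, 01 Dec 1996
16:00:00 GMT" would be cachable, while the same response without the
header would not.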

This leaves us with one remaining problem, which is "is there
a way to mark a request so as to inform a cache whether or not
to bypass hierarchically higher caches."  My understanding is
that the current algorithm is:
	if method == GET and URL does not contain "?" then
		go up the cache hierarchy
	else
		bypass the cache hierarchy

This doesn't allow hierarchical caching for responses to
non-GET methods.  In particular, if people start using
POST to transmit (say) Japanese-character resource names
(i.e., what we English-speakers do with most GETs), the
ability of such caches to do bypassing will deteriorate.
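
The bypass rule, as I understand it, and its consequence for POST can
be sketched in Python as follows. The function name and the example
URLs are made up for illustration.

```python
def go_up_hierarchy(method, url):
    """Current rule as described above: only a GET without "?" in the URL
    is sent up the cache hierarchy; everything else bypasses it."""
    return method == "GET" and "?" not in url


# A plain GET is sent to the parent cache.
plain_get = go_up_hierarchy("GET", "http://example.com/index.html")

# A query GET bypasses the hierarchy.
query_get = go_up_hierarchy("GET", "http://example.com/search?q=cats")

# A resource lookup sent as POST also bypasses it, even if the response
# would have been perfectly cachable.
post_lookup = go_up_hierarchy("POST", "http://example.com/lookup")
```

Every POST ends up in the bypass branch, whether or not its response
could have been found in a higher-level cache.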

So rather than proposing a GETQ method (which is really
saying "this GET may have side effects that prohibit caching
the response"), I think we should reexamine the proposal
someone made for a "POST_WITH_NO_SIDE_EFFECTS" method.
I.e., a POST whose responses are normally cachable.  Perhaps
POSTC is a more tractable name.

From the point of view of the (HTTP/1.1) client and server, POSTC
is equivalent to POST.  Only the caches treat it differently.
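
As a sketch of what "only the caches treat it differently" might mean,
here is the bypass decision extended for the proposed method. POSTC is
only a proposal in this message, and the function name is hypothetical.

```python
def go_up_hierarchy_with_postc(method, url):
    """Bypass decision for a cache that understands the proposed POSTC.

    Assumption: POSTC responses are normally cachable, so the request is
    worth sending up the hierarchy. POST and "?" GETs still bypass it.
    """
    if method == "POSTC":
        return True
    return method == "GET" and "?" not in url
```

A cache that does not know about POSTC could simply treat it as POST
and bypass the hierarchy, which is safe but gives up the benefit.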

An origin server would be able to use POSTC only with 1.1
clients and proxies, and so would have to return different
HTML forms depending on the protocol version in the request
header.  (This would also imply using the proposed Vary:
header with some token that indicates "varies based on request
version," since we don't want a cache returning one of these
HTML responses to an HTTP/1.0 client.)
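
A rough sketch of the server-side choice, in Python. The function names
are hypothetical, and the sketch assumes the server sees the raw
request line and treats a request with no version field as older than
HTTP/1.0. It does not attempt the Vary: header, since the token for
"varies on request version" has not been defined.

```python
def parse_http_version(request_line):
    """Extract (major, minor) from a request line such as
    "GET /form.html HTTP/1.1". A request line with no version field is
    treated as (0, 9)."""
    parts = request_line.split()
    if len(parts) < 3 or not parts[-1].startswith("HTTP/"):
        return (0, 9)
    major, _, minor = parts[-1][len("HTTP/"):].partition(".")
    return (int(major), int(minor or 0))


def form_method_for(request_line):
    """Pick the method to put in the HTML form that is returned:
    POSTC for HTTP/1.1 or later requests, plain POST otherwise."""
    if parse_http_version(request_line) >= (1, 1):
        return "POSTC"
    return "POST"
```

This is the extra work the origin server takes on: it has to keep (or
generate) two versions of each form and pick one per request.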

So my main concern is that even if we add this to the protocol,
it probably won't get used very much.  The costs accrue mostly
to the server (it has to keep several versions of the HTML forms,
and decide which version to return) while the benefits accrue
mostly to the operators of hierarchical caches.  On the other
hand, adding it to the spec does not cause too much trouble,
since servers aren't forced to use it, clients can treat it
as the same as a POST, and caches can either do likewise or
treat it as a cache-policy hint.

-Jeff

Received on Monday, 5 February 1996 12:07:25 UTC