- From: Joshua Allen <joshuaa@microsoft.com>
- Date: Tue, 23 Apr 2002 22:28:00 -0700
- To: "Mark Nottingham" <mnot@mnot.net>
- Cc: <www-tag@w3.org>
> Regarding your proposed language, if systems cannot rely on HTTP GET
> being safe, how will caching and crawling work at all?

Most systems only cache and crawl URIs that don't have a querystring. You answered it yourself by saying "just don't submit forms". If a form element says METHOD=GET, the parameters are going to be embedded in the querystring. As a number of others have pointed out, the difference between METHOD=GET and METHOD=POST is irrelevant to most modern web server programming platforms (ASP, PHP, ColdFusion, JSP, Servlets, etc.); see the servlet sketch at the end of this message. When a developer decides to use GET instead of POST in his form, he has no idea that it is supposed to be safe and idempotent. In retrospect, it probably would have been smart for the tools to be designed to make this distinction clearer.

This is just the way things are today: caching and crawling do not trust POST, and they do not trust querystrings. Both are assumed to have potential side effects. It is possible that some edge caches will try to cache responses from URIs with querystrings, and maybe my experience with this is negative simply because pages that are dynamically created through server code and form fields (as any page with METHOD=GET is) typically set the cache-control headers to no-cache. In fact, I once saw a situation where a prominent (non-Microsoft) ISP was accidentally exposing customers' credit card numbers to one another because it was incorrectly caching dynamic content by *ignoring* the cache-control headers. This was a fairly arcane bug, but you can bet credit cards would have been more widely compromised if this ISP had blindly cached any URI (they definitely did *not* cache URIs with querystrings unless the cache-control headers permitted it; the second sketch below shows that check).

And any crawlers I have used are deliberately designed to ignore URIs with querystrings (the third sketch below). In fact, the only objection to this heuristic that I can think of would be that "if it is meant to be posted via a FORM, it should never exist as a raw URI that a crawler would encounter". But this doesn't help edge caches any, and one of the reasons that GET proponents always give for their attachment to this verb is that "a FORM submitted through GET can be bookmarked". True, people don't bookmark the page that transfers money from their account, and if they clicked on it they would have difficulty blaming someone else. But search engines in particular are loath to assume the liability that could arise from randomly submitting forms on behalf of people.

Therefore, I think it is folly to trust POST or GET+querystring. Few people *do* trust these, and those who do, do so with a keen sense of the implications. For example, as an edge cache I might go ahead and take the risk that I am passively caching someone's credit card number, but as a crawler I would be aware that following a URI with a querystring could very well *cause* something to happen, which carries a significantly higher risk.
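To make the platform point concrete, here is a minimal servlet sketch (assuming the standard javax.servlet.http API; the class name and parameters are invented for illustration). The same handler sees the same parameters whether the form said METHOD=GET or METHOD=POST:

    import java.io.IOException;
    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    public class TransferServlet extends HttpServlet {
        // getParameter() reads the querystring on GET and the request
        // body on POST; the handler cannot tell which one it was, so
        // nothing stops a developer from hanging side effects off GET.
        protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws ServletException, IOException {
            handle(req, resp);
        }

        protected void doPost(HttpServletRequest req, HttpServletResponse resp)
                throws ServletException, IOException {
            handle(req, resp);
        }

        private void handle(HttpServletRequest req, HttpServletResponse resp)
                throws IOException {
            // Illustrative parameters only; a real transfer page would
            // obviously require authentication and confirmation.
            String account = req.getParameter("account");
            String amount = req.getParameter("amount");
            resp.getWriter().println("Moved " + amount + " from " + account);
        }
    }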
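For the caching side, a small sketch of the policy I am describing (mayCache is an invented name, and the directive matching is deliberately crude): querystring URIs are cached only with explicit permission, and no-store/no-cache/private always win. Ignoring those directives is precisely the bug that exposed the ISP's customers' credit card numbers.

    public class EdgeCachePolicy {
        // Decide cacheability from the URI and the response's
        // Cache-Control header value (null when the header is absent).
        public static boolean mayCache(String uri, String cacheControl) {
            boolean hasQuery = uri.indexOf('?') >= 0;
            if (cacheControl != null) {
                String cc = cacheControl.toLowerCase();
                // These directives must always be honored; the ISP bug
                // was ignoring them on dynamically generated pages.
                if (cc.contains("no-store") || cc.contains("no-cache")
                        || cc.contains("private")) {
                    return false;
                }
                // Querystring URIs need explicit permission to be cached.
                if (hasQuery) {
                    return cc.contains("public") || cc.contains("max-age");
                }
                return true;
            }
            // No Cache-Control at all: trust plain URIs, distrust
            // querystrings, exactly the heuristic described above.
            return !hasQuery;
        }
    }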
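And for the crawler heuristic, an equally small sketch (shouldCrawl is an invented name), using java.net.URI to detect a query component:

    import java.net.URI;
    import java.net.URISyntaxException;

    public class CrawlFilter {
        // A URI with a query component may invoke server-side code
        // with side effects, so a conservative crawler skips it.
        public static boolean shouldCrawl(String uri) {
            try {
                return new URI(uri).getQuery() == null;
            } catch (URISyntaxException e) {
                return false; // unparseable URIs are skipped too
            }
        }

        public static void main(String[] args) {
            System.out.println(shouldCrawl("http://example.org/page.html"));        // true
            System.out.println(shouldCrawl("http://example.org/xfer?amount=100"));  // false
        }
    }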
Received on Wednesday, 24 April 2002 02:10:23 UTC