- From: Joshua Allen <joshuaa@microsoft.com>
- Date: Tue, 23 Apr 2002 22:28:00 -0700
- To: "Mark Nottingham" <mnot@mnot.net>
- Cc: <www-tag@w3.org>
> Regarding your proposed language, if systems cannot rely on HTTP GET
> being safe, how will caching and crawling work at all?

Most systems only cache and crawl URIs that don't have a querystring. You answered it yourself by saying "just don't submit forms". If a form element says METHOD=GET, the parameters are going to be embedded in the querystring. As a number of others have pointed out, the difference between METHOD=GET and METHOD=POST is irrelevant to most modern web server programming platforms (ASP, PHP, ColdFusion, JSP, Servlets, etc.); see the servlet sketch at the end of this message. When a developer decides to use GET instead of POST in his form, he has no idea that it is supposed to be safe and idempotent. In retrospect, it probably would have been smart for the tools to be designed to make this distinction clearer.

This is just the way things are today: caching and crawling do not trust POST, and they do not trust querystrings. Both are assumed to have potential side effects. It is possible that some edge caches will try to cache responses from URIs with querystrings, and maybe my experience with this is negative simply because pages that are dynamically created through server code and form fields (as any page with METHOD=GET is) typically set the cache-control headers to no-cache. In fact, I once saw a situation where a prominent (non-Microsoft) ISP was accidentally exposing customers' credit card numbers to one another because it was incorrectly caching dynamic content by *ignoring* the cache-control headers. This was a fairly arcane bug, but you can bet credit cards would have been more widely compromised if this ISP had blindly cached any URI (they definitely did *not* cache URIs with querystrings unless the cache-control headers permitted it; the second sketch below shows that check).

And any crawlers I have used are deliberately designed to ignore URIs with querystrings (the third sketch below). In fact, the only objection to this heuristic that I can think of would be that "if it is meant to be posted via a FORM, it should never exist as a raw URI that a crawler would encounter". But this doesn't help edge caches any, and one of the reasons that GET proponents always give for their attachment to this verb is that "a FORM submitted through GET can be bookmarked". True, people don't bookmark the page that transfers money from their account, and if they clicked on it they would have difficulty blaming someone else. But search engines in particular are loath to assume the liability that could arise from randomly submitting forms on behalf of people.

Therefore, I think it is folly to trust POST or GET+querystring. Few people *do* trust these, and those who do, do so with a keen sense of the implications. For example, as an edge cache I might go ahead and take the risk that I am passively caching someone's credit card number, but as a crawler I would be aware that following a URI with a querystring could very well *cause* something to happen, which carries a significantly higher risk.
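To make the platform point concrete, here is a minimal servlet sketch (assuming the standard javax.servlet.http API; the class name and parameters are invented for illustration). The same handler sees the same parameters whether the form said METHOD=GET or METHOD=POST:

    import java.io.IOException;
    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    public class TransferServlet extends HttpServlet {
        // getParameter() reads the querystring on GET and the request
        // body on POST; the handler cannot tell which one it was, so
        // nothing stops a developer from hanging side effects off GET.
        protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws ServletException, IOException {
            handle(req, resp);
        }

        protected void doPost(HttpServletRequest req, HttpServletResponse resp)
                throws ServletException, IOException {
            handle(req, resp);
        }

        private void handle(HttpServletRequest req, HttpServletResponse resp)
                throws IOException {
            // Illustrative parameters only; a real transfer page would
            // obviously require authentication and confirmation.
            String account = req.getParameter("account");
            String amount = req.getParameter("amount");
            resp.getWriter().println("Moved " + amount + " from " + account);
        }
    }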
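For the caching side, a small sketch of the policy I am describing (mayCache is an invented name, and the directive matching is deliberately crude): querystring URIs are cached only with explicit permission, and no-store/no-cache/private always win. Ignoring those directives is precisely the bug that exposed the ISP's customers' credit card numbers.

    public class EdgeCachePolicy {
        // Decide cacheability from the URI and the response's
        // Cache-Control header value (null when the header is absent).
        public static boolean mayCache(String uri, String cacheControl) {
            boolean hasQuery = uri.indexOf('?') >= 0;
            if (cacheControl != null) {
                String cc = cacheControl.toLowerCase();
                // These directives must always be honored; the ISP bug
                // was ignoring them on dynamically generated pages.
                if (cc.contains("no-store") || cc.contains("no-cache")
                        || cc.contains("private")) {
                    return false;
                }
                // Querystring URIs need explicit permission to be cached.
                if (hasQuery) {
                    return cc.contains("public") || cc.contains("max-age");
                }
                return true;
            }
            // No Cache-Control at all: trust plain URIs, distrust
            // querystrings, exactly the heuristic described above.
            return !hasQuery;
        }
    }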
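And for the crawler heuristic, an equally small sketch (shouldCrawl is an invented name), using java.net.URI to detect a query component:

    import java.net.URI;
    import java.net.URISyntaxException;

    public class CrawlFilter {
        // A URI with a query component may invoke server-side code
        // with side effects, so a conservative crawler skips it.
        public static boolean shouldCrawl(String uri) {
            try {
                return new URI(uri).getQuery() == null;
            } catch (URISyntaxException e) {
                return false; // unparseable URIs are skipped too
            }
        }

        public static void main(String[] args) {
            System.out.println(shouldCrawl("http://example.org/page.html"));        // true
            System.out.println(shouldCrawl("http://example.org/xfer?amount=100"));  // false
        }
    }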
Received on Wednesday, 24 April 2002 02:10:23 UTC