- From: Jeffrey Mogul <Jeff.Mogul@hp.com>
- Date: Fri, 25 May 2007 10:19:55 -0700
- To: chown <elfius@gmail.com>
- cc: ietf-http-wg@w3.org
> I think a 'HASH' method should be implemented into HTTP, whereby the server responds with a hash (md5/sha) of the requested resource. This would be a godsend for large networks which tend to use caching extensively, because caching-proxy servers could verify the source hasn't changed while creating a minimal amount of traffic, thereby allowing the amount of time the proxy stores cache to be greatly increased only at the cost of hard drive space. Not only would this benefit caching applications, but as I'm sure you could imagine, it could be used in many other fields, especially security.

As people have already pointed out, if you want to avoid redundant cache-refills, GET+If-None-Match, together with a server that properly implements entity tags, already solves that problem.

If you want security, you probably shouldn't be relying on caching proxies (except *perhaps* in carefully firewalled limited-access environments).

If what you really want is a hash (and there are indeed reasons to want this; for example, end-to-end checks against buggy implementations of range requests/responses), then there are already standards-track mechanisms that support this: the existing HEAD method, and the headers defined in RFC3230.

It's pretty hard to deploy a new HTTP method, and it doesn't seem to be necessary (or worth the fuss) in this case.

Adrien de Croy writes:

> the cost of calculating MD5 over a large resource could be a lot.

I recently co-authored a paper, in an unrelated context, where we did a quickie measurement to get a rough idea of the cost:

"We measured the performance of a SHA-1 HMAC over files stored on five 500GB SATA disks with a 2-core 2GHz Intel Xeon 5130 at 362 MB/s." (Shah et al. "Auditing to Keep Online Storage Services Honest", Proc. HotOS-XI.)

This is probably a higher SHA-1 rate than a Web server could support; on the other hand, relatively few server owners could afford to pay for anything like that kind of bandwidth, so for GET requests I doubt it would be a real problem.

But if you want the server to respond with just the hash, and not the body (i.e., the HEAD-based approach I outline above), then there is a risk of a DoS attack against the server CPU and file system: a malicious client repeatedly issues these HEAD requests (against different files or dynamic content, thus making it impossible to cache the hash results).

So it would probably have to be an optional feature of the server, one it could stop doing whenever it was overloaded. RFC3230 doesn't require the server to send a hash, even if the client asks for it.

-Jeff
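
For illustration only, here is a minimal Python sketch of the HEAD-based approach with RFC3230 headers discussed above: the server attaches a Digest header when the client sends Want-Digest, and silently omits it when overloaded. The function names, the `overloaded` flag, and the simplified Want-Digest parsing (q-values are ignored) are assumptions made for this sketch, not part of any existing server.

```python
import base64
import hashlib
from typing import Optional

# RFC 3230 digest-algorithm tokens mapped to (header label, hashlib name).
# In RFC 3230 the token "SHA" refers to SHA-1.
ALGORITHMS = {"md5": ("MD5", "md5"), "sha": ("SHA", "sha1")}


def digest_header(body: bytes, token: str) -> str:
    """Build a Digest header value: <algorithm>=<base64 of the raw hash>."""
    label, hash_name = ALGORITHMS[token.lower()]
    raw = hashlib.new(hash_name, body).digest()
    return label + "=" + base64.b64encode(raw).decode("ascii")


def head_response_headers(body: bytes,
                          want_digest: Optional[str],
                          overloaded: bool) -> dict:
    """Hypothetical helper: headers for a HEAD response.

    The digest is optional, so it is skipped when the client did not
    ask for one or when the server considers itself overloaded.
    """
    headers = {"Content-Length": str(len(body))}
    if want_digest is None or overloaded:
        return headers
    # Simplification: take the first supported algorithm, ignoring q-values.
    for item in want_digest.split(","):
        token = item.split(";")[0].strip().lower()
        if token in ALGORITHMS:
            headers["Digest"] = digest_header(body, token)
            break
    return headers
```

A client would send something like `Want-Digest: sha` on its HEAD request and must be prepared for a response without a Digest header, since the server remains free to decline.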
Received on Friday, 25 May 2007 17:20:10 UTC