Re: HTTP 'HASH' Method from Jeffrey Mogul on 2007-05-25 (ietf-http-wg@w3.org from April to June 2007)

From: Jeffrey Mogul <Jeff.Mogul@hp.com>
Date: Fri, 25 May 2007 10:19:55 -0700
To: chown <elfius@gmail.com>
cc: ietf-http-wg@w3.org
Message-Id: <200705251719.l4PHJtKM023417@pobox-pa.hpl.hp.com>

    I think a 'HASH' method should be implemented into HTTP, whereby the server
    responds with a hash (md5/sha) of the requested resource. This would be a
    godsend for large networks which tend to use caching extensively, because
    caching-proxy servers could verify the source hasn't changed while creating
    a minimal amount of traffic, thereby allowing the amount of time the proxy
    stores cache to be greatly increased only at the cost of hard drive
    space.
    Not only would this benefit caching applications, but as I'm sure you could
    imagine, it could be used in many other fields, especially security.
    
As people have already pointed out, if you want to avoid redundant
cache-refills, GET+If-None-Match, together with a server that properly
implements entity tags, already solves that problem.  If you want security,
you probably shouldn't be relying on caching proxies (except *perhaps* in
carefully firewalled limited-access environments).
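
As a rough illustration of the GET+If-None-Match exchange, here is a
minimal sketch of a toy origin server. The names make_etag and
conditional_get are hypothetical, and deriving the entity tag from a
SHA-1 of the body is only an assumption for the example (a real server
may use any opaque validator). Weak entity tags are not handled.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    # Assumption: the entity tag is a quoted SHA-1 hex digest of the body.
    # Real servers are free to choose any opaque validator.
    return '"' + hashlib.sha1(body).hexdigest() + '"'


def conditional_get(
    body: bytes, if_none_match: Optional[str]
) -> Tuple[int, Dict[str, str], bytes]:
    """Toy origin server: return (status, headers, payload)."""
    etag = make_etag(body)
    if if_none_match is not None:
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if "*" in candidates or etag in candidates:
            # The cached copy is still current: no body is sent.
            return 304, {"ETag": etag}, b""
    # No validator supplied, or the resource changed: send the full entity.
    return 200, {"ETag": etag}, body
```

A cache would store the ETag from the first 200 response and send it
back in If-None-Match on revalidation; a 304 then costs only headers.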

If what you really want is a hash (and there are indeed reasons to
want this; for example, end-to-end checks against buggy implementations
of range requests/responses), then there are already standards-track
mechanisms that support this: the existing HEAD method, and the
headers defined in RFC3230.  It's pretty hard to deploy a new HTTP
method, and it doesn't seem to be necessary (or worth the fuss) in
this case.
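
To make the RFC3230 approach concrete: a client sends HEAD with a
Want-Digest header (for example "Want-Digest: SHA;q=1, MD5;q=0.5"), and
a cooperating server may answer with a Digest header carrying the
base64-encoded hash. The sketch below only builds and checks the Digest
value; digest_header and verify_digest are hypothetical helper names,
and only the MD5 and SHA (SHA-1) algorithms are handled.

```python
import base64
import hashlib

# RFC 3230 algorithm tokens handled in this sketch (case-insensitive).
_ALGOS = {"md5": hashlib.md5, "sha": hashlib.sha1}


def digest_header(body: bytes, algorithm: str = "SHA") -> str:
    """Build a Digest header value such as 'SHA=<base64 of SHA-1>'."""
    raw = _ALGOS[algorithm.lower()](body).digest()
    return f"{algorithm}={base64.b64encode(raw).decode('ascii')}"


def verify_digest(body: bytes, header_value: str) -> bool:
    """Check a body against every recognized digest in the header value."""
    checked = False
    for item in header_value.split(","):
        # Split on the first '=' only; base64 padding may add more.
        name, _, value = item.strip().partition("=")
        algo = _ALGOS.get(name.lower())
        if algo is None:
            continue  # unknown algorithm: ignore it
        expected = base64.b64encode(algo(body).digest()).decode("ascii")
        if value != expected:
            return False
        checked = True
    # True only if at least one recognized digest was present and matched.
    return checked
```

A client reassembling a resource from range responses could run
verify_digest over the assembled bytes as the end-to-end check.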

Adrien de Croy writes,

   the cost of calculating MD5 over a large resource could be a lot.

I recently co-authored a paper, in an unrelated context, where we did
a quickie measurement to get a rough idea of the cost: "We measured
the performance of a SHA-1 HMAC over files stored on five 500GB SATA
disks with a 2-core 2GHz Intel Xeon 5130 at 362 MB/s."  (Shah et
al. "Auditing to Keep Online Storage Services Honest", Proc. HotOS-XI.)
This is probably a higher SHA-1 rate than a Web server could support;
on the other hand, relatively few server owners could afford to pay
for anything like that kind of bandwidth, so for GET requests I
doubt it would be a real problem.
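
For anyone who wants a comparable back-of-envelope number on their own
hardware, a minimal sketch follows. It is not the method used in the
paper: it hashes an in-memory buffer, so it measures CPU cost only and
ignores disk I/O. The function name hmac_sha1_rate and its parameters
are hypothetical.

```python
import hashlib
import hmac
import time


def hmac_sha1_rate(total_bytes: int = 16 * 1024 * 1024,
                   chunk: int = 1024 * 1024,
                   key: bytes = b"benchmark-key") -> float:
    """Return approximate HMAC-SHA-1 throughput in MB/s (10**6 bytes)."""
    buf = bytes(chunk)  # zero-filled buffer, reused for every update
    mac = hmac.new(key, digestmod=hashlib.sha1)
    done = 0
    start = time.perf_counter()
    while done < total_bytes:
        mac.update(buf)
        done += chunk
    mac.digest()
    # Guard against a zero interval on very coarse timers.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return done / elapsed / 1e6
```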

But if you want the server to respond with just the hash, and not the
body (i.e., the HEAD-based approach I outline above), then there is a
risk of a DoS attack against the server CPU and file system: a malicious
client repeatedly issues these HEAD requests (against different files
or dynamic content, thus making it impossible to cache the hash
results).  So it would probably have to be an optional feature of
the server, one it could stop doing whenever it was overloaded.
RFC3230 doesn't require the server to send a hash, even if the client
asks for it.
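
One way a server might treat the digest as optional is sketched below,
purely as an illustration. The name head_with_optional_digest, the load
value (a 0.0 to 1.0 utilization figure) and the max_load threshold are
all assumptions for the example, not anything RFC3230 defines.

```python
import base64
import hashlib
from typing import Dict, Tuple


def head_with_optional_digest(
    body: bytes,
    request_headers: Dict[str, str],
    load: float,
    max_load: float = 0.8,
) -> Tuple[int, Dict[str, str]]:
    """Answer a HEAD request, adding Digest only when the server has spare capacity."""
    headers = {"Content-Length": str(len(body))}
    wants_digest = "Want-Digest" in request_headers
    if wants_digest and load < max_load:
        # Hashing is the expensive part; skip it entirely when overloaded.
        raw = hashlib.sha1(body).digest()
        headers["Digest"] = "SHA=" + base64.b64encode(raw).decode("ascii")
    return 200, headers
```

Under load the response is still a valid HEAD reply; the client simply
gets no Digest header and has to cope without one.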

-Jeff

Received on Friday, 25 May 2007 17:20:10 UTC