- From: Erik Aronesty <earonesty@montgomery.com>
- Date: Mon, 5 Aug 1996 17:29:58 -0700
- To: "'http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com'" <http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com>
Server-side document hashing can be a viable way to reduce traffic....but only for documents that are commonly copied...or for those documents that are known to be "mirrors" of other documents. It is certainly within the scope of the specification of HTTP, an "object-oriented protocol", to add a facility for identification of these objects. A suitable proposal would allow one or more content-identification headers to be reported by the server *1. the identifier *2. the method of identification (secure MD5 hash, registration authority, etc.) 3. the scope of the content-identifier (eg: *.com) Also a "content-origin" header would prove useful for indexing facilities and intelligent browsers making use of content-identification for space/time efficiency. >---------- >From: Martin Hamilton[SMTP:martin@mrrl.lut.ac.uk] >Sent: Saturday, August 03, 1996 1:22 PM >I got curious about this a little while back, and wrote a little Perl >program to calculate MD5 checksums of the objects in our >(local/regional ?) cache, so we could see how many were dups. The >results weren't very encouraging... ><URL:http://www.roads.lut.ac.uk/lists/ircache/0202.html> A significant number of hits for certain documents could have been reduced if the your proxy had reported a document-hash to the client in the header. ------------------------------------------------------ Anyone who has experience with commercial document handling services knows that object identification is critical to the efficient functioning of his server (eg: Lotus Notes, Microsoft Exchange). >
Received on Monday, 5 August 1996 17:34:05 UTC