These results sound very encouraging.

Server-side document hashing can be a viable way to reduce
traffic, but only for documents that are commonly copied, or
for those documents that are known to be "mirrors" of other
documents.  It is certainly within the scope of the HTTP
specification, an "object-oriented protocol", to add a facility
for identification of these objects.

A suitable proposal would allow one or more
content-identification headers to be reported by the server,
giving (a rough sketch follows the list):

 1. the identifier
 2. the method of identification (secure MD5 hash,
    registration authority, etc.)
 3. the scope of the content-identifier (eg: *.com)
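
For illustration only (the header names and the Python sketch
below are invented here, not taken from any specification), a
server might compute and report such an identifier along these
lines:

    import hashlib

    def content_id_headers(body, scope="*.com"):
        digest = hashlib.md5(body).hexdigest()      # 1. the identifier
        return {"Content-ID":        digest,
                "Content-ID-Method": "MD5",         # 2. method of identification
                "Content-ID-Scope":  scope}         # 3. scope of the identifier

    with open("index.html", "rb") as doc:           # hypothetical document
        for name, value in content_id_headers(doc.read()).items():
            print("%s: %s" % (name, value))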

A "content-origin" header would also prove useful for
indexing facilities and for intelligent browsers making use
of content-identification for space/time efficiency.
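
As a sketch of that space/time saving (again with invented
names and assumed helpers, not a real API), a cache or browser
that indexes stored objects by their content identifier could
skip the transfer of a "mirror" it already holds:

    class HashIndexedCache:
        def __init__(self):
            self.by_id = {}                  # content identifier -> stored body

        def lookup(self, content_id):
            return self.by_id.get(content_id)

        def store(self, content_id, body):
            self.by_id[content_id] = body

    def fetch(url, cache, head_request, full_request):
        # head_request and full_request are assumed callables returning
        # (headers, body); only the identifier header is inspected here.
        headers, _ = head_request(url)
        cid = headers.get("Content-ID")
        cached = cache.lookup(cid) if cid else None
        if cached is not None:
            return cached                    # mirror already on hand: no transfer
        headers, body = full_request(url)
        if cid:
            cache.store(cid, body)
        return body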

>----------
>From: 	Martin Hamilton[SMTP:martin@mrrl.lut.ac.uk]
>Sent: 	Saturday, August 03, 1996 1:22 PM
>I got curious about this a little while back, and wrote a little Perl 
>program to calculate MD5 checksums of the objects in our 
>(local/regional ?) cache, so we could see how many were dups.  The 
>results weren't very encouraging...  

><URL:http://www.roads.lut.ac.uk/lists/ircache/0202.html>

A significant number of hits for certain documents
could have been avoided if your proxy had reported a
document-hash to the client in the header.
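
Martin's Perl script is at the URL above; a rough Python
approximation of the same survey (hash every cached object and
count the redundant copies), with the cache layout and the path
purely assumed, would be:

    import hashlib, os

    def duplicate_report(cache_dir):
        seen = {}                                  # MD5 digest -> number of copies
        for root, _, files in os.walk(cache_dir):
            for name in files:
                with open(os.path.join(root, name), "rb") as f:
                    digest = hashlib.md5(f.read()).hexdigest()
                seen[digest] = seen.get(digest, 0) + 1
        total = sum(seen.values())
        redundant = sum(n - 1 for n in seen.values() if n > 1)
        print("%d objects, %d redundant copies" % (total, redundant))

    duplicate_report("/var/spool/cache")           # example path only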

------------------------------------------------------

Anyone who has experience with commercial document
handling services knows that object identification is
critical to the efficient functioning of such a server (eg:
Lotus Notes, Microsoft Exchange).


Received on Monday, 5 August 1996 17:34:05 UTC