Comparison of DIFF algorithms

When I was reading through the GDIFF specification,

http://www.w3.org/TR/NOTE-gdiff-19970825.html

I was reminded of the following paper:

James J. Hunt, Kiem-Phong Vo, and Walter F. Tichy, "An Empirical Study of 
Delta Algorithms," in Ian Sommerville (ed.) Proc. Software Configuration 
Management 6 Workshop (SCM-6), Berlin, Germany, March 25-26, 1996, pp. 
49-66.

This paper is available at:

http://wwwipd.ira.uka.de/~jjh/delta.ps

An updated version of this paper which has not yet been published is 
available at:

http://wwwipd.ira.uka.de/~jjh/wwcm.ps

The paper presents a comparison of the creation time and delta size for 
text and binary files, for the diff, compressed diff (gzip of diff output), 
Bdiff, and vdelta schemes.  The vdelta scheme is outlined in the paper 
which extends the Bdiff scheme, which is a modification of the block-move 
algoritm developed by Tichy.

The major departure of vdelta from GDIFF is the recognition that 
significant space savings can be achieved by creating a cache of Copy 
instructions (in the GDIFF sense of Copy).  If the same Copy instruction 
occurs repeatedly, it can be cached, and only a reference to the cache 
entry need be stored in the delta output.  The vdelta scheme was 
implemented to have linear encoding times, and extremely good compression 
ratios.  Since encoding time and good compression ratio (i.e., few bytes on 
the wire) are a concern for GDIFF (and for WebDAV), this work deserves some 
consideration.

- Jim

Received on Monday, 22 September 1997 18:12:28 UTC