- From: Jim Whitehead <ejw@ics.uci.edu>
- Date: Tue, 23 Sep 1997 03:06:22 -0700
- To: "'Arthur van Hoff'" <avh@marimba.com>
- Cc: Push Workshop <www-push@w3.org>, DRP Mailing List <drp@marimba.com>, "'w3c-dist-auth@w3.org'" <w3c-dist-auth@w3.org>
When I was reading through the GDIFF specification, http://www.w3.org/TR/NOTE-gdiff-19970825.html I was reminded of the following paper: James J. Hunt, Kiem-Phong Vo, and Walter F. Tichy, "An Empirical Study of Delta Algorithms," in Ian Sommerville (ed.) Proc. Software Configuration Management 6 Workshop (SCM-6), Berlin, Germany, March 25-26, 1996, pp. 49-66. This paper is available at: http://wwwipd.ira.uka.de/~jjh/delta.ps An updated version of this paper which has not yet been published is available at: http://wwwipd.ira.uka.de/~jjh/wwcm.ps The paper presents a comparison of the creation time and delta size for text and binary files, for the diff, compressed diff (gzip of diff output), Bdiff, and vdelta schemes. The vdelta scheme is outlined in the paper which extends the Bdiff scheme, which is a modification of the block-move algoritm developed by Tichy. The major departure of vdelta from GDIFF is the recognition that significant space savings can be achieved by creating a cache of Copy instructions (in the GDIFF sense of Copy). If the same Copy instruction occurs repeatedly, it can be cached, and only a reference to the cache entry need be stored in the delta output. The vdelta scheme was implemented to have linear encoding times, and extremely good compression ratios. Since encoding time and good compression ratio (i.e., few bytes on the wire) are a concern for GDIFF (and for WebDAV), this work deserves some consideration. - Jim
Received on Monday, 22 September 1997 18:12:28 UTC