Re: PATCH, gdiff, and random-access I/O from Jamie Lokier on 2004-04-30 (ietf-http-wg@w3.org from April to June 2004)

From: Jamie Lokier <jamie@shareable.org>
Date: Fri, 30 Apr 2004 09:22:42 +0100
To: Julian Reschke <julian.reschke@gmx.de>
Cc: Justin Chapweske <justin@chapweske.com>, Alex Rousskov <rousskov@measurement-factory.com>, HTTP working group <ietf-http-wg@w3.org>
Message-ID: <20040430082242.GA13899@mail.shareable.org>

Julian Reschke wrote:
> SEEK 1234<lf>
> WRITE 5<lf>
> abcde<lf>
> TRUNCATE<lf>
> 
> would seek by 1234 bytes, write the byte stream "abcde" (format is 
> binary), then truncate at this position.

The gdiff format expresses these operations fairly straightforwardly.

Gdiff is just a sequence of "COPY n bytes from position m of original"
and "insert n DATA bytes b0,b1,...,bn-1".

A random access write of n bytes to position p is expressed simply in
gdiff as:

    COPY 0,p
    DATA n,b0,b1,...,bn-1
    COPY p+n,length-p-n
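
As a rough illustration of the three-step form above, here is a minimal
Python sketch.  It is not the real gdiff binary encoding: the tuples and
the names random_write_ops and apply_ops are made up for this example,
and it assumes the write starts at or before the end of the file.  The
leading or trailing COPY is dropped when it would cover zero bytes.

```python
def random_write_ops(length, p, data):
    """Build gdiff-style ops for writing `data` at offset `p` of a
    file that is currently `length` bytes long (illustrative only)."""
    n = len(data)
    if p < 0 or p > length:
        raise ValueError("write must start within or at the end of the file")
    ops = []
    if p > 0:
        ops.append(("COPY", 0, p))          # keep the first p bytes
    ops.append(("DATA", data))              # the n new bytes
    tail = length - p - n
    if tail > 0:
        ops.append(("COPY", p + n, tail))   # keep whatever follows the write
    return ops


def apply_ops(original, ops):
    """Apply the ops to `original` (bytes) and return the new content."""
    out = bytearray()
    for op in ops:
        if op[0] == "COPY":
            _, pos, count = op
            out += original[pos:pos + count]
        else:
            out += op[1]
    return bytes(out)


original = b"0123456789"
ops = random_write_ops(len(original), 3, b"abc")
result = apply_ops(original, ops)
```

Note that random_write_ops needs `length` up front, which is exactly
the piece of information the driver has to have before it can build
the patch.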

The only difficulty for a filesystem driver is that it needs to know
the length before sending the PATCH command.

If the driver and server are using ETags anyway, to ensure the PATCH
applies to known content, then the driver will know the length.  Same
if it's using WebDAV-style locking.

However, ETags are often based on things like the MD5 of the file's
contents, which we don't want the server to have to recompute each time.

We should bear in mind that if we're trying to design a _good_
read-write file server protocol, a bit more thought needs to go into
the caching model.

-- Jamie

Received on Friday, 30 April 2004 04:25:55 UTC