Re: PATCH, gdiff, and random-access I/O from Jeffrey Mogul on 2004-04-30 (ietf-http-wg@w3.org from April to June 2004)

From: Jeffrey Mogul <Jeff.Mogul@hp.com>
Date: Fri, 30 Apr 2004 15:31:49 -0700
To: Alex Rousskov <rousskov@measurement-factory.com>
Cc: Justin Chapweske <justin@chapweske.com>, HTTP working group <ietf-http-wg@w3.org>
Message-Id: <200404302231.i3UMVnP1021977@wera.hpl.hp.com>

 From: Alex Rousskov <rousskov@measurement-factory.com>
    On Thu, 29 Apr 2004, Justin Chapweske wrote:
    > This brings up another issue.  With the second style of PATCHing,
    > the suggestion that "  The server SHOULD provide a MD5 hash of the
    > content after the delta was applied.  " becomes very onerous indeed
    > since a bunch of small writes to a huge file will result in an
    > unacceptable performance hit.

    Yes, and the performance hit can be unacceptable for the first kind of
    patching as well. However, SHOULD level seems appropriate for this
    case. The server is free to skip MD5 calculation for large files, for
    example.

It would seem to me that, if the problem is the overhead required
for verifying the correct result of a long series of small writes,
then these two steps should be decoupled.

That is, each write could take place without requiring the generation
of a hash of the whole file, followed by a client-initiated operation
specifically requesting the hash ("now that I'm done with all of
those little writes").
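
A minimal sketch of that decoupling, under stated assumptions: the
class name PatchableResource and its methods write_range and digest
are hypothetical, and an in-memory bytearray stands in for the large
file. Nothing here is taken from the PATCH draft; it only illustrates
that small writes need not trigger a whole-file hash.

```python
import hashlib


class PatchableResource:
    """Hypothetical in-memory stand-in for a large server-side file."""

    def __init__(self, size: int) -> None:
        self._data = bytearray(size)
        # Counts whole-file hash passes, to show that writes cost none.
        self.hash_computations = 0

    def write_range(self, offset: int, chunk: bytes) -> None:
        """Apply one small write; no hash is computed here."""
        if offset < 0:
            raise ValueError("offset must be non-negative")
        end = offset + len(chunk)
        if end > len(self._data):
            # Grow the resource with zero bytes if the write runs past the end.
            self._data.extend(bytes(end - len(self._data)))
        self._data[offset:end] = chunk

    def digest(self) -> str:
        """Client-initiated request: hash the whole resource once."""
        self.hash_computations += 1
        return hashlib.md5(self._data).hexdigest()


resource = PatchableResource(16)
resource.write_range(0, b"abcd")
resource.write_range(8, b"wxyz")
final_hash = resource.digest()  # the only whole-file pass
```

The cost of the hash is paid once, when the client asks for it, rather
than once per write.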

NFS Version 3 did something analogous; instead of the fully
synchronous-to-stable-storage WRITE RPC operation in NFSv2,
NFSv3 allows (but does not require) the client to issue a
series of bufferable WRITEs to the server, followed by a COMMIT
RPC to force all buffered data (for the specified filehandle)
to stable storage.  This apparently gives big performance gains.
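
A rough local-file analogue of that WRITE-then-COMMIT pattern, as a
sketch only: the class name UnstableWriter and its methods are
hypothetical, and this is not the NFSv3 wire protocol (the real COMMIT
also takes an offset and count and returns a write verifier, which
this omits). It shows only the shape of the idea: many buffered
writes, one explicit flush to stable storage.

```python
import os


class UnstableWriter:
    """Hypothetical wrapper: buffered writes plus one explicit commit."""

    def __init__(self, path: str) -> None:
        # "w+b" creates (or truncates) the file for reading and writing.
        self._f = open(path, "w+b")
        # Counts forced syncs, to show that plain writes cost none.
        self.sync_count = 0

    def write_unstable(self, offset: int, data: bytes) -> None:
        """Write at an offset; the data may sit in buffers for now."""
        self._f.seek(offset)
        self._f.write(data)

    def commit(self) -> None:
        """Force everything written so far to stable storage."""
        self._f.flush()              # drain Python's userspace buffer
        os.fsync(self._f.fileno())   # ask the OS to flush to the device
        self.sync_count += 1

    def close(self) -> None:
        self._f.close()
```

A caller would issue any number of write_unstable calls and then a
single commit, paying for the synchronous flush once.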

-Jeff

Received on Friday, 30 April 2004 18:32:57 UTC