Re: Resumable Uploads

On Thu, 18 Apr 2013, Felix Geisendörfer wrote:
> I'm interested in finding out how to perform resumable uploads over http
> while being compliant with existing specifications. The result of this
> work will be shared with the community to create interoperable
> server/client software to simplify file uploading on the web.
Here are my thoughts on this, some of which have already been mentioned 
by others.

- The client should/must have a way to signal to the server that it
supports partial uploads so that the server can respond accordingly.
I'd suggest a "partial-upload" (or similar) preference to be used with
the Prefer [1] request header.

- Allow progress of the upload to be reported back to the client via a
Progress [2] (or similar) response header.

- PATCH seems to be designed explicitly for the purpose of updating
existing resources and would make sense to use for completing/repairing
an upload.  I'd suggest use of multipart/byteranges as the "patch" document.

- A server will acknowledge that it supports "partial-upload" by 
including an "Accept-Patch: multipart/byteranges" header in its responses.

- Allow HEAD requests to include range request semantics in the presence 
of "Prefer: partial-upload".

An example of how this might work (I've left out most headers for brevity):

Initial request:

PUT /file.pdf HTTP/1.1
Expect: 100-continue
Prefer: partial-upload
Content-Type: application/pdf
Content-Length: 1000

HTTP/1.1 100 Continue
Accept-Patch: multipart/byteranges

[ 1000 bytes of data ]
HTTP/1.1 102 Processing				<<< optional progress status
Progress: 256/1000

HTTP/1.1 102 Processing				<<< optional progress status
Progress: 512/1000

<<< connection lost >>>


Since in the case above the client already knows that the server 
processed bytes 0-511, it can try to resume the upload immediately 
(saving a round-trip). Otherwise it can check the current status with a 
HEAD range request:

HEAD /file.pdf HTTP/1.1
Prefer: partial-upload
Range: bytes=0-

HTTP/1.1 206 Partial Content
Accept-Patch: multipart/byteranges
Content-Range: bytes 0-511/1000
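
A minimal sketch of how a client might turn the Content-Range value from
the HEAD response into a resume offset. The function name next_offset is
hypothetical, and the sketch assumes the server only ever reports a single
contiguous range starting at byte 0, as in the example above.

```python
import re

# Matches a satisfied byte range such as "bytes 0-511/1000".
# A "*" in place of the total length is also accepted.
_CONTENT_RANGE = re.compile(r"^bytes (\d+)-(\d+)/(\d+|\*)$")


def next_offset(content_range):
    """Return (resume_offset, total_or_None) for a Content-Range value.

    Hypothetical helper: assumes the stored part is one contiguous
    prefix of the resource, i.e. the range starts at byte 0.
    """
    m = _CONTENT_RANGE.match(content_range.strip())
    if m is None:
        raise ValueError("unsupported Content-Range: %r" % content_range)
    first, last = int(m.group(1)), int(m.group(2))
    if first != 0 or last < first:
        raise ValueError("expected a prefix range starting at byte 0")
    total = None if m.group(3) == "*" else int(m.group(3))
    # The next byte the server needs is the one after the last stored byte.
    return last + 1, total
```

For the response above, next_offset("bytes 0-511/1000") gives an offset of
512 and a total of 1000, which is where the PATCH would pick up.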



Resumption of upload:

PATCH /file.pdf HTTP/1.1
Prefer: partial-upload
Content-Type: multipart/byteranges; boundary=THIS_STRING_SEPARATES
Content-Length: xxx

--THIS_STRING_SEPARATES
Content-Type: application/pdf
Content-Range: bytes 512-999/1000

[ 488 bytes of data ]
--THIS_STRING_SEPARATES--

HTTP/1.1 102 Processing				<<< optional progress status
Progress: 256/488

HTTP/1.1 204 No Content				<<< completed successfully
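
To make the resumption request concrete, here is a rough sketch of how a
client could assemble the multipart/byteranges body and the matching
headers. The function name build_byteranges_patch and its parameters are
my own invention for illustration, and "Prefer: partial-upload" is the
preference proposed in this message, not a registered one.

```python
def build_byteranges_patch(data, first, total, content_type, boundary):
    """Build (headers, body) for a single-part multipart/byteranges PATCH.

    Hypothetical helper: 'data' holds the bytes still missing, 'first' is
    the offset of its first byte, and 'total' is the full resource length.
    """
    if not data:
        raise ValueError("nothing to send")
    last = first + len(data) - 1
    if last >= total:
        raise ValueError("range extends past the declared total length")
    head = (
        "--%s\r\n"
        "Content-Type: %s\r\n"
        "Content-Range: bytes %d-%d/%d\r\n"
        "\r\n" % (boundary, content_type, first, last, total)
    ).encode("ascii")
    tail = ("\r\n--%s--\r\n" % boundary).encode("ascii")
    body = head + data + tail
    headers = {
        "Prefer": "partial-upload",  # proposed preference, an assumption
        "Content-Type": "multipart/byteranges; boundary=%s" % boundary,
        "Content-Length": str(len(body)),
    }
    return headers, body
```

Calling it with the remaining 488 bytes, first=512 and total=1000 yields a
part header of "Content-Range: bytes 512-999/1000", matching the request
shown above.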


This scheme can probably be tweaked to work with chunked uploads, but I 
haven't thought much about it yet.

In cases where a client does a HEAD/GET on a partial resource without 
"Prefer: partial-upload", I don't know what the server should do.  There 
are at least 4 options:

    * Treat the resource as complete and return 200
    * Treat the resource as partial and always return 206 (will probably
      break clients)
    * Treat the resource as non-existent and return 404
    * Fail the request with a 403 (or similar)


To append data to an existing resource we could extend the Content-Range 
ABNF a little to allow a PATCH request as follows:

PATCH /log.txt HTTP/1.1
Content-Type: multipart/byteranges; boundary=THIS_STRING_SEPARATES
Content-Length: xxx

--THIS_STRING_SEPARATES
Content-Type: text/plain
Content-Range: bytes +200/*			<<<  append 200 bytes to existing length

[ 200 bytes of data ]
--THIS_STRING_SEPARATES--

HTTP/1.1 204 No Content
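
On the server side, handling the append form could look roughly like the
sketch below. Note that "bytes +N/*" is only the extension proposed here
and is not part of the existing Content-Range ABNF; the function name
apply_append is hypothetical, and a real server would write to storage
rather than concatenate in memory.

```python
import re

# Matches the *proposed* append form, e.g. "bytes +200/*".
_APPEND_RANGE = re.compile(r"^bytes \+(\d+)/\*$")


def apply_append(existing, content_range, data):
    """Append 'data' to 'existing' if it matches the declared append length.

    Hypothetical helper illustrating the proposed "+N/*" syntax only.
    """
    m = _APPEND_RANGE.match(content_range.strip())
    if m is None:
        raise ValueError("not an append range: %r" % content_range)
    declared = int(m.group(1))
    if declared != len(data):
        raise ValueError("declared %d bytes but received %d"
                         % (declared, len(data)))
    return existing + data
```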


[1] http://tools.ietf.org/html/draft-snell-http-prefer
[2] http://tools.ietf.org/html/draft-decroy-http-progress

-- 
Kenneth Murchison
Principal Systems Software Engineer
Carnegie Mellon University

Received on Monday, 22 April 2013 17:43:51 UTC