Re: Draft for Resumable Uploads from Glenn Strauss on 2022-04-11 (ietf-http-wg@w3.org from April to June 2022)

From: Glenn Strauss <gs-lists-ietf-http-wg@gluelogic.com>
Date: Mon, 11 Apr 2022 06:41:09 -0400
To: Guoye Zhang <guoye_zhang@apple.com>
Cc: Eric J Bowman <mellowmutt@zoho.com>, Julian Reschke <julian.reschke@gmx.de>, ietf-http-wg <ietf-http-wg@w3.org>
Message-ID: <YlQFxaE2XuwmL23F@xps13>
The tus-v2 specification reads to me as an application.  Much thought
and effort has been put into this application, but it is an application
with a protocol for client and server.

Guoye Zhang seems to be proposing that tus-v2 implement
"upload (resumable) *transactions*" and define new behavior whereby the
server must track and handle partial uploads and be able to report
the sparse areas back to the client.  For that, the rsync protocol,
among others, already exists, but is such tracking necessary at all?
(For parallel uploads, it might be, though other protocols like zchunk
may help client discovery of server-side state.)


Large incremental uploads are already achievable using servers which
support partial-PUT, including SabreDAV and lighttpd mod_webdav:
upload the file serially in chunks, extending the file with each chunk.
Cancellation is achievable with WebDAV DELETE.  The client can choose
an alternate filename while uploading, and then use WebDAV MOVE to
rename the file into place once the upload is complete.
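A minimal client-side sketch of that flow, assuming a server that accepts
Content-Range on PUT to extend a file, reports the current size of the
temporary file via Content-Length on HEAD, and supports WebDAV MOVE.
`plan_chunks`, `upload_resumable`, the host, paths, and chunk size are all
hypothetical.  RFC 9110 leaves partial PUT to the server, and a server
without support is expected to reject the Content-Range with 400 rather
than silently overwrite the file.

```python
import http.client
import os
from typing import Iterator, Tuple


def plan_chunks(total_size: int, chunk_size: int,
                already_on_server: int = 0) -> Iterator[Tuple[int, int, str]]:
    """Yield (start, end_inclusive, Content-Range value) for each chunk still to send.

    already_on_server is the length the server reports for the temporary
    resource, so an interrupted upload resumes where it stopped.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    start = already_on_server
    while start < total_size:
        end = min(start + chunk_size, total_size) - 1  # inclusive, as in Content-Range
        yield start, end, f"bytes {start}-{end}/{total_size}"
        start = end + 1


def upload_resumable(host: str, tmp_path: str, final_path: str,
                     local_file: str, chunk_size: int = 8 * 1024 * 1024) -> None:
    """Hypothetical client: partial-PUT chunks to tmp_path, then WebDAV MOVE."""
    conn = http.client.HTTPConnection(host)
    try:
        # How much of the temporary file does the server already hold?
        conn.request("HEAD", tmp_path)
        resp = conn.getresponse()
        resp.read()
        have = int(resp.getheader("Content-Length", "0")) if resp.status == 200 else 0

        total = os.path.getsize(local_file)
        with open(local_file, "rb") as f:
            for start, end, content_range in plan_chunks(total, chunk_size, have):
                f.seek(start)
                conn.request("PUT", tmp_path, body=f.read(end - start + 1),
                             headers={"Content-Range": content_range})
                resp = conn.getresponse()
                resp.read()
                if resp.status not in (200, 201, 204):
                    raise RuntimeError(f"PUT {content_range} failed: {resp.status}")

        # Rename into place only once every byte has arrived.
        conn.request("MOVE", tmp_path,
                     headers={"Destination": final_path, "Overwrite": "T"})
        resp = conn.getresponse()
        resp.read()
        if resp.status not in (201, 204):
            raise RuntimeError(f"MOVE failed: {resp.status}")
    finally:
        conn.close()
```

Resuming is just re-running the same call: the HEAD tells the client where
to pick up, and cancellation would be a DELETE on tmp_path.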

Support for partial-PUT is a non-standard extension to WebDAV,
at least in part due to implementation tradeoffs (resource usage,
lock timeouts, impact on UX).  tus-v2 may be trying to address this.

A robust PUT implementation (whole file replacement) allows downloads
to proceed while a new version is uploaded.  This may be mapped onto
typical filesystems by uploading to a temporary file and atomically
renaming into place when the upload is complete.  WebDAV LOCK can be
used to ensure only one uploader at a time.
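A minimal sketch of the temporary-file-and-rename mapping on a POSIX
filesystem, assuming the request body arrives as an iterable of byte
chunks; `put_whole_file` is a hypothetical name, not any server's API.

```python
import os
import tempfile
from typing import Iterable


def put_whole_file(target: str, body: Iterable[bytes]) -> None:
    """Whole-file PUT: stream into a temp file, then atomically rename it over target.

    Clients downloading target meanwhile keep reading the old version; a
    failed or cancelled upload leaves target untouched.
    """
    directory = os.path.dirname(os.path.abspath(target))
    # Same directory => same filesystem, so os.replace() can rename
    # (it raises OSError across filesystems).
    # Note: mkstemp creates the file with mode 0600; a real server may chmod it.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".upload-")
    try:
        with os.fdopen(fd, "wb") as f:
            for chunk in body:
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # data is durable before it becomes visible
        os.replace(tmp, target)  # atomic swap on POSIX filesystems
    except BaseException:
        # Discard the partial temp file, then propagate the error.
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
```

Holding a WebDAV LOCK around this call would serialize uploaders; the
sketch itself does no locking.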

When making changes to a file, whole file replacement works well for
small files and often well enough for medium-sized files.  It is with
large files that network bandwidth, reconnection and re-upload costs,
server disk space, and other resource usage might have a larger
impact.

On modern filesystems with support for cloning, a server might be able
to clone the file extents into a temporary file, modify a portion of the
temporary file, and then atomically rename the new file into place.  The
modification could be requested with PATCH or with partial-PUT.  For large files, enforcing use of
WebDAV LOCK is recommended to avoid excessive numbers of large
temporary files, especially if copying a large file to a temporary
file instead of cloning.
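A minimal copy-modify-rename sketch of the above, assuming a single byte
range is being written.  It uses a plain copy, i.e. the expensive
non-cloning case; on a filesystem with reflink support an extent clone
would replace the copy step.  `patch_range` is a hypothetical name.

```python
import os
import shutil
import tempfile


def patch_range(target: str, offset: int, data: bytes) -> None:
    """Copy-modify-rename: write data at offset into a copy of target,
    then atomically swap the copy into place.
    """
    size = os.path.getsize(target)
    if not 0 <= offset <= size:
        # offset == size appends (extends the file); beyond that would leave a hole.
        raise ValueError(f"offset {offset} outside 0..{size}")
    directory = os.path.dirname(os.path.abspath(target))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".patch-")
    os.close(fd)
    try:
        # Plain copy: the costly path for large files. A reflink/clone
        # would go here on filesystems that support it.
        shutil.copyfile(target, tmp)
        shutil.copymode(target, tmp)  # keep the original permissions
        with open(tmp, "r+b") as f:
            f.seek(offset)
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, target)  # readers see either old or new, never half-patched
    except BaseException:
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
```

Each call briefly needs disk space for a second full copy, which is why
serializing writers with WebDAV LOCK matters for large files.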

Another solution for PATCH-like behavior is to use DVCS protocols,
e.g. git, and serve the completed files from a repository working copy.


From my reading of the tus-v2 spec, only parallel upload is not
addressed by the solutions above.  Parallel upload and file
reconstruction are currently achievable by application-specific
implementations, including tus-v2, and potentially by some DVCS.
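For parallel upload, the extra bookkeeping is roughly: record which byte
ranges have arrived and report the gaps (the "sparse areas") so the
client knows what to resend.  A minimal sketch, assuming half-open
[start, end) ranges; `merge_ranges` and `missing_ranges` are hypothetical
helpers, not taken from the tus-v2 draft.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open byte range [start, end)


def merge_ranges(received: List[Range]) -> List[Range]:
    """Coalesce overlapping or adjacent received ranges."""
    merged: List[Range] = []
    for start, end in sorted(received):
        if end <= start:
            continue  # ignore empty ranges
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def missing_ranges(received: List[Range], total: int) -> List[Range]:
    """Return the gaps in [0, total) not yet covered by received chunks."""
    gaps: List[Range] = []
    pos = 0
    for start, end in merge_ranges(received):
        if start > pos:
            gaps.append((pos, min(start, total)))
        pos = max(pos, end)
        if pos >= total:
            break
    if pos < total:
        gaps.append((pos, total))
    return gaps
```

Once `missing_ranges` returns an empty list, the file is complete and can
be renamed into place as with a serial upload.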


Eric Bowman makes numerous excellent points in prior messages, and I
would like to repeat one:

Eric J Bowman wrote:
> Unless you're coding an endpoint instead of a resource, in which
> case the only help I can offer you, is to think in terms of resources
> not endpoints.

> Indeed! In your example /upload is a tightly-coupled RPC endpoint;
> if the request body is the file content why are you using POST instead
> of PUT? 'HEAD/resource' lets the server know exactly which "upload" is
> referenced: if it was interrupted, the server knows it, and responds
> 206. Because REST.

If tus is aimed at /upload, an endpoint, then in my mind tus is an
application handling that endpoint.  Given the alternatives mentioned
above for handling resources, I do not see why a web server would
implement tus as an HTTP standard when end-users can configure tus as an
application running behind a web server to handle configured endpoints
such as /upload.


As others have pointed out, there is room for improvement in PATCH,
e.g. defining a new media type and associated behavior for PATCH.

Mark Nottingham wrote:
> PATCH intentionally leaves everything up to the media type of the
> PATCH request, not the implementation. With hindsight, at least one
> or two well-defined PATCH media types should have been defined at the
> same time as 5789 - their absence (especially JSON's) created a lot
> of confusion.

Eric J Bowman wrote:
> I think you and mnot are correct that we need better-defined PATCH
> media types, I believe that's where to solve this problem, but how
> any media type is rendered has traditionally and properly been a
> client-side concern in HTTP.

> I don't think anyone has even meant to imply that this isn't a
> problem worth solving. That being said... 20 years ago we figured
> _every_ upload was *replaceable* on failure, while realizing PATCH
> would increase in value hand-in-hand with filesize over time.
> Reckoning day has arrived! ;) Thanks for your contribution, and I
> mean that, otherwise I wouldn't bother.

Cheers, Glenn
Received on Monday, 11 April 2022 10:41:31 UTC