- From: Roy T. Fielding <fielding@gbiv.com>
- Date: Fri, 1 Apr 2022 10:30:27 -0700
- To: Marius Kleidl <marius@transloadit.com>
- Cc: ietf-http-wg@w3.org
- Message-Id: <82FAD6B4-F72F-42E0-A72D-4BFAAB9668FD@gbiv.com>
> On Apr 1, 2022, at 2:48 AM, Marius Kleidl <marius@transloadit.com> wrote:
>
> Hello HTTP working group,
>
> We are all familiar with connectivity disruptions affecting our internet activities. One example is when a large file download is interrupted; say a 100 MB download encounters a network loss after the client has received 70 MB. Fortunately, resumable HTTP downloads using range requests are a widely deployed standard feature that allows the client to fetch only the remaining 30 MB, saving time and resources for both endpoints. In the opposite direction, however, there is no standard convention for resuming HTTP uploads.
>
> Across the HTTP ecosystem there are several different approaches to providing resumable uploads. We are aware of at least one attempt to standardize an approach [1], but to our knowledge none has succeeded in being adopted and driven to conclusion.
>
> We believe resumable uploads are a common problem and that there is value in a standard resumable upload approach. We've been working on a document [2] [3] that uses HTTP to solve what we believe to be the core problem set, while also allowing for extended use cases. We are bringing this to the list to understand whether there is interest in the working group in solving the problem, and whether our document is a good basis for a solution.
>
> In case you are interested in the background of this draft: it originates in the tus project [4], which has been developing an HTTP-based protocol for resumable uploads [5] since 2013 (tus was also posted on this mailing list at the time [6]). We also provide various open-source implementations [7] for easy use on the web, in mobile applications, in desktop applications, and in server environments. tus has seen great adoption, showing that there is demand for an open-source solution providing resumable uploads.
>
> We hope to bring resumable uploads to more people. For this, adopting resumable uploads into HTTP would be a great step. There is also interest in including native support for resumable uploads in platforms, like browsers and mobile SDKs, so that developers do not have to bring their own library for resumable uploads. We have taken the main uploading process from our tus protocol and reworked it into a self-contained draft, which we want to present to you! As such, this draft can be seen as an evolution of our work on tus and as a step toward increasing the availability of resumable uploads.
>
> Thank you in advance for any feedback!
>
> Best regards,
> Marius Kleidl
>
> [1] https://lists.w3.org/Archives/Public/ietf-http-wg/2019JulSep/0066.html
> [2] https://datatracker.ietf.org/doc/draft-tus-httpbis-resumable-uploads-protocol/
> [3] https://github.com/tus/tus-v2
> [4] https://tus.io/
> [5] https://tus.io/protocols/resumable-upload.html
> [6] https://mailarchive.ietf.org/arch/msg/httpbisa/I__B5Kc7h-1TvRRB9rmjY8tR-T0/
> [7] https://tus.io/implementations.html

This is probably not a good day to discuss this, but it is clear from the draft that this is not using HTTP correctly. tus-v2 assumes that there is a separate resource for uploading, as opposed to targeting a resource and letting the server decide whether it can upload into a temporary resource for that target. It doesn't indicate what the server is to do with the data once it is uploaded, which implies this is just part of a private agreement rather than a standard protocol. Subsequent requests target the same upload resource, instead of targeting a separate temporary resource in progress.
This results in some seriously confused semantics when the client ends with a DELETE targeting the resource for uploading. Changing the semantics of an existing method using a header field is only interoperable if the new field can be ignored. That is not the case here for a DELETE on the process URI. Likewise, not targeting by resource (URI) interferes with resource-based access control and authorization, and fails to distinguish between uploads where the user agent knows where to PUT the data and those where the user agent is asking the server to choose where to POST the data. For example, what happens when the server includes multiple user-authenticated subtrees and this user is only authorized to upload to some of them?

A simple fix is to send the initial upload as a PUT (to a target URI for the completed upload) or as a POST (to clearly allow the server to select a destination). The server can indicate that it supports continuation by providing a temporary URI in a 1xx response. This new target is essentially a buffer with a URI. The client can then monitor/continue requests on the new URI, cancel by sending a DELETE to that new URI, or finalize the upload by sending some final metadata (e.g., Digest) to that new URI. Once the upload is final (either by completing the original request or by receiving a finalization on the temporary URI), the server can move the received data to where the client indicated and delete the temporary URI.

The temporary URI is the token -- there is no need for a separate identifier, unless you want to recover from missed responses (i.e., be able to repeat the same request multiple times and let the server decide when it was already done, for which a general request-id would be more appropriate).

Furthermore, the above can be generalized to more useful cases where very large uploads are needed in practice.
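A wire-level sketch of this flow might look like the following. To be clear, the 1xx reason phrase, the Upload-Offset field, the resumable-upload expectation token, and all URIs here are hypothetical placeholders for illustration, not registered protocol elements:

```http
PUT /videos/talk.mp4 HTTP/1.1
Host: example.com
Content-Length: 104857600
Expect: resumable-upload

HTTP/1.1 1xx Upload Buffer Created
Location: /buffers/a81bc36

# connection drops after ~70 MB; the client later probes the buffer
HEAD /buffers/a81bc36 HTTP/1.1
Host: example.com

HTTP/1.1 200 OK
Upload-Offset: 73400320

# the client continues sending from the reported offset on the buffer
# URI, then finalizes (or cancels with: DELETE /buffers/a81bc36)
```

Under this reading, once the buffer is complete the server moves the data to /videos/talk.mp4 and removes the temporary URI; the buffer URI itself is the token, with no separate upload identifier needed.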
All of the ones that I have seen deployed for real reasons have been to solve load/scale/speed problems elsewhere in a chain of intermediaries, not just to send a very large file to an HTTP origin server (which the vast majority of servers can handle just fine with HTTP/1.1 over TCP). For example, sending terabytes of data to S3 in parallel uploads to multiple services that are then reassembled within AWS. This requires a design where the user agent requests instructions on how/where to upload each part in parallel and the server reconstitutes the data upon receiving finalization of every part. IOW, the initial method is sent with Expect and a field indicating how large the upload will be, resulting in a 1xx/3xx list of temporary target URIs (or URI templates) selected by the server, potentially on different origins, where each indicated range can be resumably uploaded in parallel and then finalized.

Note that, if you stick with HTTP semantics and URIs as identifiers, the complex use case is just a generalization of the smaller case.

Cheers,

....Roy
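The segmented, parallel variant described above might be sketched as follows; again, the expectation token, the Upload-Length and Upload-Targets fields, and the URI template are hypothetical illustrations of the shape of the exchange, not defined protocol elements:

```http
POST /datasets/archive.tar HTTP/1.1
Host: origin.example
Expect: segmented-upload
Upload-Length: 2199023255552

HTTP/1.1 1xx Upload Segments
Upload-Targets: https://s1.example/parts/{range},
                https://s2.example/parts/{range}

# the server-selected targets may live on different origins; the
# client uploads each indicated range (resumably) in parallel, and
# when every part has been finalized the server reconstitutes the
# data at /datasets/archive.tar
```

The point of the sketch is that each per-range target is itself just a buffer with a URI, so the large parallel case reuses the same mechanics as the single-buffer case above.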
Received on Friday, 1 April 2022 17:30:47 UTC