Re: Draft for Resumable Uploads

> On Apr 1, 2022, at 10:30 AM, Roy T. Fielding <fielding@gbiv.com> wrote:
> 
>> On Apr 1, 2022, at 2:48 AM, Marius Kleidl <marius@transloadit.com> wrote:
>> 
>> Hello HTTP working group,
>> 
>> we are all familiar with connectivity disruptions affecting our internet activities. One example is when a large file download is interrupted; say a 100 MB file download encounters a network loss after the client receives 70 MB. Fortunately, resumable HTTP downloads using range requests are a widely deployed standard feature that allows clients to fetch the remaining 30 MB only, saving time and resources for both endpoints. However, in the opposite direction, there is not a standard convention for resuming HTTP uploads.
>> 
>> Across the HTTP ecosystem there are several different approaches to providing resumable uploads. We are aware of at least one attempt to try and standardize an approach [1], but to our knowledge none have succeeded in being adopted and driven to conclusion.
>> 
>> We believe resumable uploads are a common problem and that there is value in a standard resumable upload approach. We've been working on a document [2] [3] that uses HTTP to solve what we believe to be the core problem set, while also allowing for extended use cases. We are bringing this to the list to understand if there is interest in the working group to solve the problem, and whether our document is a good basis for a solution.
>> 
>> In case you are interested in the background of this draft: The origin is within the tus project [4], which has been developing an HTTP-based protocol for resumable uploads [5] since 2013 (tus was also posted on this mailing list at the time [6]). Furthermore, we also provide various open-source implementations [7] to allow easy usage on the web, in mobile applications, desktop applications, or server environments. tus has seen wide adoption, proving that there is a demand for an open-source solution providing resumable uploads.
>> 
>> We hope to bring resumable uploads to more people. For this, adopting resumable uploads into HTTP would be a great step. There is also interest in including support for resumable uploads natively into platforms, like browsers and mobile SDKs, so that developers do not have to bring their own library for resumable uploads.
>> 
>> We have taken the main uploading process from our tus protocol and reworked it into a self-contained draft, which we want to present to you! As such, this draft can be seen as an evolution of our work on tus and as a step to increase availability of resumable uploads.
>> 
>> Thank you for any feedback in advance!
>> 
>> Best regards,
>> Marius Kleidl
>> 
>> [1] https://lists.w3.org/Archives/Public/ietf-http-wg/2019JulSep/0066.html
>> [2] https://datatracker.ietf.org/doc/draft-tus-httpbis-resumable-uploads-protocol/
>> [3] https://github.com/tus/tus-v2
>> [4] https://tus.io/
>> [5] https://tus.io/protocols/resumable-upload.html
>> [6] https://mailarchive.ietf.org/arch/msg/httpbisa/I__B5Kc7h-1TvRRB9rmjY8tR-T0/
>> [7] https://tus.io/implementations.html
>> 
> This is probably not a good day to discuss this, but it is clear from the
> draft that this is not using HTTP correctly.
> 
> tus-v2 assumes that there is a separate resource for uploading, as opposed to
> targeting a resource and letting the server decide whether it can upload into
> a temporary resource for that target. It doesn't indicate what the server
> is to do with the data once it is uploaded, which implies this is just part
> of a private agreement instead of a standard protocol.

tus-v2 is not intended to be a protocol that uses HTTP; it is designed as an extension of HTTP. In particular, any upload request, whether it uses POST, PUT, or another method, can be transparently upgraded to support resumable uploads. The extension should not prevent the server from organizing its resources in any way.
> 
> Subsequent requests target the same upload resource, instead of targeting
> a separate temporary resource in progress. This results in some seriously
> confused semantics when the client ends with a DELETE targeting the resource
> for uploading.
> 
> Changing the semantics of an existing method using a header field is only
> interoperable if the new field can be ignored. That is not the case here
> for a DELETE on the process URI.

DELETE is used only if the client detects that the server supports tus. We can change it to a header field if existing applications use the same URI for both upload and delete.
> 
> Likewise, not targeting by resource (URI) interferes with resource-based
> access control and authorization, and fails to distinguish between uploads
> where the user agent knows where to PUT the data and those where the
> user agent is asking the server to choose where to POST the data.
> 
> For example, what happens when the server includes multiple
> user-authenticated subtrees and this user is only authorized to upload
> to some of them?
> 
> A simple fix is to send the initial upload as a PUT (to a target URI for
> the completed upload) or as a POST (to clearly allow the server to select
> a destination). The server can indicate that it supports continuation by
> providing a temporary URI in a 1xx response. This new target is essentially
> a buffer with a URI. The client can then monitor/continue requests on the
> new URI, cancel by sending DELETE to that new URI, or finalize the upload
> by sending some final metadata (e.g., Digest) to that new URI. Once final
> (either by completing the original request or receiving a finalization on the
> temporary URI), the server can move the received data to where the client
> indicated and delete the temporary URI.

Unfortunately, we’ve determined that 1xx responses are too unreliable for critical information. Many middleboxes drop them, and most client frameworks do not expose them. Therefore we use them only for feature detection, and none of the critical functionality depends on them.
> 
> The temporary URI is the token -- there is no need for a separate identifier,
> unless you want to recover from missed responses (i.e., be able to repeat
> the same request multiple times and let the server decide when it was
> already done, for which a general request-id would be more appropriate).
> 
> Furthermore, the above can be generalized to more useful cases where
> very large uploads are needed in practice. All of the ones that I have seen
> deployed for real reasons have been to solve load/scale/speed problems
> elsewhere in a chain of intermediaries, not just to send a very large file
> to an HTTP origin server (which the vast majority of servers can handle
> just fine with HTTP/1.1 over TCP).
> 
> For example, sending terabytes of data to S3 in parallel uploads to
> multiple services that are then reassembled within AWS. This requires a
> design where the user agent requests instruction on how/where to upload
> each part in parallel and the server reconstitutes the data upon receiving
> finalization of every part. IOW, the initial method with Expect and a field
> indicating how large the upload will be, resulting in a 1xx/3xx list of
> temporary target URIs (or URI templates) selected by the server,
> potentially on different origins, where each indicated range can be
> resumably-uploaded in parallel and then finalized.

Parallel upload is out of scope for this proposal. We are focusing mainly on resuming an interrupted upload.
> 
> Note that, if you stick with HTTP semantics and URIs as identifiers, the
> complex use case is just a generalization of the smaller case.
> 
> Cheers,
> 
> ....Roy
> 

Thanks for reading the proposal and responding. The goal of the protocol is to enhance the upload process of HTTP and be method-agnostic. If our use of GET/DELETE on the same URI is a blocker for your application, we’d be happy to discuss options such as changing them into header fields instead. You can already disambiguate them by the presence of the Upload-Token header.
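
As a rough illustration of disambiguating by the Upload-Token header, here is a minimal Python sketch of how a server might branch on it. It is not taken from the draft: the function name classify_request and the returned labels are hypothetical, and only the Upload-Token field name comes from the discussion above.

```python
from typing import Mapping


def classify_request(method: str, headers: Mapping[str, str]) -> str:
    """Label a request as part of a resumable upload or as ordinary HTTP.

    Hypothetical helper: the labels are illustrative only. The sole
    signal used is the presence of the Upload-Token header field.
    """
    # HTTP field names are case-insensitive, so normalize before lookup.
    field_names = {name.lower() for name in headers}
    if "upload-token" in field_names:
        return "resumable-" + method.lower()
    return "ordinary-" + method.lower()
```

With this kind of check, a DELETE that carries Upload-Token would be routed to upload cancellation, while a DELETE without it would keep whatever meaning the application already gives it on that URI.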

Guoye

Received on Friday, 1 April 2022 19:36:16 UTC