Re: Draft for Resumable Uploads from Guoye Zhang on 2022-04-11 (ietf-http-wg@w3.org from April to June 2022)

From: Guoye Zhang <guoye_zhang@apple.com>
Date: Mon, 11 Apr 2022 12:25:45 -0700
To: Austin Wright <aaa@bzfx.net>
Cc: Marius Kleidl <marius@transloadit.com>, ietf-http-wg <ietf-http-wg@w3.org>
Message-id: <D2091800-94A8-441C-BE83-4A00C507723A@apple.com>
> On Apr 10, 2022, at 3:31 PM, Austin Wright <aaa@bzfx.net> wrote:
> 
> 
>> On Apr 6, 2022, at 01:29, Guoye Zhang <guoye_zhang@apple.com> wrote:
>> 
>> Originally, an earlier internal draft version had “Upload Creation Procedure” and “Upload Appending Procedure” with PATCH. Then we recognized that they are nearly the same thing with all the same requirements, just different offsets, so we merged them into a single “Upload Transfer Procedure”.
>> 
>> We can definitely revisit this decision if the consensus is to adopt and improve PATCH.
> 
> I understand them to be different things based on how they’re using HTTP semantics. “Upload Appending Procedure” seemed to follow the method semantics. In contrast, “Upload Transfer Procedure” expands each HTTP method to do something it could not previously do: ignore the head of this request, and instead combine its body with a previous request (violating the understanding that HTTP messages are stateless).
> 
> You work around this by requiring that clients not send requests if the origin server would misunderstand the feature. This is never the client’s responsibility in HTTP, because a message may pass through multiple programs and servers, not all of which may understand the expanded semantics. And even with respect to the origin server, signals like a DNS record, or even something as specific as an OPTIONS response on the same URI, cannot technically guarantee that the origin will honor the expanded semantics. I think the difference between using HTTP as a substrate versus using it as an application, is: an HTTP substrate merely re-uses existing libraries (e.g. message & protocol parsers) but not necessarily the semantics, and does not support the entire ecosystem of caches, gateways, and proxies (which collectively form HTTP The Application).

Yes, this is a very good point. If a middlebox sees “Content-Type: application/json” but does not find valid JSON in the request body, it might freak out.

It is somewhat addressed by “Clients MUST NOT send representation metadata in subsequent Upload Transfer Procedure requests”, but yes, we definitely need more considerations in this area.
> 
> However, I see some fairly straightforward changes that could fix this.
> 
> In general, there are three options for adding a new feature to HTTP:
> 
> 1. The server has to transparently fall back to acceptable behavior (e.g. 200 OK instead of 206 Partial Content)
> 2. The server has to produce an error (e.g. unknown method, or unknown media type)
> 3. It has to be implemented at a different layer (e.g. as a feature of TCP, TLS, SCTP, HTTP/2 framing, or QUIC)
> 
> I see a combination of these being necessary:
> 
> (1) The client should assume no support, and make its first request as normal (requesting support for resumable uploads). If the server supports the feature, it can communicate this back, with a URI representing the attempted operation. This satisfies option (1) transparently fall back.
> 
> (2) If the connection needs to be resumed, the client can use a new method, or PATCH with a new media type, on the server’s selected operation URI. This satisfies option (2) produce an error.

These are great suggestions! We will discuss them internally.
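
As a rough sketch of how a client might follow suggestions (1) and (2) above: it sends an ordinary first request, and only attempts resumption if the server handed back an operation URI. Everything named here is an assumption for illustration: the `Request` type, the helper functions, the use of an `Upload-Offset` header on the PATCH, and the placeholder media type are not defined by the draft or by this thread. How the client signals interest in resumable uploads on the first request is left open.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical media type for the resumption PATCH; not defined by any draft.
RESUME_MEDIA_TYPE = "application/example-upload-resume"


@dataclass
class Request:
    """Illustrative request record; not tied to any HTTP library."""
    method: str
    uri: str
    headers: Dict[str, str]
    body: bytes


def first_request(uri: str, content_type: str, body: bytes) -> Request:
    """Suggestion (1): an ordinary request that a server without support
    simply processes as usual (transparent fallback)."""
    return Request("POST", uri, {"Content-Type": content_type}, body)


def resume_request(operation_uri: Optional[str], offset: int,
                   body: bytes) -> Optional[Request]:
    """Suggestion (2): PATCH the server-selected operation URI with a new
    media type. Returns None when the server never advertised an operation
    URI, in which case the client has to re-send the full request."""
    if operation_uri is None:
        return None
    headers = {
        "Content-Type": RESUME_MEDIA_TYPE,
        "Upload-Offset": str(offset),  # assumed header name
    }
    # Only the bytes the server has not yet received are sent.
    return Request("PATCH", operation_uri, headers, body[offset:])
```

A server or intermediary that does not understand the new media type would be expected to reject the PATCH with an error rather than misinterpret it, which is the point of option (2).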
> 
> (3) Do resumable uploads really need to be specific to HTTP? Suppose there were a feature of HTTP/2 framing, where a client could ask the server “Please generate a UUID for this stream and keep it in the background in the event of a disconnection” or “Please tell me where that stream left off, and resume this stream from there”. This satisfies option (3) different layer.
> 
> Finally, your proposal is general and low-level (to the point where the same effect could be achieved in TLS instead). Since virtually all Web applications are implemented above the transport layer, it may be technically difficult to implement, at very little benefit. It follows general stream semantics (ordered bytes), which prohibits features like multiple parallel uploads, which are important as server workloads become increasingly parallel. There would be only a handful of ways this could be implemented in Web applications:
> 
> 1. The OS or HTTP server implements resumable uploads, and combines multiple HTTP messages in a manner transparent to the written application. If the subsequent HTTP requests hit a different origin server than the first one (a different node in the cluster), it would have to hand-off the request body somehow.
> 
> 2. The application is written as some sort of state machine that can be handed off or shared between multiple nodes in a cluster. I’m not aware of any development frameworks that do anything remotely like this.
> 
> 3. Developers choose the specific resources & methods that are likely to benefit from resumable uploads, and describe how the database stores the intermediate progress of each class of operation. This seems like a huge potential for errors due to how infrequently many code paths would be run, and how difficult it is to trigger in tests.
> 
> Resumable requests are still reasonable, but I think where we stand to benefit the most are application-layer approaches. In particular, segmented file uploading, which would enable “parallel PUT”.
> 
> Thanks,
> 
> Austin.

Parallel segmented uploads were discussed, and we intentionally disallow them in the draft due to the added complexity on the server. We are matching the capability of the current tus protocol with improvements in a few areas.

Let’s suppose that, from the client’s perspective, the connection is dead, but the server is still receiving some bytes trickling in. At this point, the client queries the offset and tries to resume. In many implementations of the current resumable upload protocol, the resumption will fail because the offset is still changing and won’t match. Therefore we are clarifying that querying the offset should freeze the progress and terminate previous uploads. Unfortunately, this prevents the protocol from supporting parallel uploads, but we think it’s a more concrete problem to solve.
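
To make the freeze-on-query behavior concrete, here is a minimal in-memory sketch of the server side. The `Upload` class and its method names are hypothetical and chosen only for illustration; the draft does not prescribe any particular implementation.

```python
class Upload:
    """Hypothetical in-memory upload resource (illustrative names only)."""

    def __init__(self) -> None:
        self.data = bytearray()
        self.active_transfer = None  # id of the one transfer allowed to append
        self._next_id = 0

    def begin_transfer(self, offset: int) -> int:
        """Start a transfer at `offset`, which must equal the current offset.
        Starting a new transfer also supersedes any previous one."""
        if offset != len(self.data):
            raise ValueError("offset mismatch")
        self._next_id += 1
        self.active_transfer = self._next_id
        return self.active_transfer

    def append(self, transfer_id: int, chunk: bytes) -> bool:
        """Append bytes only if this transfer has not been terminated."""
        if transfer_id != self.active_transfer:
            return False  # late bytes from a terminated transfer are dropped
        self.data.extend(chunk)
        return True

    def query_offset(self) -> int:
        """Terminate any in-flight transfer, then report a stable offset."""
        self.active_transfer = None
        return len(self.data)


upload = Upload()
old = upload.begin_transfer(0)
upload.append(old, b"hello")
offset = upload.query_offset()        # the old transfer is terminated here
accepted = upload.append(old, b"!!")  # bytes still trickling in are rejected
new = upload.begin_transfer(offset)   # resumption sees a matching offset
upload.append(new, b" world")
```

Because only one transfer may append at a time, the offset returned by the query cannot move before the client resumes. That single-writer rule is also what rules out parallel uploads.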

Guoye

Received on Monday, 11 April 2022 19:26:07 UTC