Re: Draft for Resumable Uploads from Guoye Zhang on 2022-04-05 (ietf-http-wg@w3.org from April to June 2022)

From: Guoye Zhang <guoye_zhang@apple.com>
Date: Tue, 05 Apr 2022 00:53:44 -0700
To: Eric J Bowman <mellowmutt@zoho.com>
Cc: Julian Reschke <julian.reschke@gmx.de>, ietf-http-wg <ietf-http-wg@w3.org>
Message-id: <589722AC-F37C-437F-80EA-E948150DE291@apple.com>
> On Apr 4, 2022, at 11:05 PM, Eric J Bowman <mellowmutt@zoho.com> wrote:
> 
> >
> > First, how does it uniquely identify a resumable upload?
> > 
> 
> A 206 response to a non-range request uniquely, unambiguously, and elegantly identifies an incomplete resource. Identifying a resource as both incomplete *and* completeable, introduces tight coupling at the protocol layer.
> 
> The Content-Length header should suffice to inform a client where to resume the upload. Whether it's allowed to or not, seems an application-layer concern (beyond authentication), hidden behind the uniform interface.
> 
> >
> > Does the server need to send a unique URL to the client?
> >
> 
> Thought we were talking Web Arch, where all URIs are unique, and the basis of messaging is a client requesting a representation of the identified server resource. Clients may receive URLs from servers, but I can always type 'em into my user-agent, which is of particular note when we're including PUT in the convo. ;)
> 
> >
> > What about the additional roundtrip time?
> >
> 
> Wasn't the upload interrupted? Even if it was for a nanosecond, the network connection needs to be re-established... the client needs to re-authenticate with the server... seems we're way beyond one HTTP r/t of latency making any real-world difference.
> 
> >
> > tus-v2 draft solves it by defining `Upload-Token` which is a 
> > cryptographically-random token generated on the client side.
> >
> 
> To save one HTTP round trip when restarting an interrupted download? That's where ya lost me.

To give you an example, let’s suppose we have an HTML form like this:

<form action="/upload" enctype="multipart/form-data" method="post">
  <input type="file" name="file">
  <input type="submit" value="Submit">
</form>

When a user clicks “Submit”, the browser sends a `POST /upload` request with the file content as the request body. However, if the upload is interrupted and the client tries `HEAD /upload`, how does the server know which upload the client is referring to?

The server can of course use surrounding information (such as a cookie) to look up the incomplete upload. However, that prevents the same client from submitting multiple forms concurrently. Alternatively, the server can generate dynamic HTML pages that encode a unique identifier in the form action URL, but that prevents caching and doesn’t help with non-HTML use cases.

`Upload-Token` is designed to resolve this issue by labeling every upload with a unique ID.
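
As a rough illustration of that idea (not the draft's wire format), here is a minimal Python sketch. `UploadStore`, `new_upload_token`, and the in-memory dictionary are hypothetical names made up for this example; the only point is that server state is keyed by the client-generated token, so two uploads to the same `/upload` URL from the same client do not collide.

```python
import secrets


class UploadStore:
    """Hypothetical in-memory stand-in for server-side upload state."""

    def __init__(self):
        # token -> bytes received so far for that upload
        self._uploads = {}

    def append(self, token, chunk):
        """Store bytes for the upload labeled by `token`; return the new offset."""
        buf = self._uploads.setdefault(token, bytearray())
        buf.extend(chunk)
        return len(buf)

    def offset(self, token):
        """Return the number of bytes stored for `token`, or None if unknown."""
        buf = self._uploads.get(token)
        return None if buf is None else len(buf)


def new_upload_token():
    # Client side: a cryptographically random label for one upload.
    return secrets.token_hex(16)


store = UploadStore()

# The same client submits two forms to the same URL concurrently.
token_a = new_upload_token()
token_b = new_upload_token()
store.append(token_a, b"first file, interrupted after this")
store.append(token_b, b"second file")

# After an interruption, the client names one specific upload by its token
# (assumed here to travel in an Upload-Token header on HEAD /upload).
print(store.offset(token_a))
print(store.offset(token_b))
```

A cookie-keyed lookup would have a single slot for both uploads; keying on the token keeps them apart without changing the URL.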
> 
> >
> > We’ve also looked at the `Range` header but decided 
> > against it due to it providing too much flexibility.
> >
> 
> That's an interesting way of putting it, thanks! I never even considered using it to solve what I call the "partial-PUT problem" we're dealing with, shoulda left it out of my last email, really only Content-Length matters here.
> 
> >
> > Features like multipart ranges are extremely difficult to
> > support on the server.
> >
> 
> You're preaching to the choir. Granted I've been ranching for the past decade, but before that I was a web developer since 1994 and have coded many a webserver. The fundamentals to which I adhere, have not changed over time. Makes me salty, but if you know where to find any of my old work on archive.org the takeaway should be that it all still functions on the latest browsers. Except the browser-native XSLT stuff.
> 
> >
> > That being said, we can revisit this decision if the work group provides a different perspective.
> >
> 
> All about consensus and working code. I'd love to link you to a rest-discuss thread about my PUT/PATCH demo, I had a deliberately-broken example image that responded 206 online for years, back in the aughties; archive.org serves it 200 tho. If you ever *really* tried it you were using curl. It helped me grok/explain resource v representation. The broken representation of one resource (a Mona Lisa icon), was itself a first-class, read-only resource.
> 
> Most browsers rendered the incomplete image, some displayed their broken-image icon, none choked on the 206. The same representation was available from a PUT/PATCH-enabled, access-restricted URI (allowing it to be fixed, one way or another, without affecting the other URI dedicated to the broken variant), and the discussion was about how PATCH increases in value the larger the file. ARF vs. ARCF, C = Continue.
> 
> At some point you've transferred more of the file than you're willing to Retry and overwrite, vs. Continue appending. I guess if you want you can introduce 1xx responses into the mix there, but I'm not seeing it as necessary.

Thanks, I will look it up.
> 
> >
> > Overall, we believe that a tightly-scoped standard would benefit
> > implementors and encourage wide adoption.
> >
> 
> All due respect, I'm seeing a tightly-coupled solution to the ages-old partial-PUT problem, which does not falsify using a 206 response to a non-range request to unambiguously communicate the state of the resource as "incomplete" where Content-Length gives the exact byte where the transfer was interrupted. IMHO, a loosely-coupled approach is better for encouraging wide adoption, under the standard as written.
> 
> -Eric
> 
The recent trend has shifted toward defining standards with no ambiguity and with strict requirements for implementations, as people recognize that unplanned extensibility makes interop much more difficult. The HTTP/3 standard is a good example of that philosophy, whereas if you give the PATCH RFC (RFC 5789) to 100 people to implement, they will end up with 100 incompatible protocols.

Going back to the current topic, we want to answer these three questions, in this specific order:

(1) Is a resumable upload standard necessary?

 I think this is an easy argument. The current state, with many incompatible protocols, makes interop impossible.

(2) Does everyone agree with tus-v2’s main goals? They are to design a mechanism that can upgrade any upload to a resumable upload, and that is easy to implement on top of any HTTP library.

 Many design decisions are the direct result of these goals, and we hope to gain consensus on them and to hear any additional goals others might have.

(3) What about the protocol details? 204 vs 206, Upload-Offset vs Content-Length vs Content-Range.

 I think we should answer the first two before digging into these, as they are the least critical and we can easily change them.
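
To make the alternatives in (3) concrete, here is a small Python sketch of how a client might read the resume offset under two of the styles being discussed. The pairing of 204 with `Upload-Offset` and of 206 with `Content-Length` is my assumption for illustration, and `resume_offset` is a hypothetical helper, not part of any draft or library.

```python
def resume_offset(status, headers):
    """Return the byte offset to resume from, or None if neither style matches.

    Style A (assumed): 204 response carrying an Upload-Offset header.
    Style B (assumed): 206 response to a non-range request, where
    Content-Length is the number of bytes the server already holds.
    """
    # Header names are case-insensitive, so normalize before lookup.
    h = {name.lower(): value for name, value in headers.items()}
    if status == 204 and "upload-offset" in h:
        return int(h["upload-offset"])
    if status == 206 and "content-length" in h:
        return int(h["content-length"])
    return None


print(resume_offset(204, {"Upload-Offset": "1000"}))
print(resume_offset(206, {"Content-Length": "1000"}))
```

Either way the client ends up with a single integer, which is part of why this choice can be deferred.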

Guoye

Received on Tuesday, 5 April 2022 07:54:03 UTC