Re: Draft for Resumable Uploads

> On Apr 5, 2022, at 1:11 AM, Julian Reschke <julian.reschke@gmx.de> wrote:
> 
> Am 05.04.2022 um 09:53 schrieb Guoye Zhang:
>> 
>> 
>>> On Apr 4, 2022, at 11:05 PM, Eric J Bowman <mellowmutt@zoho.com> wrote:
>>> 
>>> >
>>> > First, how does it uniquely identify a resumable upload?
>>> >
>>> 
>>> A 206 response to a non-range request uniquely, unambiguously, and
>>> elegantly identifies an incomplete resource. Identifying a resource as
>>> both incomplete *and* completeable, introduces tight coupling at the
>>> protocol layer.
>>> 
>>> The Content-Length header should suffice to inform a client where to
>>> resume the upload. Whether it's allowed to or not, seems an
>>> application-layer concern (beyond authentication), hidden behind the
>>> uniform interface.
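
A minimal sketch of the resumption arithmetic described above, assuming the server reports the number of bytes it already holds via Content-Length and the client still has the full payload locally. The function name `remaining_body` and its parameters are hypothetical. Nothing here settles which method or headers carry the remainder, since that is still under discussion in this thread.

```python
def remaining_body(payload: bytes, server_content_length: int) -> bytes:
    """Return the bytes the client still has to send.

    server_content_length is assumed to be the Content-Length the server
    reported for the incomplete resource, i.e. the count of bytes stored.
    """
    if not 0 <= server_content_length <= len(payload):
        # The server claims more (or fewer than zero) bytes than the client
        # has; resuming would be unsafe, so the caller should start over.
        raise ValueError("reported length does not match the local payload")
    # Bytes [0, server_content_length) are already on the server.
    return payload[server_content_length:]
```

If the reported length equals the payload size, the result is empty and there is nothing left to send.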
>>> 
>>> >
>>> > Does the server need to send a unique URL to the client?
>>> >
>>> 
>>> Thought we were talking Web Arch, where all URIs are unique, and the
>>> basis of messaging is a client requesting a representation of the
>>> identified server resource. Clients may receive URLs from servers, but
>>> I can always type 'em into my user-agent, which is of particular note
>>> when we're including PUT in the convo. ;)
>>> 
>>> >
>>> > What about the additional roundtrip time?
>>> >
>>> 
>>> Wasn't the upload interrupted? Even if it was for a nanosecond, the
>>> network connection needs to be re-established... the client needs to
>>> re-authenticate with the server... seems we're way beyond one HTTP r/t
>>> of latency making any real-world difference.
>>> 
>>> >
>>> > tus-v2 draft solves it by defining `Upload-Token` which is a
>>> > cryptographically-random token generated on the client side.
>>> >
>>> 
>>> To save one HTTP round trip when restarting an interrupted upload?
>>> That's where ya lost me.
>> 
>> To give you an example, let’s suppose we have an html form like this:
>> 
>> <form action="/upload" enctype="multipart/form-data" method="post">
>>   <input type="file" name="file">
>>   <input type="submit" value="Submit">
>> </form>
>> 
>> When a user clicks “Submit”, the browser sends a `POST /upload` request
>> with the file content as the request body. However, if the upload is
>> interrupted and the client tries `HEAD /upload`, how does the server know
>> which upload the client is referring to?
>> 
>> The server can of course use surrounding information (such as Cookie) to
>> look up the incomplete upload. However, that prevents the same client from
>> submitting multiple forms concurrently. Alternatively, the server can also
>> generate dynamic HTML pages to encode a unique identifier in the form
>> action URL, but that prevents caching and doesn’t solve other non-HTML
>> use cases.
>> 
>> `Upload-Token` is designed to resolve this issue by labeling every
>> upload with a unique ID.
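
To make the idea concrete, here is a rough sketch, assuming the client picks a random token per upload and the server keys its partial-upload state on that token. `new_upload_token`, `UploadStore`, the 32-byte length, and the URL-safe text encoding are all illustrative assumptions, not something taken from the draft.

```python
import secrets
from typing import Dict, Optional


def new_upload_token(num_bytes: int = 32) -> str:
    """Return a fresh cryptographically random token (length is an assumption)."""
    return secrets.token_urlsafe(num_bytes)


class UploadStore:
    """Hypothetical in-memory server state: token -> bytes received so far."""

    def __init__(self) -> None:
        self._uploads: Dict[str, bytearray] = {}

    def append(self, token: str, chunk: bytes) -> int:
        """Store a chunk for this token and return the new offset."""
        buf = self._uploads.setdefault(token, bytearray())
        buf.extend(chunk)
        return len(buf)

    def offset(self, token: str) -> Optional[int]:
        """Return how many bytes are stored, or None for an unknown token."""
        buf = self._uploads.get(token)
        return None if buf is None else len(buf)
```

Because each form submission carries its own token, two concurrent uploads to the same `/upload` URL never share state, and a later query with the token tells the client how far that particular upload got.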
>>> 
>>> >
>>> > We’ve also looked at the `Range` header but decided
>>> > against it due to it providing too much flexibility.
>>> >
>>> 
>>> That's an interesting way of putting it, thanks! I never even
>>> considered using it to solve what I call the "partial-PUT problem"
>>> we're dealing with, shoulda left it out of my last email, really only
>>> Content-Length matters here.
>>> 
>>> >
>>> > Features like multipart ranges are extremely difficult to
>>> > support on the server.
>>> >
>>> 
>>> You're preaching to the choir. Granted I've been ranching for the past
>>> decade, but before that I was a web developer since 1994 and have
>>> coded many a webserver. The fundamentals to which I adhere, have not
>>> changed over time. Makes me salty, but if you know where to find any
>>> of my old work on archive.org the takeaway should be that it all still
>>> functions on the latest browsers. Except the browser-native XSLT stuff.
>>> 
>>> >
>>> > That being said, we can revisit this decision if the work group
>>> > provides a different perspective.
>>> >
>>> 
>>> All about consensus and working code. I'd love to link you to a
>>> rest-discuss thread about my PUT/PATCH demo, I had a
>>> deliberately-broken example image that responded 206 online for years,
>>> back in the aughties; archive.org serves it 200 tho. If you ever
>>> *really* tried it you were using curl. It helped me grok/explain
>>> resource v representation. The broken representation of one resource
>>> (a Mona Lisa icon), was itself a first-class, read-only resource.
>>> 
>>> Most browsers rendered the incomplete image, some displayed their
>>> broken-image icon, none choked on the 206. The same representation was
>>> available from a PUT/PATCH-enabled, access-restricted URI (allowing it
>>> to be fixed, one way or another, without affecting the other URI
>>> dedicated to the broken variant), and the discussion was about how
>>> PATCH increases in value the larger the file. ARF vs. ARCF, C = Continue.
>>> 
>>> At some point you've transferred more of the file than you're willing
>>> to Retry and overwrite, vs. Continue appending. I guess if you want
>>> you can introduce 1xx responses into the mix there, but I'm not seeing
>>> it as necessary.
>> 
>> Thanks, I will look it up.
>>> 
>>> >
>>> > Overall, we believe that a tightly-scoped standard would benefit
>>> > implementors and encourage wide adoption.
>>> >
>>> 
>>> All due respect, I'm seeing a tightly-coupled solution to the ages-old
>>> partial-PUT problem, which does not falsify using a 206 response to a
>>> non-range request to unambiguously communicate the state of the
>>> resource as "incomplete" where Content-Length gives the exact byte
>>> where the transfer was interrupted. IMHO, a loosely-coupled approach
>>> is better for encouraging wide adoption, under the standard as written.
>>> 
>>> -Eric
>>> 
>> The recent trend has shifted toward defining standards with no ambiguity
>> and with strict requirements for implementations, as people recognize that
>> unplanned extensibility makes interop much more difficult. The HTTP/3
>> standard is a good example of that philosophy, whereas if you give the
>> PATCH RFC (RFC 5789) to 100 people to implement, they will end up with 100
>> incompatible protocols.
>> ...
> 
> Could you elaborate a bit?

Oh, it was just an offhand comment on how vague the PATCH standard is, as a response to an earlier comment in the thread:

>>> Client now knows how to PATCH the resource.

I was exaggerating a bit, but I have no idea how to PATCH a resource after reading RFC 5789. It just introduces a general concept and leaves nearly everything open to implementations. That’s why I believe we need further standards for any concrete use.

Guoye
> 
>> ...
> 
> Best regards, Julian

Received on Tuesday, 5 April 2022 19:23:03 UTC