Re: Draft for Resumable Uploads from Julian Reschke on 2022-04-05 (ietf-http-wg@w3.org from April to June 2022)

From: Julian Reschke <julian.reschke@gmx.de>
Date: Tue, 5 Apr 2022 10:11:41 +0200
To: Guoye Zhang <guoye_zhang@apple.com>, Eric J Bowman <mellowmutt@zoho.com>
Cc: ietf-http-wg <ietf-http-wg@w3.org>
Message-ID: <29b48c02-95b0-8e30-1d36-f898eb672ad2@gmx.de>
On 05.04.2022 at 09:53, Guoye Zhang wrote:
>
>
>> On Apr 4, 2022, at 11:05 PM, Eric J Bowman <mellowmutt@zoho.com> wrote:
>>
>> >
>> > First, how does it uniquely identify a resumable upload?
>> >
>>
>> A 206 response to a non-range request uniquely, unambiguously, and
>> elegantly identifies an incomplete resource. Identifying a resource as
>> both incomplete *and* completable introduces tight coupling at the
>> protocol layer.
>>
>> The Content-Length header should suffice to inform a client where to
>> resume the upload. Whether it's allowed to or not, seems an
>> application-layer concern (beyond authentication), hidden behind the
>> uniform interface.
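
A rough sketch of what the client side of that proposal could look like, assuming the convention described above (a 206 answer to a non-range request, with Content-Length giving the number of bytes the server already holds). The function name `resume_offset` is hypothetical, and this convention is the proposal under discussion, not established HTTP behavior:

```python
from typing import Mapping, Optional


def resume_offset(status: int, headers: Mapping[str, str]) -> Optional[int]:
    """Return the byte offset to resume an upload from, or None.

    Assumption (from the proposal quoted above, not from any RFC):
    a 206 status on a non-range request marks the resource as incomplete,
    and Content-Length is the number of bytes already stored.
    """
    if status != 206:
        # Any other status says nothing about a resumable, incomplete upload.
        return None

    # Header field names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get("content-length")
    if raw is None:
        return None

    raw = raw.strip()
    # Accept only plain ASCII digits; anything else is treated as unusable.
    if not (raw.isascii() and raw.isdigit()):
        return None
    return int(raw)
```

Whether the client is then *allowed* to continue from that offset is left to the application, as the quoted text argues.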
>>
>> >
>> > Does the server need to send a unique URL to the client?
>> >
>>
>> Thought we were talking Web Arch, where all URIs are unique, and the
>> basis of messaging is a client requesting a representation of the
>> identified server resource. Clients may receive URLs from servers, but
>> I can always type 'em into my user-agent, which is of particular note
>> when we're including PUT in the convo. ;)
>>
>> >
>> > What about the additional roundtrip time?
>> >
>>
>> Wasn't the upload interrupted? Even if it was for a nanosecond, the
>> network connection needs to be re-established... the client needs to
>> re-authenticate with the server... seems we're way beyond one HTTP r/t
>> of latency making any real-world difference.
>>
>> >
>> > tus-v2 draft solves it by defining `Upload-Token` which is a
>> > cryptographically-random token generated on the client side.
>> >
>>
>> To save one HTTP round trip when restarting an interrupted upload?
>> That's where ya lost me.
>
> To give you an example, let’s suppose we have an html form like this:
>
> <form action="/upload" enctype="multipart/form-data" method="post">
>    <input type="file" name="file">
>    <input type="submit" value="Submit">
> </form>
>
> When a user clicks “Submit”, the browser sends a `POST /upload` request
> with the file content as the request body. However, if the upload
> is interrupted and the client tries `HEAD /upload`, how does the server
> know which upload the client is referring to?
>
> The server can of course use surrounding information (such as a Cookie)
> to look up the incomplete upload. However, that prevents the same client
> from submitting multiple forms concurrently. Alternatively, the server
> can generate dynamic HTML pages that encode a unique identifier in the
> form action URL, but that prevents caching and doesn’t solve other
> non-HTML use cases.
>
> `Upload-Token` is designed to resolve this issue by labeling every
> upload with a unique ID.
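
To illustrate the labeling idea only: a minimal sketch, assuming the client generates a random token per upload and the server keys its partial uploads by that token. `new_upload_token` and `UploadStore` are hypothetical names, and nothing here reflects the actual wire format of the tus-v2 draft:

```python
import secrets
from typing import Dict


def new_upload_token() -> str:
    """Client side: generate a cryptographically random token for one upload."""
    return secrets.token_urlsafe(32)


class UploadStore:
    """Server side: in-memory partial uploads, keyed by the client's token.

    Hypothetical illustration; a real server would persist data and
    expire abandoned uploads.
    """

    def __init__(self) -> None:
        self._uploads: Dict[str, bytearray] = {}

    def append(self, token: str, chunk: bytes) -> int:
        """Append a chunk to the upload for this token; return the new offset."""
        buffer = self._uploads.setdefault(token, bytearray())
        buffer.extend(chunk)
        return len(buffer)

    def offset(self, token: str) -> int:
        """Return how many bytes are stored for this token (0 if unknown)."""
        return len(self._uploads.get(token, b""))
```

Because the token, not the URL or a cookie, identifies the upload, two concurrent submissions to the same `/upload` target from the same client stay distinguishable.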
>>
>> >
>> > We’ve also looked at the `Range` header but decided
>> > against it due to it providing too much flexibility.
>> >
>>
>> That's an interesting way of putting it, thanks! I never even
>> considered using it to solve what I call the "partial-PUT problem"
>> we're dealing with, shoulda left it out of my last email, really only
>> Content-Length matters here.
>>
>> >
>> > Features like multipart ranges are extremely difficult to
>> > support on the server.
>> >
>>
>> You're preaching to the choir. Granted I've been ranching for the past
>> decade, but before that I was a web developer since 1994 and have
>> coded many a webserver. The fundamentals to which I adhere, have not
>> changed over time. Makes me salty, but if you know where to find any
>> of my old work on archive.org the takeaway should be that it all still
>> functions on the latest browsers. Except the browser-native XSLT stuff.
>>
>> >
>> > That being said, we can revisit this decision if the work group
>> > provides a different perspective.
>> >
>>
>> All about consensus and working code. I'd love to link you to a
>> rest-discuss thread about my PUT/PATCH demo, I had a
>> deliberately-broken example image that responded 206 online for years,
>> back in the aughties; archive.org serves it 200 tho. If you ever
>> *really* tried it you were using curl. It helped me grok/explain
>> resource v representation. The broken representation of one resource
>> (a Mona Lisa icon), was itself a first-class, read-only resource.
>>
>> Most browsers rendered the incomplete image, some displayed their
>> broken-image icon, none choked on the 206. The same representation was
>> available from a PUT/PATCH-enabled, access-restricted URI (allowing it
>> to be fixed, one way or another, without affecting the other URI
>> dedicated to the broken variant), and the discussion was about how
>> PATCH increases in value the larger the file. ARF vs. ARCF, C = Continue.
>>
>> At some point you've transferred more of the file than you're willing
>> to Retry and overwrite, vs. Continue appending. I guess if you want
>> you can introduce 1xx responses into the mix there, but I'm not seeing
>> it as necessary.
>
> Thanks, I will look it up.
>>
>> >
>> > Overall, we believe that a tightly-scoped standard would benefit
>> > implementors and encourage wide adoption.
>> >
>>
>> All due respect, I'm seeing a tightly-coupled solution to the ages-old
>> partial-PUT problem, which does not falsify using a 206 response to a
>> non-range request to unambiguously communicate the state of the
>> resource as "incomplete" where Content-Length gives the exact byte
>> where the transfer was interrupted. IMHO, a loosely-coupled approach
>> is better for encouraging wide adoption, under the standard as written.
>>
>> -Eric
>>
> The recent trend has shifted toward defining standards with no ambiguity
> and with strict requirements for implementations, as people recognize
> that unplanned extensibility makes interop much more difficult. The
> HTTP/3 standard is a good example of that philosophy, whereas if you
> give the PATCH RFC (RFC 5789) to 100 people to implement, they will end
> up with 100 incompatible protocols.
> ...

Could you elaborate a bit?

> ...

Best regards, Julian
Received on Tuesday, 5 April 2022 08:12:08 UTC