Re: Draft v1 Update for Resumable Uploads from Guoye Zhang on 2022-06-20 (ietf-http-wg@w3.org from April to June 2022)

From: Guoye Zhang <guoye_zhang@apple.com>
Date: Sun, 19 Jun 2022 20:56:11 -0700
To: Glenn Strauss <gs-lists-ietf-http-wg@gluelogic.com>
Cc: ietf-http-wg@w3.org
Message-id: <1A0308B7-266A-4E12-BC6C-6D321BAFC3D3@apple.com>
> On Jun 19, 2022, at 20:15, Glenn Strauss <gs-lists-ietf-http-wg@gluelogic.com> wrote:
> 
> On Sun, Jun 19, 2022 at 01:04:35AM -0700, Guoye Zhang wrote:
>>> On Jun 18, 2022, at 22:59, gs-lists-ietf-http-wg@gluelogic.com wrote:
>>> 
>>> On Thu, Jun 16, 2022 at 02:30:59PM -0700, Guoye Zhang wrote:
>>>> Our previous resumable upload draft generated a lot of discussions.
>>> 
>>> At least in my case, I attempted to be polite after you submitted a
>>> draft without first doing a survey of existing RFCs.  You admitted no
>>> knowledge of WebDAV RFCs, which I deemed a large oversight considering
>>> the nature of the tus-v2 protocol.
>> 
>> We have looked into the WebDAV protocol, but we do not think it’s the direction we want to go. tus-v2 is designed to be a lightweight, single-purpose protocol that’s easily implementable by clients and servers. We do not want to design a discovery method for WebDAV and force servers to implement full WebDAV just for this one feature.
> 
> Let me attempt to simplify things for you, even though I previously
> provided an explicit example.
> 
> I think that the PUT method is sufficient and PUT is part of HTTP/1.1,
> from 1999.
> 
> Servers supporting generic partial-PUT are implemented in production
> today and are generically reusable to append to a file.
> 
> Partial-PUT implementations exist in lighttpd mod_webdav and SabreDAV.
> SabreDAV is used underneath owncloud, nextcloud, and others.
> I am sure there are other examples, but these are two.
> 
> For the sake of simplicity, try to substitute any of my prior uses of
> WebDAV with partial-PUT.  I used WebDAV because the production
> implementations I know of with *generic* support for partial-PUT are
> part of full-featured WebDAV servers.
> 
> Besides generic partial-PUT support, writing application-specific
> support for partial-PUT is not excessively difficult.
> 
> On the other hand, how widely is PATCH implemented?
> (RFC 5789 does not define any required media-type, so any PATCH
> implementations in production today are application-specific.)
> 
>> Apple has a Feedback Assistant app which allows customers to file bug reports and upload device diagnostics. These diagnostics are usually hundreds of megabytes in size, and if an upload is interrupted, we have to upload them again from the beginning. This has been one of the most common complaints we receive.
> 
> Sounds like an application-specific problem that can be solved with an
> application-specific script which supports partial-PUT.
> 
>>> Now, it is true that non-idempotent requests such as POST and PUT
>>> are not generically safe to automatically retry upon failure.
>>> 
>>> If you are trying to come up with a generic solution to recover a
>>> non-idempotent request, that should be more explicit and better scoped
>>> in the draft than potentially extending multiple existing HTTP request
>>> methods.  Such a goal would require specifying that a server not start
>>> processing the upload in any non-idempotent way until the upload was
>>> complete.  Other requirements might also be necessary.
>>> 
>> This is not true. The resumable upload protocol is designed so the server can start processing data immediately, since clients are required to resume from the exact interruption point. The protocol can be implemented by a CDN so the origin server just receives a regular upload.
> 
> Please note my use of the word "non-idempotent": "... specifying that a 
> server not start processing the upload in any non-idempotent way ..."
> 
> If you are writing an RFC extending HTTP for the internet, then you
> really need to stop thinking so narrowly about your application-specific
> intended use case.

Why can’t the server start processing the upload in a non-idempotent way? The client can only resume from the interruption point, so the series of resumptions can be treated as one single overall upload. This does not require idempotence at all.
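
To make that concrete, here is a minimal sketch in Python. Everything in it is hypothetical (the `UploadServer` class, the `upload` helper, the in-memory "connection"); it is not the draft's wire format. The running SHA-256 stands in for any stateful processing the server does as bytes arrive. Because the server rejects any append that does not start at its current offset, each byte is processed exactly once even when the upload is split across several connections.

```python
import hashlib


class UploadServer:
    """Hypothetical in-memory server that processes bytes as they arrive."""

    def __init__(self):
        self.offset = 0                 # next byte the server expects
        self.digest = hashlib.sha256()  # stand-in for stateful processing

    def append(self, offset, chunk):
        # Reject anything that does not start exactly at the interruption point.
        if offset != self.offset:
            raise ValueError(f"expected offset {self.offset}, got {offset}")
        self.digest.update(chunk)       # each byte is processed exactly once
        self.offset += len(chunk)
        return self.offset


def upload(server, data, cut_points):
    """Send data over several connections, each dropped at a simulated cut point."""
    for cut in sorted(cut_points) + [len(data)]:
        start = server.offset           # client asks the server where to resume
        server.append(start, data[start:cut])
    return server.digest.hexdigest()


data = b"diagnostics " * 1000
resumed = upload(UploadServer(), data, cut_points=[10, 4096, 9000])
# The resumed upload yields the same result as one uninterrupted upload.
assert resumed == hashlib.sha256(data).hexdigest()
```
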
> 
>> Partial PUT isn’t a clearly defined standard, and we cannot use “Content-Range” as explained above since the ability to upload with unknown length is required.
> 
> You have contradicted yourself.  The example I gave using partial-PUT
> fully implements your stated requirement of append-only, as you append
> what you have when you generate it, sequentially extending the file.
> 
>> We are happy to revise the method and header names used by the Upload Appending Procedure and all other procedures as long as we maintain the capabilities of the tus protocol. If the consensus is that PUT is better than PATCH, we will modify our draft to adopt it.
> 
> I did not say that.  I have stated that partial-PUT is one potential
> solution that is available today and has production implementations
> in service.  I have also suggested that any new RFC should include a
> section explaining why partial-PUT is suboptimal and why the new RFC
> provides a (substantially) better solution.
> 
> PATCH with media-type application/tus-v2 may be a better solution
> as you can define the body any way that you like, which may include a
> custom (header) section in the body containing metadata such as the
> information you can not currently describe in Content-Range.

Content-Range requires the range to include the end offset, which is not always available. We need something like “Content-Range: 42-/*” to achieve feature parity with the current tus protocol. Not sure if changing the definition of Content-Range is desirable.
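
As a rough illustration, the check below follows my reading of the Content-Range grammar in RFC 9110 (a byte range is first-pos "-" last-pos "/" followed by a complete length or "*"). The regular expression and the `is_valid_content_range` helper are my own sketch, not text from any spec, and I have added the "bytes" unit that the shorthand above leaves out. An unknown total length is already expressible, but an open-ended last position is not.

```python
import re

# Sketch of the byte-range form: "bytes first-last/(length|*)" or "bytes */length".
CONTENT_RANGE = re.compile(r"bytes (?:(\d+)-(\d+)/(\d+|\*)|\*/(\d+))")


def is_valid_content_range(value):
    """Return True if value fits the byte Content-Range grammar sketched above."""
    m = CONTENT_RANGE.fullmatch(value)
    if m is None:
        return False
    if m.group(1) is None:
        return True                     # unsatisfied-range form: "bytes */length"
    first, last, complete = int(m.group(1)), int(m.group(2)), m.group(3)
    if last < first:
        return False                    # the range must not run backwards
    # A known complete length must be greater than the last position.
    return complete == "*" or int(complete) > last


assert is_valid_content_range("bytes 42-1233/*")    # unknown total length is fine
assert not is_valid_content_range("bytes 42-/*")    # missing end offset is not
```
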

Guoye
> 
>>>> 2. Media types
>>>> 
>>>> PATCH currently doesn’t define a media type. We went through the list of media types but couldn’t find the appropriate category for the Upload Appending Procedure. It is a generic byte-appending operation that can modify any type of media, so we don’t think it fits into an application media type.
>>> 
>>> If tus-v2 is going to use PATCH:
>>> Why is tus-v2 not handled as PATCH with media-type application/tus-v2?
>>> tus-v2 is an application protocol.  Content-Type: application/tus-v2
>>> along with tus-v2 request headers would indicate how the request body is
>>> treated by PATCH implementations, if they support application/tus-v2.
>> 
>> From my reading of the PATCH standard, the media type should be the type of the content that we are trying to modify.
> 
> The media-type in an HTTP request describes the request body
> (along with Content-Encoding).  Content-Type could be application/json
> if the request body contained a json-encoded structure which identified
> the target file and described commands, context, and instructions how to
> patch the target file.  (I do not recommend this, and merely wanted to
> provide a more concrete example of media-type for request body.)
> 
>> Feature detection is an optional part of the protocol. If an application controls both the client and the server (which is the case today with tus-v1), they can implement the protocol without using 1xx status code. We only require feature detection when a generic HTTP client tries to upgrade a regular upload to a resumable upload.
> 
> I think you should stick to your application-specific protocol to solve
> your application-specific problem in your application-specific domain
> where you control both client and server.
> 
>> We’ve not seen consistent support of “Expect: 100-continue”. Some middleboxes reply with 100 immediately, and some middleboxes drop the 100 response. Therefore, we think a different 1xx status code would work better. We will explore different status codes such as 102, but defining a new status code for a new purpose seems like the most straightforward option, as it will be least likely to break existing software.
> 
> Sounds to me like a practical solution is to CONNECT through proxies to
> your application servers, where you can support the application/tus-v2
> protocol.  That would not be a 100% solution, but would work for many.
> 
>> Maybe our goal isn’t very clear from the draft. We don’t just want this to be an application protocol. Yes, it can be implemented by an application on top of existing HTTP libraries, but the reason we are bringing this to the HTTP workgroup is that we hope to build support for this in the HTTP library itself. The goal is to move toward a future where every upload is resumable.
> 
> I think you should prove it out as an application protocol and share it.
> If it is widely adopted and becomes a convention, then maybe it can be
> considered to extend HTTP.
> 
> If your goal is to build support into HTTP libraries themselves, then I do
> believe that you have a responsibility to justify why that should be so.
> 
> 
> I think an intern at Apple could quickly write a Python script to assign
> upload transaction ids to uploads, and (after disconnection) to be able
> to match existing transaction id to append to uploads.  Once the upload
> is complete, the script can process the upload.  I do not see why such
> an application-specific protocol -- with application-specific file
> sizes, timeouts, and resource management requirements -- should be
> anything other than an application, perhaps with an open source Python
> module that can be reused by others.
> 
> Cheers, Glenn
Received on Monday, 20 June 2022 03:56:44 UTC