Re: Draft v1 Update for Resumable Uploads from Glenn Strauss on 2022-06-20 (ietf-http-wg@w3.org from April to June 2022)

From: Glenn Strauss <gs-lists-ietf-http-wg@gluelogic.com>
Date: Sun, 19 Jun 2022 23:15:44 -0400
To: Guoye Zhang <guoye_zhang@apple.com>
Cc: ietf-http-wg@w3.org
Message-ID: <Yq/mYB6FMLWn/7Oj@xps13>
On Sun, Jun 19, 2022 at 01:04:35AM -0700, Guoye Zhang wrote:
> > On Jun 18, 2022, at 22:59, gs-lists-ietf-http-wg@gluelogic.com wrote:
> > 
> > On Thu, Jun 16, 2022 at 02:30:59PM -0700, Guoye Zhang wrote:
> >> Our previous resumable upload draft generated a lot of discussions.
> > 
> > At least in my case, I attempted to be polite after you submitted a
> > draft without first doing a survey of existing RFCs.  You admitted no
> > knowledge of WebDAV RFCs, which I deemed a large oversight considering
> > the nature of the tus-v2 protocol.
> 
> We have looked into the WebDAV protocol, but we do not think it’s the direction we want to go. tus-v2 is designed to be a lightweight single-purpose protocol that’s easily implementable by clients and servers. We do not want to design a discovery method for WebDAV and force servers to implement the full WebDAV just for this one feature.

Let me attempt to simplify things for you, even though I previously
provided an explicit example.

I think that the PUT method is sufficient and PUT is part of HTTP/1.1,
from 1999.

Servers supporting generic partial-PUT are implemented in production
today and are generically reusable to append to a file.

Partial-PUT implementations exist in lighttpd mod_webdav and SabreDAV.
SabreDAV is used underneath owncloud, nextcloud, and others.
I am sure there are other examples, but these are two.

For the sake of simplicity, try to substitute any of my prior uses of
WebDAV with partial-PUT.  I used WebDAV since the production
implementations I know of with *generic* support for partial-PUT are
part of full-featured WebDAV servers.

Besides generic partial-PUT support, writing application-specific
support for partial-PUT is not excessively difficult.

On the other hand, how widely is PATCH implemented?
(RFC 5789 does not define any required media-type, so any PATCH
 implementations in production today are application-specific.)

> Apple has a Feedback Assistant app which allows customers to file bug reports and upload device diagnostics. These diagnostics are usually hundreds of megabytes in size, and if interrupted, we have to upload them again from the beginning. This has been one of the most common complaints we receive.

Sounds like an application-specific problem that can be solved with an
application-specific script which supports partial-PUT.

> > Now, it is true that non-idempotent requests such as POST and PUT
> > are not generically safe to automatically retry upon failure.
> > 
> > If you are trying to come up with a generic solution to recover a
> > non-idempotent request, that should be more explicit and better scoped
> > in the draft than potentially extending multiple existing HTTP request
> > methods.  Such a goal would require specifying that a server not start
> > processing the upload in any non-idempotent way until the upload was
> > complete.  Other requirements might also be necessary.
> > 
> This is not true. The resumable upload protocol is designed so the server can start processing data immediately, since clients are required to resume from the exact interruption point. The protocol can be implemented by a CDN so the origin server just receives a regular upload.

Please note my use of the word "non-idempotent": "... specifying that a 
server not start processing the upload in any non-idempotent way ..."

If you are writing an RFC extending HTTP for the internet, then you
really need to stop thinking so narrowly about your application-specific
intended use case.

> Partial PUT isn’t a clearly defined standard, and we cannot use “Content-Range” as explained above since the ability to upload with unknown length is required.

You have contradicted yourself.  The example I gave using partial-PUT
fully implements your stated requirement of append-only, as you append
what you have when you generate it, sequentially extending the file.
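
A minimal sketch of the append-only use of partial-PUT described above,
assuming the client sends Content-Range in the form "bytes first-last/*"
(complete length unknown) and the server only accepts a range that starts
at the current end of the stored file.  This is not the earlier example
from this thread; the function names are hypothetical and the stored file
is modeled as an in-memory bytearray.

```python
import re

# Matches "bytes first-last/length" or "bytes first-last/*".
_RANGE = re.compile(r"^bytes (\d+)-(\d+)/(\d+|\*)$")


def content_range_for_append(offset: int, chunk: bytes) -> str:
    """Build a Content-Range value for appending chunk at offset.

    The complete length is sent as "*" because it is not yet known.
    """
    if not chunk:
        raise ValueError("chunk must be non-empty")
    return f"bytes {offset}-{offset + len(chunk) - 1}/*"


def apply_partial_put(stored: bytearray, content_range: str, body: bytes) -> int:
    """Append body to stored if the range starts at the current end.

    Returns the new length, which a client could use as its next offset.
    """
    m = _RANGE.match(content_range)
    if m is None:
        raise ValueError("unsupported Content-Range")
    first, last = int(m.group(1)), int(m.group(2))
    if last - first + 1 != len(body):
        raise ValueError("range does not match body length")
    if first != len(stored):
        raise ValueError("append must start at current end of file")
    stored.extend(body)
    return len(stored)
```

A client that is interrupted would ask the server for the current length
and continue from that offset with the next chunk.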

> We are happy to revise the method and header names used by Upload Appending Procedure and all other procedures as long as we maintain the capabilities of tus protocol. If the consensus is that PUT is better than PATCH, we will modify our draft to adopt it.

I did not say that.  I have stated that partial-PUT is one potential
solution that is available today and has production implementations
in service.  I have also suggested that any new RFC should include a
section explaining why partial-PUT is suboptimal and how the new RFC
provides a (substantially) better solution.

PATCH with media-type application/tus-v2 may be a better solution
as you can define the body any way that you like, which may include a
custom (header) section in the body containing metadata such as the
information you cannot currently describe in Content-Range.
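
To illustrate the idea of a metadata section inside the PATCH body, here
is a rough sketch.  The framing (header-style lines, a blank line, then
the raw bytes) and the field names "offset" and "incomplete" are
assumptions made up for this example; nothing here is defined by tus-v2
or by RFC 5789.

```python
def encode_patch_body(metadata: dict[str, str], payload: bytes) -> bytes:
    """Serialize metadata lines, a blank line, then the payload bytes."""
    head = "\r\n".join(f"{name}: {value}" for name, value in metadata.items())
    return head.encode("ascii") + b"\r\n\r\n" + payload


def decode_patch_body(body: bytes) -> tuple[dict[str, str], bytes]:
    """Split a body produced by encode_patch_body back into its parts."""
    head, sep, payload = body.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("missing blank line after metadata section")
    metadata: dict[str, str] = {}
    for line in head.decode("ascii").split("\r\n"):
        if not line:
            continue
        name, _, value = line.partition(": ")
        metadata[name] = value
    return metadata, payload
```

The server would read the metadata section first and then treat the
remaining bytes as the data to append.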

> >> 2. Media types
> >> 
> >> PATCH currently doesn’t define a media type. We went through the list of media types but couldn’t find the appropriate category for the Upload Appending Procedure. It is a generic byte-appending operation that can modify any type of media, so we don’t think it fits into an application media type.
> > 
> > If tus-v2 is going to use PATCH:
> > Why is tus-v2 not handled as PATCH with media-type application/tus-v2?
> > tus-v2 is an application protocol.  Content-Type: application/tus-v2
> > along with tus-v2 request headers would indicate how the request body is
> > treated by PATCH implementations, if they support application/tus-v2.
> 
> From my reading of the PATCH standard, media type should be the type of the content that we are trying to modify.

The media-type in an HTTP request describes the request body
(along with Content-Encoding).  Content-Type could be application/json
if the request body contained a json-encoded structure which identified
the target file and described commands, context, and instructions how to
patch the target file.  (I do not recommend this, and merely wanted to
provide a more concrete example of media-type for request body.)

> Feature detection is an optional part of the protocol. If an application controls both the client and the server (which is the case today with tus-v1), they can implement the protocol without using 1xx status code. We only require feature detection when a generic HTTP client tries to upgrade a regular upload to a resumable upload.

I think you should stick to your application-specific protocol to solve
your application-specific problem in your application-specific domain
where you control both client and server.

> We’ve not seen consistent support of “Expect: 100-continue”. Some middleboxes reply with 100 immediately, and some middleboxes drop the 100 response. Therefore, we think a different 1xx status code would work better. We will explore different status codes such as 102, but defining a new status code for a new purpose seems like the most straightforward option, as it will be least likely to break existing software.

Sounds to me like a practical solution is to CONNECT through proxies to
your application servers, where you can support the application/tus-v2
protocol.  That would not be a 100% solution, but would work for many.

> Maybe our goal isn’t very clear from the draft. We don’t just want this to be an application protocol. Yes, it can be implemented by an application on top of existing HTTP libraries, but the reason we are bringing this to the HTTP workgroup is that we hope to build support for this in the HTTP library itself. The goal is to move toward a future where every upload is resumable.

I think you should prove it out as an application protocol and share it.
If it is widely adopted and becomes a convention, then maybe it can be
considered to extend HTTP.

If your goal is to build support into HTTP libraries themselves, then I do
believe that you have a responsibility to justify why that should be so.


I think an intern at Apple could quickly write a Python script to assign
upload transaction ids to uploads, and (after disconnection) to be able
to match existing transaction id to append to uploads.  Once the upload
is complete, the script can process the upload.  I do not see why such
an application-specific protocol -- with application-specific file
sizes, timeouts, and resource management requirements -- should be
anything other than an application, perhaps with an open source Python
module that can be reused by others.
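
As a rough sketch of the bookkeeping such a script would need, assuming
uploads are held in memory and identified by a random hex id (the class
and method names are hypothetical, and timeouts, size limits, and
persistence are left out):

```python
import uuid


class UploadStore:
    """Tracks in-progress uploads by transaction id."""

    def __init__(self) -> None:
        self._uploads: dict[str, bytearray] = {}

    def begin(self) -> str:
        """Start a new upload and return its transaction id."""
        tid = uuid.uuid4().hex
        self._uploads[tid] = bytearray()
        return tid

    def offset(self, tid: str) -> int:
        """Return how many bytes have been received so far."""
        return len(self._uploads[tid])

    def append(self, tid: str, offset: int, data: bytes) -> int:
        """Append data, which must start at the current end of the upload."""
        buf = self._uploads[tid]
        if offset != len(buf):
            raise ValueError("offset does not match bytes received")
        buf.extend(data)
        return len(buf)

    def complete(self, tid: str) -> bytes:
        """Finish the upload and hand back its bytes for processing."""
        return bytes(self._uploads.pop(tid))
```

After a disconnection the client would look up offset() for its
transaction id and resume with append() from there; complete() is the
point at which the script would process the upload.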

Cheers, Glenn
Received on Monday, 20 June 2022 03:16:09 UTC