- From: <gs-lists-ietf-http-wg@gluelogic.com>
- Date: Sun, 19 Jun 2022 01:59:52 -0400
- To: Guoye Zhang <guoye_zhang@apple.com>
- Cc: ietf-http-wg@w3.org
On Thu, Jun 16, 2022 at 02:30:59PM -0700, Guoye Zhang wrote:
> Our previous resumable upload draft generated a lot of discussions.

At least in my case, I attempted to be polite after you submitted a draft without first doing a survey of existing RFCs. You admitted no knowledge of WebDAV RFCs, which I deemed a large oversight considering the nature of the tus-v2 protocol.

> I’m glad to announce that we have a new draft ready to address many feedbacks that suggested adopting the PATCH method.

The draft abstract begins with unsubstantiated claims to justify itself, and I believe that almost all of those claims are also misleading.

"HTTP clients often encounter interrupted data transfers as a result of canceled requests or dropped connections. [...] it is often desirable to issue subsequent requests that transfer only the remainder of the representation."

The multiple uses of "often" are misrepresentations, IMHO. A large percentage of HTTP requests are GET/HEAD and have no body. A sizable percentage (if not more) of HTTP POST requests are small, e.g. using POST as an alternative to GET along with XSRF tokens.

What data do you have to support the claims in the draft Abstract? What percentage of requests have request bodies, and further have request bodies that are sufficiently large that it is excessively wasteful to resend the entire representation? (and when safe to do so!)

For high-quality wired networks, interrupted data transfers are less common, though more possible over long-distance links. For wireless and mobile, interruptions may be more common, e.g. while uploading pictures and videos.

Now, it is true that non-idempotent requests such as POST and PUT are not generically safe to automatically retry upon failure. If you are trying to come up with a generic solution to recover a non-idempotent request, that should be more explicit and better scoped in the draft than potentially extending multiple existing HTTP request methods.
Such a goal would require specifying that a server not start processing the upload in any non-idempotent way until the upload was complete. Other requirements might also be necessary.

Using WebDAV HTTP methods for upload:

I see two categories of targets for large uploads:
1. uploading to a target where the target is a resource
2. uploading to a target where the target is an endpoint
   (e.g. script which may process the upload)

The first is already possible using WebDAV (explained below, yet again).

The second can be implemented by an application, and IMHO should not require any changes to HTTP servers and proxies. More specifically, tus-v2 should not require new resource management by HTTP servers and proxies, instead delegating that management to specific user applications. Additionally, the second item might be implemented using a similar WebDAV solution as the first:

RFC 9110 HTTP Semantics
14.5. Partial PUT
https://httpwg.org/specs/rfc9110.html#partial.PUT
notes that Partial PUT may be implemented by an HTTP server for some resources.

lighttpd 1.4.65+ allows Partial PUT safely with config:
  webdav.opts += ("partial-put-copy-modify" => "enable")
This includes extending files. (This is safe in earlier versions of lighttpd, too, but only if the targets are uniquely named so as to not possibly be in the process of being downloaded by other clients, i.e. temporary files.)

Using lighttpd mod_webdav and the WebDAV protocol, a client can incrementally upload to a temporary file, and then rename the file when the upload is complete. The client could also DELETE the temporary file to cancel.

A client uploading to an endpoint might upload the request body to an alternate location on the same server, and when the upload is complete, send a request to the endpoint with a request header containing the path to the completed upload of the request body.
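
As a rough illustration of the client side of such an incremental upload, here is a minimal Python sketch that only computes the Content-Range value for each partial PUT. The function name partial_put_ranges is hypothetical, and the sketch assumes the complete-length after the slash is the length of the temporary file once the chunk has been appended, as in the example requests in this message. To resume after a disconnect, start would be the Content-Length reported by a HEAD on the temporary file.

```python
from typing import Iterator, Tuple

CHUNK = 256 * 1024  # 256k chunks, the size used in the example requests


def partial_put_ranges(total: int, start: int = 0,
                       chunk: int = CHUNK) -> Iterator[Tuple[int, str]]:
    """Yield (offset, Content-Range value) for each partial PUT needed to
    append bytes [start, total) to the temporary file.

    Assumption: complete-length is the file length after this chunk is
    appended (the file grows with each PUT).
    """
    if chunk <= 0 or not 0 <= start <= total:
        raise ValueError("invalid start, total, or chunk size")
    offset = start
    while offset < total:
        end = min(offset + chunk, total)  # exclusive end of this chunk
        yield offset, f"bytes {offset}-{end - 1}/{end}"
        offset = end


# Fresh upload of 512k, then a resume from the 256k the server already holds.
fresh = list(partial_put_ranges(524288))
resumed = list(partial_put_ranges(524288, start=262144))
```

Each yielded value would be sent as the Content-Range header of one PUT, together with an If-Match carrying the ETag from the previous response.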

Here is an example set of pseudo-HTTP requests, uploading a file in 256k chunks, and recovering from a disconnect:

  LOCK /file.XXXXXX HTTP/1.1

  201 Created
  ETag: "aaaaaa"

  PUT /file.XXXXXX HTTP/1.1
  Content-Range: bytes 0-262143/262144
  If-Match: "aaaaaa"

  204 No Content
  ETag: "bbbbbb"

  PUT /file.XXXXXX HTTP/1.1
  Content-Range: bytes 262144-524287/524288
  If-Match: "bbbbbb"

  <disconnect>

  # (recovery resynchronization if disconnect occurs)
  HEAD /file.XXXXXX HTTP/1.1

  200 OK
  Content-Length: 262144
  ETag: "bbbbbb"

  PUT /file.XXXXXX HTTP/1.1
  Content-Range: bytes 262144-524287/524288
  If-Match: "bbbbbb"

  204 No Content
  ETag: "cccccc"

  # (... further PUT to append additional blocks ...)

  # side-effect of MOVE does equivalent of UNLOCK /file.XXXXXX in lighttpd
  MOVE /file.XXXXXX HTTP/1.1
  Destination: /file

  201 Created

> 2. Media types
>
> PATCH currently doesn’t define a media type. We went through the list of media types but couldn’t find the appropriate category for the Upload Appending Procedure. It is a generic byte-appending operation that can modify any types of media, so we don’t think it fits into an application media type.

If tus-v2 is going to use PATCH: Why is tus-v2 not handled as PATCH with media-type application/tus-v2? tus-v2 is an application protocol. Content-Type: application/tus-v2 along with tus-v2 request headers would indicate how the request body is treated by PATCH implementations, if they support application/tus-v2.

> 3. 1xx intermediate response
>
> We surveyed the most popular HTTP libraries in many languages, and nearly all of them consider 1xx responses an internal signaling mechanism so they don’t expose the ability for applications to handle them. (We are also guilty of this as maintainers of URLSession API on Apple platforms.) If we use 1xx response for any critical information, it would prevent nearly all tus-v1 adopters to switch to this new protocol until it’s natively supported in HTTP libraries.

Multiple 1xx HTTP responses may be sent by an HTTP server before sending the final HTTP response. Not all existing HTTP servers support this, and there may be security and resource implications. lighttpd 1.4.56 and later forward 1xx responses from a backend to the client, but can be configured to ignore 1xx responses (besides 101 Switching Protocols) from backends if site security policy dictates. In short, an application behind lighttpd could send an additional "100 Continue" with Upload-Token response header.

Client HTTP libraries already need to be extended to support tus-v2, so would access to 1xx response headers be unworkable where a new 104 HTTP status would succeed? If client HTTP libraries do not have a callback or some other interface for applications to receive HTTP 1xx intermediate responses, that would need to be added for tus-v2 feature detection, wouldn't it?

> We think having just the feature detection part using 1xx response is a good balance, both eliminating any extra round trips for HTTP libraries implementing this protocol and allowing application adopters to ignore it.

For a sufficiently large upload, clients should send Expect: 100-continue and the extra round-trip should be lost in the noise. Also, there are many reasons to send Expect: 100-continue before beginning a large upload, including verifying authn/authz and checking upload size limits (i.e. whether the server will reject a very large Content-Length).

If the client also sent a hypothetical
  Upload-Token: .
then the origin server supporting resumable uploads could respond with
  100 Continue
  Upload-Token: /uri/path/to/file.XXXXXX
to indicate that it is storing the request body as a resource at that location, and the client may query and extend it using WebDAV HTTP methods should a disconnection occur. There could be additional headers which convey policy information such as how long the Upload-Token is valid, i.e. how long the server may store the temporary resource before deleting it.
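
To make concrete what "access to 1xx response headers" would involve for a client, here is a minimal Python sketch that splits a raw HTTP/1.1 byte stream into its interim and final response heads and picks out the hypothetical Upload-Token header. The function names are made up for illustration and this is not any library's API; the sketch assumes well-formed input and does not treat 101 Switching Protocols specially.

```python
from typing import Dict, List, Optional, Tuple

Head = Tuple[int, Dict[str, str]]


def read_response_heads(raw: bytes) -> List[Head]:
    """Parse zero or more interim (1xx) response heads followed by the
    final response head. Stops at the final head, since a body may follow."""
    heads: List[Head] = []
    rest = raw
    while rest:
        block, sep, rest = rest.partition(b"\r\n\r\n")
        if not sep:
            break  # incomplete header block; a real client would read more
        lines = block.decode("iso-8859-1").split("\r\n")
        status = int(lines[0].split(" ", 2)[1])  # "HTTP/1.1 100 Continue"
        headers: Dict[str, str] = {}
        for line in lines[1:]:
            name, _, value = line.partition(":")
            headers[name.strip().lower()] = value.strip()
        heads.append((status, headers))
        if status >= 200:
            break  # final response reached
    return heads


def upload_token(heads: List[Head]) -> Optional[str]:
    """Return the hypothetical Upload-Token from the first 1xx head that has one."""
    for status, headers in heads:
        if 100 <= status < 200 and "upload-token" in headers:
            return headers["upload-token"]
    return None


raw = (b"HTTP/1.1 100 Continue\r\n"
       b"Upload-Token: /uri/path/to/file.XXXXXX\r\n\r\n"
       b"HTTP/1.1 201 Created\r\nContent-Length: 0\r\n\r\n")
heads = read_response_heads(raw)
token = upload_token(heads)
```

A client library that exposed something like heads to the application would be enough for this kind of feature detection, whether the interim status is 100, 102, or a new 104.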

To avoid clients abusing this temporary storage and sharing the link with others, it may be advisable to limit access to the upload path to HEAD, PUT (partial PUT), PATCH, OPTIONS, and perhaps PROPFIND methods, and to reject GET, QUERY, and all other methods.

If 100 Continue is not desirable for some reason, would it be possible to repurpose the WebDAV RFC 2518 response status 102 Processing? (removed in RFC 4918)
http://www.iana.org/assignments/http-status-codes/http-status-codes.xhtml
Why is a new status needed? (104 Upload Resumption Supported)

I used "Upload-Token" above as that is in the tus draft, but it could be named something else. Also, the token could be an encrypted unique identifier, and the client could query/resume a disconnected request by providing the identifier to the endpoint along with request headers to indicate resumed upload, e.g. the tus resumable uploads protocol but application-specific, without the need for HTTP servers or proxies to know about this optional protocol which has application-specific policy for resource management of resumable uploads.

> there isn’t a straightforward to mechanically change the URI to distinguish between attempts.

Content-Location?

> Looking forward to continuing the discussions and refinements of the draft.

The draft fails to indicate why existing, standard WebDAV HTTP methods are not sufficient. I believe they are sufficient and have given examples above.

The draft makes no mention of partial PUT and its potential shortcomings compared to PATCH.

The draft does not distinguish between uploading a resource -- for which WebDAV methods are already a viable solution -- and uploading to an endpoint -- for which WebDAV HTTP methods may be a viable solution.

I urge IETF reviewers to strongly recommend that the optional tus-v2 protocol be implementable by clients and server-side applications without requiring new support beyond existing standards (e.g. 1xx informational responses and WebDAV) from HTTP servers and proxies.

Thank you, Glenn
Received on Sunday, 19 June 2022 06:00:15 UTC