Indeterminate-length partial content messages

Hello HTTP WG,

HTTP APIs recently adopted my “Byte Range Patch” draft, a media type to overwrite or append to a resource, especially when Partial PUT is not suitable (e.g. when server support is undetermined, or when the patch must be exchanged as a file). It re-uses standard HTTP fields including Content-Range, but notably this field doesn’t support ranges of indeterminate length, so there's no way to encode an indefinitely long write at a specific offset—you must know the length of the write when you begin the request.

Currently, as a workaround, the draft specifies a special case for the Content-Range syntax. Since workarounds like this are questionable, I’ll specify a different field name (perhaps "Content-Offset"). However, this brings its own problems: There would be a field that only functions in the body of these PATCH requests, and isn’t used in HTTP headers or any other context. While it's plausible for Byte Range PATCH to abandon the HTTP field design, I think it may be simpler on the whole to share semantics with Range requests and multipart messages. And further, standardizing partial content offsets would have greater utility, including in 206 Partial Content responses, and synchronization more broadly.

This problem was previously observed in streaming media, and is treated by the experimental RFC 8673 <https://www.rfc-editor.org/rfc/rfc8673.html> (HTTP Random Access and Live Content). This RFC suggests using a very large Range endpoint to request that the server stream content as it becomes available. The server then echoes this large number in the response Content-Range to indicate that the response is of indeterminate length.

While this solution caters to the requirements of Range requests, it cannot be used for uploads, where no Range field is used. And if a client sends “Content-Range: bytes 100-999999999999” in a request, but ends the stream with less content than that, this should only be seen as an error, not as an understanding that the exact length was unknown. So I think a different, more general solution is warranted.
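To make the failure case concrete, here is a minimal Python sketch of how a strict receiver might treat a request body that ends before the declared range is filled. The function name check_declared_range is hypothetical and not taken from any spec or library, and the example value carries the "/*" suffix that the RFC 9110 Content-Range grammar requires.

```python
import re

# Hypothetical helper for illustration only.
# Matches the RFC 9110 range-resp form: bytes first-last/(complete-length | *)
_CONTENT_RANGE = re.compile(r"bytes ([0-9]+)-([0-9]+)/([0-9]+|\*)")


def check_declared_range(content_range: str, received: int) -> None:
    """Raise ValueError unless `received` bytes exactly fill the declared range."""
    m = _CONTENT_RANGE.fullmatch(content_range)
    if m is None:
        raise ValueError(f"unparseable Content-Range: {content_range!r}")
    first, last = int(m.group(1)), int(m.group(2))
    if last < first:
        raise ValueError("last-byte-pos precedes first-byte-pos")
    declared = last - first + 1
    if received != declared:
        # A short body is indistinguishable from a truncated upload.
        raise ValueError(f"declared {declared} bytes, received {received}")


check_declared_range("bytes 0-9/*", 10)  # body fills the range: accepted
try:
    check_declared_range("bytes 100-999999999999/*", 500)
except ValueError as err:
    print("rejected:", err)
```

With a check like this, a "very large endpoint" in a request cannot double as a signal for "length unknown"; the receiver has no way to tell it apart from an interrupted transfer.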

This would also serve as an important building block for synchronization over HTTP, since this could be used in Partial PUT <https://www.rfc-editor.org/rfc/rfc9110.html#name-partial-put> or a Byte Range PATCH <https://datatracker.ietf.org/doc/draft-ietf-httpapi-patch-byterange/> to append to shift buffers <https://www.rfc-editor.org/rfc/rfc8673.html#name-shift-buffer-representation>, and other “live resources” whose content is not entirely known at request-time, but may be streamed as it becomes available (as opposed to transferring only the data defined at the time of the request). Features that could build on top of this may include:

- Indicating support for indeterminate-length 206 (Partial Content) responses.
- Indicating preference for “stream live data” versus “snapshot-at-request-time” messages.
- Managing sparse resources, including shift buffers and more complicated synchronization (e.g. multiple clients uploading to the same resource in parallel).
- Optimizing caching for shift buffers (e.g. indicate that content may grow and/or become forgotten, but does not change once defined).
- Subscribing to changes to an underlying resource (in realtime or as desired).

The two most obvious ways to define this feature would be (1) to extend Content-Range to a form like “bytes 10-*/*” (where the star indicates exact value unknown), or (2) a new header like “Content-Offset: 10” (or what Resumable Uploads calls “Upload-Offset” <https://httpwg.org/http-extensions/draft-ietf-httpbis-resumable-upload.html#name-upload-offset>).
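As an illustration of option (1), the following Python sketch parses a Content-Range value in which the last byte position may be "*". This extended grammar is only an assumption based on the “bytes 10-*/*” form above; it is not part of RFC 9110, and the names ByteRange and parse_extended_content_range are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class ByteRange(NamedTuple):
    first: int
    last: Optional[int]      # None when the end of the range is unknown ("*")
    complete: Optional[int]  # None when the total length is unknown ("*")


# Assumed grammar: RFC 9110 range-resp, with "*" additionally allowed
# as the last-byte-pos. This extension is hypothetical.
_EXTENDED = re.compile(r"bytes ([0-9]+)-([0-9]+|\*)/([0-9]+|\*)")


def parse_extended_content_range(value: str) -> ByteRange:
    m = _EXTENDED.fullmatch(value)
    if m is None:
        raise ValueError(f"unparseable Content-Range: {value!r}")
    first = int(m.group(1))
    last = None if m.group(2) == "*" else int(m.group(2))
    complete = None if m.group(3) == "*" else int(m.group(3))
    if last is not None and last < first:
        raise ValueError("last-byte-pos precedes first-byte-pos")
    return ByteRange(first, last, complete)


print(parse_extended_content_range("bytes 10-*/*"))
print(parse_extended_content_range("bytes 0-99/200"))
```

A receiver that sees last as None would write from the given offset until the message body ends, rather than expecting a fixed number of bytes.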

I would like to propose “Content-Offset = sf-integer”, since there's a certain symmetry to it: other HTTP fields also change when the message is of indeterminate length. In cacheable responses, though, modifying Content-Range may be desirable instead, depending on how origins want non-implementing caches to act. While some amount of compatibility must be considered (especially with caches), I feel this is a problem that will spawn domain-specific solutions over and over until there’s a general solution.
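For comparison, a Content-Offset field defined as an sf-integer would be simple to validate. The Python sketch below follows the integer syntax of Structured Fields (an optional minus sign and at most 15 digits); rejecting negative offsets is an added assumption, since the proposal above does not yet say how they would be handled, and the function name parse_content_offset is hypothetical.

```python
import re

# Structured Fields integer: optional "-" followed by 1 to 15 digits.
_SF_INTEGER = re.compile(r"-?[0-9]{1,15}")


def parse_content_offset(value: str) -> int:
    """Parse a hypothetical Content-Offset field value as a non-negative sf-integer."""
    text = value.strip(" ")  # surrounding spaces are not significant
    if _SF_INTEGER.fullmatch(text) is None:
        raise ValueError(f"not an sf-integer: {value!r}")
    offset = int(text)
    if offset < 0:
        # Assumption: a byte offset cannot be negative.
        raise ValueError("Content-Offset must not be negative")
    return offset


print(parse_content_offset("10"))
```

Because the field carries only the starting offset, the end of the write is determined by where the message body ends, which is what makes it usable when the length is not known in advance.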

Please send me feedback on this proposal. I would especially like to hear from anyone with experience implementing HTTP Random Access and Live Content, or with any of the use cases I’m describing here. Then, if this seems reasonable, I can draft an experimental I-D.

Thanks,

Austin Wright.

Received on Friday, 3 November 2023 04:26:40 UTC