Re: Byte range PATCH from Austin William Wright on 2022-08-06 (ietf-http-wg@w3.org from July to September 2022)

From: Austin William Wright <aaa@bzfx.net>
Date: Fri, 5 Aug 2022 19:26:05 -0700
To: Julian Reschke <julian.reschke@gmx.de>
Cc: ietf-http-wg@w3.org
Message-Id: <6D3FBD73-EB55-4BC3-9874-F2FF154A6A7A@bzfx.net>

> On Aug 3, 2022, at 23:56, Julian Reschke <julian.reschke@gmx.de> wrote:
> 
> Am 03.08.2022 um 22:36 schrieb Austin William Wright:
>> ...
>> Hi Julian,
>> 
>> I don’t have terribly strong opinions about what the format is, but this is what I thought was obvious based on a few motivations:
>> 
>> 1. It re-uses an existing parser (it’s an off-the-shelf HTTP-message but skipping the “start-line CRLF” beginning, and is trivially parsable with a state machine or regular expression [1]).
>> ...
> 
> That parser might be *present* everywhere, but not accessible. For
> instance, a Java servlet engine happily processes HTTP/1.1 messages, but
> doesn't provide an API to use that parser directly.

This is indeed an unfortunate state of many Web app environments. But if you’re a client, this format is trivial to generate (ASCII text prepended to a binary blob), and if you’re a server, you likely already have access to a parser for something like multipart/form-data, which should at the very least be similar.
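
To illustrate the client side, here is a minimal sketch in Python of serializing one message/byterange document as ASCII fields followed by the raw bytes. The function name and parameters are hypothetical, and the choice of fields (Content-Range with an unknown total length, Content-Length, an optional Content-Type) is an assumption about a typical patch rather than text taken from the draft.

```python
from typing import Optional


def build_byterange_patch(offset: int, chunk: bytes,
                          content_type: Optional[str] = None) -> bytes:
    """Serialize one patch body: field lines, a blank line, then the bytes.

    Hypothetical helper for illustration; not part of any library.
    """
    if not chunk:
        raise ValueError("chunk must not be empty")
    last = offset + len(chunk) - 1  # Content-Range uses an inclusive end
    fields = [
        f"Content-Range: bytes {offset}-{last}/*",  # "*" = total unknown
        f"Content-Length: {len(chunk)}",
    ]
    if content_type is not None:
        # Nested Content-Type describes the target resource, not the patch.
        fields.append(f"Content-Type: {content_type}")
    head = "".join(f"{line}\r\n" for line in fields) + "\r\n"
    return head.encode("ascii") + chunk
```

The result would be sent as the body of a PATCH request whose own Content-Type is message/byterange.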

Or just throw it into a few regular expressions: one to read the size of the fields and validate they are well-formed; and if that matches, one to get the values for Content-Range, Content-Length, and Content-Type. I’m working on a simple proof-of-concept in Node.js that I can share soon.
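
A rough sketch of that two-step approach follows, written in Python rather than Node.js and not to be confused with the proof-of-concept mentioned above. The first expression is the one from footnote [1] applied to bytes. The per-field lookup pattern and the names `field_value` and `parse_byterange` are hypothetical, and the lookup simply returns the first occurrence of a field.

```python
import re

# Step 1: the expression from footnote [1], as a bytes pattern. It matches
# the whole field section, including the terminating blank line.
FIELDS = re.compile(
    rb"^([!\x23-'\x2a\x2b\x2d\x2e0-9A-Z\x5e-z\x7c~]+:[\t ]*"
    rb"(?:[!-~](?:[\t -~]+[!-~])?)*[\t ]*\r\n)*\r\n"
)


def field_value(head: bytes, name: bytes):
    """Step 2 (hypothetical helper): value of the first field called `name`."""
    pattern = rb"^" + re.escape(name) + rb":[\t ]*(.*?)[\t ]*\r$"
    m = re.search(pattern, head, re.IGNORECASE | re.MULTILINE)
    return m.group(1).decode("ascii") if m else None


def parse_byterange(doc: bytes) -> dict:
    """Split a message/byterange document into selected fields and body."""
    m = FIELDS.match(doc)
    if m is None:
        raise ValueError("malformed field section")
    # m.end() is the size of the field section; the rest is the content.
    head, body = doc[:m.end()], doc[m.end():]
    return {
        "content_range": field_value(head, b"Content-Range"),
        "content_length": field_value(head, b"Content-Length"),
        "content_type": field_value(head, b"Content-Type"),
        "body": body,
    }
```

Running the lookups only after the first expression has matched means they never see a malformed field section, which is why they can afford to be loose.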

>> 2. It allows segments to express HTTP semantics; for example, creating a resource relies on attaching a Content-Type field. You might even attach a Digest field indicating the expected hash of the final resource.
> 
> That should be on the request itself, no?

Content-* headers cannot be on the request itself because there, they describe the patch, rather than the resource being patched.

And specifically, the Content-Type must be nested inside the patch; the request itself will have either “Content-Type: message/byterange” or “Content-Type: multipart/byteranges”.

>> 3. It allows for some future extensions (if you omit the “Content-Range” field, you can use a different one to specify the target range).
>> 
>> Would a binary format be able to accomplish this? I know there’s been some work on a binary HTTP message framing but I’m not up-to-date on this.
> 
> <https://datatracker.ietf.org/doc/draft-ietf-httpbis-binary-message/>

It’s not immediately obvious to me how to decipher the grammar, but I’ll take a closer look and see if this can either be adapted or directly referenced. Thanks!

>> [1] The regular expression for matching the fields of a message/byterange document in draft-wright-http-patch-byterange-00, excluding obs- productions, is exactly:
>> 
>> /^([!\x23-'\x2a\x2b\x2d\x2e0-9A-Z\x5e-z\x7c~]+:[\t ]*(?:[!-~](?:[\t -~]+[!-~])?)*[\t ]*\r\n)*\r\n/
> 
> Consider me sceptical. But I would need to dig in deeper to actually
> check. It would be bad to have a format that looks like HTTP/1.1 but
> then actually is slightly different.

Well, if that regexp is wrong, then it would be a bug in the tool I put together to generate it. Generally, non-recursive ABNF, regular expressions, and finite state machines are equivalent: each can describe exactly the same set of strings.
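
As an illustration of that equivalence, here is a hand-written state machine in Python that is intended to accept the same field section as the expression in footnote [1]. It is a sketch based on my reading of that expression, not output from the generating tool, and the name `scan_fields` and the state names are hypothetical.

```python
# Bytes allowed in a field name (the first character class in the regexp).
TCHAR = (set(b"!#$%&'*+-.^_`|~")
         | set(range(0x30, 0x3A))   # 0-9
         | set(range(0x41, 0x5B))   # A-Z
         | set(range(0x61, 0x7B)))  # a-z


def scan_fields(doc: bytes) -> int:
    """Return the offset just past the blank line, or -1 if malformed."""
    state = "line_start"
    for i, b in enumerate(doc):
        if state == "line_start":
            if b == 0x0D:            # CR: start of the terminating CRLF
                state = "final_lf"
            elif b in TCHAR:
                state = "name"
            else:
                return -1
        elif state == "name":
            if b == 0x3A:            # ":" ends the field name
                state = "value"
            elif b not in TCHAR:
                return -1
        elif state == "value":
            if b == 0x0D:
                state = "line_lf"
            elif not (b == 0x09 or 0x20 <= b <= 0x7E):
                return -1            # only HTAB, SP and visible ASCII
        elif state == "line_lf":
            if b != 0x0A:
                return -1
            state = "line_start"
        elif state == "final_lf":
            return i + 1 if b == 0x0A else -1
    return -1                        # input ended before the blank line
```

For well-formed input, the returned offset should equal the end of the regular expression's match, so either can be used to find where the content begins.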

Thanks,

Austin

Received on Saturday, 6 August 2022 02:26:19 UTC