- From: Austin William Wright <aaa@bzfx.net>
- Date: Fri, 5 Aug 2022 19:26:05 -0700
- To: Julian Reschke <julian.reschke@gmx.de>
- Cc: ietf-http-wg@w3.org
- Message-Id: <6D3FBD73-EB55-4BC3-9874-F2FF154A6A7A@bzfx.net>
> On Aug 3, 2022, at 23:56, Julian Reschke <julian.reschke@gmx.de> wrote:
>
> On 03.08.2022 at 22:36, Austin William Wright wrote:
>> ...
>> Hi Julian,
>>
>> I don’t have terribly strong opinions about what the format is, but this is what I thought was obvious based on a few motivations:
>>
>> 1. It re-uses an existing parser (it’s an off-the-shelf HTTP-message but skipping the “start-line CRLF” beginning, and is trivially parsable with a state machine or regular expression [1]).
>> ...
>
> That parser might be *present* everywhere, but not accessible. For
> instance, a Java servlet engine happily processes HTTP/1.1 messages, but
> doesn't provide an API to use that parser directly.

This is indeed an unfortunate state of many Web app environments; but if you’re a client, then this format is trivial to generate (ASCII text prepended to a binary blob), and if you’re a server, then you likely already have access to parsers like multipart/form-data, which should be similar at the very least.

Or just throw it into a few regular expressions: one to read the size of the fields and validate that they are well-formed; and if that matches, one to get the values for Content-Range, Content-Length, and Content-Type.

I’m working on a simple proof-of-concept in Node.js that I can share soon.

>> 2. It allows segments to express HTTP semantics; for example, creating a resource relies on attaching a Content-Type field. You might even attach a Digest field indicating the expected hash of the final resource.
>
> That should be on the request itself, no?

Content-* headers cannot be on the request itself because there, they describe the patch, rather than the resource being patched. And specifically, the Content-Type must be nested inside the patch; the request itself will have either “Content-Type: message/byterange” or “Content-Type: multipart/byteranges”.

>> 3. It allows for some future extensions (if you omit the “Content-Range” field, you can use a different one to specify the target range).
>>
>> Would a binary format be able to accomplish this? I know there’s been some work on a binary HTTP message framing but I’m not up-to-date on this.
>
> <https://datatracker.ietf.org/doc/draft-ietf-httpbis-binary-message/>

It’s not immediately obvious to me how to decipher the grammar, but I’ll take a closer look and see if this can either be adapted or directly referenced. Thanks!

>> [1] The regular expression for matching the fields of a message/byterange document in draft-wright-http-patch-byterange-00, excluding obs- productions, is exactly:
>>
>> /^([!\x23-'\x2a\x2b\x2d\x2e0-9A-Z\x5e-z\x7c~]+:[\t ]*(?:[!-~](?:[\t -~]+[!-~])?)*[\t ]*\r\n)*\r\n/
>
> Consider me sceptical. But I would need to dig in deeper to actually
> check. It would be bad to have a format that looks like HTTP/1.1 but
> then actually is slightly different.

Well, if that regexp is wrong, then it would be a bug in the tool I put together to generate it. Generally, non-recursive ABNF, regular expressions, and state machines are all equivalent in expressive power.

Thanks,

Austin
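For illustration, the two-regular-expression approach described in this message might look roughly like the following in Node.js. This is only a sketch, not the proof-of-concept mentioned above: the first pattern is copied verbatim from footnote [1], while the function name `parseByterange`, the second pattern `FIELD_LINE`, and the shape of the return value are assumptions made for this example.

```javascript
// Field-section pattern copied verbatim from footnote [1] of the message.
const FIELD_SECTION = /^([!\x23-'\x2a\x2b\x2d\x2e0-9A-Z\x5e-z\x7c~]+:[\t ]*(?:[!-~](?:[\t -~]+[!-~])?)*[\t ]*\r\n)*\r\n/;

// Assumed second pattern: pulls one "name: value" pair per line out of a
// field section that FIELD_SECTION has already validated.
const FIELD_LINE = /([^:\r\n]+):[\t ]*([^\r\n]*?)[\t ]*\r\n/g;

// Hypothetical helper: split a message/byterange document (a Buffer) into
// its fields and its binary body. Returns null if the fields are malformed.
function parseByterange(buf) {
  // latin1 maps every byte to one code unit, so string offsets are byte offsets.
  const text = buf.toString('latin1');
  const section = FIELD_SECTION.exec(text);
  if (section === null) return null;

  const fields = new Map();
  for (const [, name, value] of section[0].matchAll(FIELD_LINE)) {
    // Field names are case-insensitive, so store them lowercased.
    fields.set(name.toLowerCase(), value);
  }

  // Everything after the blank line is the patch body, left as raw bytes.
  return { fields, body: buf.subarray(section[0].length) };
}

const doc = Buffer.from(
  'Content-Range: bytes 0-3/4\r\nContent-Type: text/plain\r\n\r\nabcd',
  'latin1'
);
const parsed = parseByterange(doc);
console.log(parsed.fields.get('content-range'), parsed.body.length);
```

Because the value part of the first pattern contains nested quantifiers, a production parser would probably want to cap the field-section size before matching, or use a hand-written state machine instead.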
Received on Saturday, 6 August 2022 02:26:19 UTC