Re: Review of draft-toomim-httpbis-versions-00

HTTP requests tend to run through a lot more middleware than application-layer messages like WebSocket.  Every request is likely to be checked for compliance with various Web Application Firewall policies, passed through various caching gateways, logged to various request logs, etc.

I would recommend thinking carefully about whether using individual HTTP requests, instead of an application-layer stream of some kind, is semantically correct and useful.

--Ben
________________________________
From: Michael Toomim <toomim@gmail.com>
Sent: Tuesday, July 23, 2024 3:25 PM
To: Marius Kleidl <marius@transloadit.com>
Cc: HTTP Working Group <ietf-http-wg@w3.org>; Braid <braid-http@googlegroups.com>; Peter van Hardenberg <pvh@pvh.ca>
Subject: Re: Review of draft-toomim-httpbis-versions-00

Yes, it works great for collaborative editing. I use it every day in production. It's very fast. We send a PUT per keystroke. I should show you a demo. It's real. :)

It's not true that an HTTP PUT induces more load on the server than a WebSocket message. They are equivalent. Consider that both H2 and WebSocket are TCP streams that stay persistently open. The only difference between these two streams is how the data is formatted. The format doesn't affect how or when the server loads the resource from disk into RAM. It's true that HTTP requests often contain a session ID in a cookie on each request, whereas a WebSocket might only send that when the user logs in/out, but that header gets compressed down with H2 header compression and isn't a significant performance problem.

Perhaps you're thinking about old-style threaded web servers? Those have a lot of overhead per request, because a 4 MB OS thread has to be allocated to each request. But those don't support persistent connections (like WebSockets) at all. That's why everyone's moved to evented servers, like Node.js, which make persistent connections cheap, whether formatted as a WebSocket message stream or an H2 message stream.

On 7/23/24 1:45 AM, Marius Kleidl wrote:
Hi Michael,

talking about performance, I am curious how it would perform in a real-time, collaborative editing process (similar to Google Docs or the note taker tool during the IETF meeting). To facilitate the real-time aspects of the editing experience, would the client have to send a PUT request after every few keystrokes, so that the changes appear quickly on the peers' screens? Sending these requests is comparatively cheap for the client, especially with HTTP/2 and HTTP/3, but potentially more costly for the server, which has to perform authentication checks for each request and then load the resource's state from some storage. If many requests are sent in short succession, this can induce a higher load on the server. A stateful connection, like with WebSockets, in contrast to stateless HTTP requests could reuse the loaded and checked state - although such a method likely has other caveats attached.

Overall my question is whether you think this draft is suitable to deliver such real-time experiences in an efficient manner?

Best regards
Marius Kleidl

On Tue, Jul 23, 2024 at 1:51 AM Michael Toomim <toomim@gmail.com> wrote:

Peter, I just wrote up an explicit example of how to compress four PUTs into 7 bytes. Check out the new section 5.1 here:

https://github.com/braid-org/braid-spec/blob/master/draft-toomim-httpbis-versions-01.txt#L945

These four PUTs compress down to 0.0146% of their original size, at least in theory. Note that this compression scheme isn't fully specified in this draft; the focus of this draft is just to gather interest in working on a versioning system that makes such compression possible. The actual compression schemes would be future work.

On 7/22/24 12:41 PM, Michael Toomim wrote:

Peter, thank you for your interest! I'm excited that you are bringing up performance for discussion! There's a lot to say on that, and I give an overview below:

== Compression & Performance ==

First, let me correct a big misinterpretation: this work absolutely prioritizes high-performance, realtime data synchronization. It should support thousands of mutations per second. Our implementations are higher-performance<https://josephg.com/blog/crdts-go-brrr/> than Automerge, for instance. I regularly work today with a doc composed of 110,000 edits. It loads instantly, thanks to some great Version-Types we've designed.

The Version-Type (in the proposed Version-Type header) is the way you get performance increases. The key to performance is managing history growth. You manage that by finding a pattern in history, and then compressing or ignoring history. You can express those patterns as a Version-Type spec. (There's a robust theory behind this called Time Machines.)
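
To make "finding a pattern in history, and then compressing" concrete, here is a minimal sketch of run-length encoding a list of version IDs. It assumes, purely for illustration, that version IDs have the form "<agent>-<sequence number>"; that ID format and the function names are hypothetical and are not defined by the draft or by any Version-Type spec.

```python
from typing import List, Tuple

# One run: (agent, first sequence number, number of consecutive versions).
Run = Tuple[str, int, int]


def compress_versions(ids: List[str]) -> List[Run]:
    """Collapse consecutive IDs such as 'alice-1', 'alice-2' into runs.

    Assumes the hypothetical ID format '<agent>-<integer>'.
    """
    runs: List[Run] = []
    for vid in ids:
        agent, _, seq_text = vid.rpartition("-")
        seq = int(seq_text)
        if runs:
            last_agent, start, length = runs[-1]
            # Extend the current run when this ID directly follows it.
            if last_agent == agent and start + length == seq:
                runs[-1] = (agent, start, length + 1)
                continue
        runs.append((agent, seq, 1))
    return runs


def expand_versions(runs: List[Run]) -> List[str]:
    """Inverse of compress_versions: rebuild the full ID list."""
    return [
        f"{agent}-{start + i}"
        for agent, start, length in runs
        for i in range(length)
    ]


history = ["alice-1", "alice-2", "alice-3", "bob-1", "alice-4"]
runs = compress_versions(history)
```

A long burst of keystrokes from one peer then costs one run instead of one entry per edit, which is the kind of history pattern a Version-Type could describe.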

I apologize that this wasn't clear in the draft -00. I thought this would be an advanced feature that people wouldn't comment on for a bit — but am pleasantly surprised to hear your interest in it! I will be adding more clarity to the spec on Version-Types, and already have begun doing so in github:

https://github.com/braid-org/braid-spec/blob/master/draft-toomim-httpbis-versions-01.txt#L885

I'd also encourage you to check out this sketch of how to bake RLE into HTTP Header Compression:

https://braid.org/meeting-69/header-compression
https://braid.org/video/https://invisiblecollege.s3.us-west-1.amazonaws.com/braid-meeting-69.mp4#4166

In any case, keep in mind that at this stage, we need to know only whether there is interest in this area of work — not whether this particular spec meets your needs. If we adopt this work into the HTTP WG, we will get a chance to change or rewrite any part of the spec. This spec is just a starting point to get discussion going. So think of this as a problem statement rather than a solution statement.

== PUTs ==

As for PUTs, I suspect you might be thinking about HTTP/1.0 where each PUT might require a new TCP connection with its own TLS handshake. But keep in mind that with HTTP/2 and 3, all HTTP semantics are expressed in binary, and a PUT is usually just a single packet! This is just as efficient as any hand-rolled protocol you have, and it has the advantage of being interoperable with all of HTTP.

== History Retention ==

This versioning model supports Time Machines<https://braid.org/time-machines>, the beauty of which is that peers become free to independently choose how much history to store. An archival peer can store the full history. A light client can store just the latest version (see the amazing Simpleton<https://braid.org/simpleton> client, which needs zero history).

So each peer can choose how much history to store. If a peer doesn't have enough history to merge an edit, it can simply request that history from another peer. In this draft, you do so by requesting a GET with both Version and Parents headers specified.
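
As a rough illustration of such a request, the sketch below builds the two headers for a GET. It assumes that Version carries one quoted string and Parents carries a comma-separated list of quoted strings; that serialization is my reading of the draft and should be checked against it. The function name and the version IDs are made up.

```python
import json
from typing import Dict, List


def history_request_headers(version: str, parents: List[str]) -> Dict[str, str]:
    """Headers for a GET asking for the history leading up to `version`,
    starting from the `parents` versions the client already holds.

    Assumption: each version ID is serialized as a double-quoted string.
    json.dumps is used only as a convenient way to quote and escape.
    """
    return {
        "Version": json.dumps(version),
        "Parents": ", ".join(json.dumps(p) for p in parents),
    }


headers = history_request_headers("v3", ["v1", "v2"])
for name, value in headers.items():
    print(f"{name}: {value}")
```

These headers can be passed to any HTTP client; nothing here depends on a particular library.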

== Signatures & Validation ==

This is out of scope for this proposal on versions. However, (a) there are some Version-Types that double as signatures. When this happens, it can be specified by authoring a Version-Type spec to articulate the new constraint. And (b) this is a generally important area of work that I encourage.

Cheers!

Michael

On 7/22/24 11:44 AM, Michael Toomim wrote:

We've got divergent discussion threads that I'm merging together.

First, Peter Van Hardenberg (of Ink & Switch, Local-First, and Automerge) wrote this initial review of the draft. He's cc'd, and we can respond in this thread.

------------------------
-- Peter Van Hardenberg: --
------------------------

Hi Michael,

I had a quick look at the spec and gave some thought to whether we'd want to adopt it. I think right now it has quite a lot of per-version overhead, and viewing this through a local-first lens, one can imagine having to publish a large number of versions each as separate PUT calls. You might want to consider supporting ranges for PUT in a single message.

Overall, our goals appear to differ from what you're proposing here so this feedback may not be particularly important. My sense is that the expected granularity of changes for Braid is relatively large and that the frequency is relatively low -- on par with a changed HTML form submission, perhaps. We spend quite a lot of our time thinking about optimizing updates for potentially thousands of edits and trying to minimize the number of round trips required to synchronize state in both directions. You mention that the design intends to be optimizable but I didn't see much in the text that clarified how.

One other observation is that this spec does not appear to prioritize retention of history:
>      - If the Parents header is absent, the server SHOULD return a
>      single response, containing the requested version of the resource
>      in its body, with the Version response header set to the same
>      version.
This design may centralize the system, as clients default to receiving "flattened" versions of resources and thus may not be able to merge changes from other sources.
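
The difference between the two cases can be sketched as follows. This is a toy model, not the draft's algorithm: history is assumed to be a dictionary mapping each version ID to its parent IDs, and the function names are hypothetical. Without Parents the server answers with one flattened version; with Parents it answers with every version the client is missing.

```python
from typing import Dict, List, Optional, Set

# Toy history graph: version ID -> IDs of its parent versions.
History = Dict[str, List[str]]


def ancestors(history: History, heads: List[str]) -> Set[str]:
    """All versions reachable from `heads`, including the heads themselves."""
    seen: Set[str] = set()
    stack = list(heads)
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        stack.extend(history[v])
    return seen


def versions_to_send(
    history: History, version: str, parents: Optional[List[str]]
) -> List[str]:
    """Which versions a response would carry in this toy model."""
    if parents is None:
        # No Parents header: a single flattened snapshot of `version`.
        return [version]
    known = ancestors(history, parents)
    needed = ancestors(history, [version]) - known
    # Sorted only to make the result deterministic, not to imply an order.
    return sorted(needed)


history: History = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
flattened = versions_to_send(history, "d", None)
incremental = versions_to_send(history, "d", ["b"])
```

In the flattened case the client learns the state at "d" but not the intermediate edit "c", which is why it may be unable to merge concurrent changes from other sources.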

Last, have you considered specifying some kind of signature / validation feature? If clients are applying patches iteratively, it might help for them to be able to validate that they're in the expected state either before or after applying a patch.

All the best,
-p

On 7/15/24 6:26 PM, Michael Toomim wrote:

Hi everyone in HTTP!

Last fall we solicited feedback on the Braid State Synchronization proposal [draft<https://datatracker.ietf.org/doc/html/draft-toomim-httpbis-braid-http-04>, slides<https://datatracker.ietf.org/meeting/118/materials/slides-118-httpbis-braid-http-add-synchronization-to-http-00>], which I'd summarize as:

"We're enthusiastic about the general work, but the proposal is too high-level. Break the spec up into multiple independent specs, and work bottom-up. Focus on concrete 'bits-on-the-wire'."

So I'm breaking the spec up, and have drafted up the first chunk for you. I would very much like your review on:
Versioning of HTTP Resources
draft-toomim-httpbis-versions
https://datatracker.ietf.org/doc/html/draft-toomim-httpbis-versions-00

Versioning is necessary for state synchronization, and occurs in a range of HTTP systems:

  *   Caching
  *   Archiving
  *   Version Control
  *   Collaborative Editing

Today, HTTP has resource versions in the Last-Modified and ETag headers, and sometimes embeds versions in URLs, like with WebDAV. Each of these options serves some needs, but also has specific limitations. The draft proposes an improved general approach that provides new features. These could enable cool new applications, such as incrementally-updated RSS feeds, and could simplify existing specifications, such as resumable uploads and history compression in OT/CRDT algorithms.

I would love to know if people find this work interesting. I think we could improve performance, interoperability, and be one step closer to having Google Docs power within HTTP URLs.

Michael

-------- Forwarded Message --------
Subject:        New Version Notification for draft-toomim-httpbis-versions-00.txt
Date:   Mon, 08 Jul 2024 11:02:11 -0700
From:   internet-drafts@ietf.org
To:     Michael Toomim <toomim@gmail.com>


A new version of Internet-Draft draft-toomim-httpbis-versions-00.txt has been
successfully submitted by Michael Toomim and posted to the
IETF repository.

Name: draft-toomim-httpbis-versions
Revision: 00
Title: HTTP Resource Versioning
Date: 2024-07-08
Group: Individual Submission
Pages: 19
URL: https://www.ietf.org/archive/id/draft-toomim-httpbis-versions-00.txt
Status: https://datatracker.ietf.org/doc/draft-toomim-httpbis-versions/
HTMLized: https://datatracker.ietf.org/doc/html/draft-toomim-httpbis-versions


Abstract:

HTTP resources change over time. Each change to a resource creates a
new "version" of its state. HTTP systems often need a way to
identify, read, write, navigate, and/or merge these versions, in
order to implement cache consistency, create history archives, settle
race conditions, request incremental updates to resources, interpret
incremental updates to versions, or implement distributed
collaborative editing algorithms.

This document analyzes existing methods of versioning in HTTP,
highlights limitations, and sketches a more general versioning
approach that can enable new use-cases for HTTP.



The IETF Secretariat

Received on Wednesday, 24 July 2024 03:57:36 UTC