From: Watson Ladd <watsonbladd@gmail.com>
Date: Wed, 9 Oct 2024 09:50:33 -0700
To: Michael Toomim <toomim@gmail.com>
Cc: ietf-http-wg@w3.org
On Tue, Oct 8, 2024 at 4:16 PM Michael Toomim <toomim@gmail.com> wrote:
> Josh Cohen and I are considering publishing a new draft on Subscriptions in HTTP, and as we think through the big design decisions, I ran across this excellent question from Watson Ladd, bringing up the most fundamental question of them all:
>
> On 11/6/23 4:47 PM, Watson Ladd wrote:
> > On Tue, Oct 31, 2023 at 7:12 PM Michael Toomim <toomim@gmail.com> wrote:
> > > At IETF 118 I will present a proposal to adopt State Synchronization work into HTTPbis:
> > >
> > > Braid-HTTP: https://datatracker.ietf.org/doc/html/draft-toomim-httpbis-braid-http [1]
> >
> > <...snip...>
> >
> > The big sticking point for me is subscriptions. This is a deviation from the request/response paradigm that goes pretty deep into how clients and servers are coded and the libraries they use. It can of course be stuck on top of WebTransport, which might be the right way to do it, but then it doesn't integrate with the other three parts.
> >
> > You might be better off trying to layer this on top of HTTP and WebTransport, as ugly as that can be with regard to what intermediaries can do, in order to get it into the hands of people faster; but if there's some strong reason not to do that, I'm all ears.
>
> Watson raises a basic choice in designing Subscriptions:
>
> 1. Do we dare extend the basic request/response model to allow long-lived subscriptions?
> 2. Or are subscriptions better layered on top, inside a WebSocket or WebTransport connection?
>
> I argue that (1) is actually a *much* better choice. Yes, it is fundamental. It extends the basic architecture of HTTP (hat tip to Roy Fielding) by extending REST into RESS (REpresentational State Synchronization). It adds a new basic feature to HTTP: the ability to subscribe to any resource and get notified of its changes over time, throughout the entire web and HTTP ecosystem. Clients will stop guessing whether to reload caches, and will stop making redundant requests. Servers, which *authoritatively know* when resources change, will promise to tell clients, automatically and optimally. Terabytes of bandwidth will be saved. Millions of lines of cache-invalidation logic will be eliminated. Quadrillions of dirty-cache bugs will disappear. In time, web browser "reload" buttons will become obsolete across the face of the earth.

That assumes deployment, and that this works pretty universally. I'm less sanguine about the odds of success. This happy state holds only after every single intermediary and HTTP library is modified to change a basic request-response invariant. Most won't be. In any protocol this age, some invariants have crept in and become ossified, and for HTTP/1.1 that's the request-response model. Pipelining doesn't reliably work, let alone the interleaving of responses you would need for this to be efficient over TCP sockets. Servers have a deeply ingrained idea that they don't need to hold long-lived resources for a request. That's going to be hard to change, and some assets will change meaningfully for clients outside the lifetime of a TCP connection (think, e.g., NAT). Caches, particularly client caches, outlive processes.

Subscriptions are push-based; HTTP requests are pull-based. Pulls scale better: clients can do distributed backoff, understand that they are missing information, and recover from losing it. Push might be faster in the happy case, but it is complex to do right.

The cache-invalidation logic remains: determining that a new version must be pushed to clients is the same problem as saying "oh, we must clear caches because front.jpg changed". We already have a lot of cache control, and HEAD, to try to prevent large transfers of unchanged information. A subscription might reduce some of this, but when the subscription stops, the client has to check back in, which is just as expensive as a HEAD.
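Concretely, the pull pattern being defended here might look like the following rough Python sketch; the URL, timings, and retry policy are invented for illustration, not from any draft. It revalidates with If-None-Match, so an unchanged resource costs one small 304 round trip, and it backs off with jitter when the server or network misbehaves:

    import random
    import time
    import urllib.error
    import urllib.request

    def poll_with_backoff(url, base=1.0, cap=64.0):
        """Yield new representations of `url` as they appear: a pull-based
        'subscription'. Hypothetical sketch, not part of any proposal."""
        delay, etag = base, None
        while True:
            request = urllib.request.Request(url)
            if etag:
                request.add_header("If-None-Match", etag)
            try:
                with urllib.request.urlopen(request) as response:
                    etag = response.headers.get("ETag", etag)
                    yield response.read()   # resource changed: hand it to the app
                    delay = base            # success resets the backoff
            except urllib.error.HTTPError as err:
                if err.code == 304:
                    delay = base            # unchanged: cheap, keep polling
                else:
                    delay = min(cap, delay * 2)   # server trouble: back off
            except OSError:
                delay = min(cap, delay * 2)       # network trouble: back off
            time.sleep(random.uniform(0, delay))  # full jitter avoids stampedes

The properties listed above fall out directly: each client recovers on its own, a failed poll tells it that it is missing information, and the jitter keeps a fleet of clients from stampeding a recovering server. A push design has to rebuild each of these by hand.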
> The alternative (2) is to add subscriptions on top of a WebSocket or WebTransport API, separate from HTTP resource semantics. But then HTTP resources themselves will not be subscribable. Programmers will be annoyed, and will limit their use of HTTP to bootstrapping the initial HTML, CSS, and JavaScript; migrating all the interesting state onto this separate WebSocket or WebTransport mechanism, which will then require more and more of HTTP's features being added back into it: like (a) being able to GET the state, and also PUT changes to it, over (b) multiple content-types (e.g. text/json, text/html, and image/png), while (c) supporting various PATCH types, across (d) a full-featured network of Proxies, Caches, and CDNs to scale the network. In conclusion, choice (2) leads to reinventing HTTP, within a WebSocket/WebTransport... on top of HTTP.

I don't really understand the class of applications for which this is useful. Some, like chat programs and multiuser editors, I get: this would be a neat way to get the state of the room. It also isn't clear to me that intermediaries can do anything on seeing a PATCH or a PUT propagating up: it still has to go to the application to determine what the impact of the change on the state is.

> The clear way forward is subscribing directly to HTTP and REST state. An elegant way to see this is extending Request/Response into Request/Multiresponse. The Subscription can be a Request that receives multiple Responses, one for each update to the resource. There are many ways to format a Multiresponse; here's a straightforward and backwards-compatible option:
>
> Request:
>
> GET /chat
> Subscribe: timeout=10s
>
> Response:
>
> HTTP/1.1 104 Multiresponse
> Subscribe: timeout=10s
> Current-Version: "3"
>
> HTTP/1.1 200 OK
> Version: "2"
> Parents: "1a", "1b"
> Content-Type: application/json
> Content-Length: 64
>
> [{"text": "Hi, everyone!",
>   "author": {"link": "/user/tommy"}}]
>
> HTTP/1.1 200 OK
> Version: "3"
> Parents: "2"
> Content-Type: application/json
> Merge-Type: sync9
> Content-Length: 117
>
> [{"text": "Hi, everyone!",
>   "author": {"link": "/user/tommy"}},
>  {"text": "Yo!",
>   "author": {"link": "/user/yobot"}}]
>
> This is backwards-compatible because it encodes multiple responses into a regular response body that naïve intermediaries will just pass along blindly, like SSE. But upgraded H2 and H3 implementations can have native header and body frames that repeat. It's all quite elegant. It fits right into HTTP. It feels as if HTTP was designed to make it possible.

*every security analyst snaps around like hungry dogs to a steak*

Another request smuggling vector? It's HTTP/1.1 where this looks easy. Even there it isn't. How does a busy proxy with lots of internal connection reuse distinguish updates as it passes them around on a multiplexed connection? What does this look like for QUIC and H/3?
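To make the framing question concrete, here is a rough sketch of what every client, and every connection-reusing proxy on the path, would have to do to split such a body apart. It assumes the framing in Michael's example above (status line, headers, blank line, then exactly Content-Length bytes of body); that framing is inferred from the example, not specified anywhere:

    import json

    def parse_multiresponse(stream):
        """Split the body of a 104 Multiresponse into inner responses.
        `stream` is any binary file-like object, e.g. a socket's
        makefile("rb"). Hypothetical sketch of inferred framing."""
        while True:
            line = stream.readline()
            while line in (b"\r\n", b"\n"):          # blank lines between updates
                line = stream.readline()
            if not line:
                return                               # subscription ended
            if not line.startswith(b"HTTP/"):
                raise ValueError("lost framing")     # the smuggling failure mode
            headers = {}
            while True:
                line = stream.readline()
                if line in (b"\r\n", b"\n", b""):
                    break
                name, _, value = line.decode("ascii").partition(":")
                headers[name.strip().lower()] = value.strip()
            body = stream.read(int(headers.get("content-length", "0")))
            yield headers.get("version"), json.loads(body)

Every hop must agree on this parse to the byte: if any intermediary miscounts a Content-Length, the next "status line" it reads is attacker-influenced body data, which is exactly the request-smuggling shape in question. And none of this framing exists yet for H2 or H3 streams.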
> We can add subscriptions to the basic fabric of HTTP, and free application programmers from having to write cache-invalidation logic. This will (a) eliminate bugs and code complexity, while simultaneously (b) improving performance across the internet, and (c) giving end-users the functionality of a realtime web by default. This is a fundamental change, but it is overwhelmingly beneficial. Then we can update Roy's dissertation.

It's a good one, and it deserves our care. We have (c): it's called WebSockets. What isn't it doing that it should be? I'm sympathetic to fixing the foundations, but there's lots of complexity here that hasn't been addressed, and IMHO that makes the juice not worth the squeeze.
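For scale, the status quo version of (c) is already a handful of lines over a WebSocket. A sketch using the third-party Python websockets package; the endpoint URL and subscribe-message shape are invented for illustration, since each application defines its own:

    import asyncio
    import json

    import websockets  # third-party: pip install websockets

    async def watch(resource):
        # Hypothetical endpoint and message shape, not from any spec.
        async with websockets.connect("wss://example.com/updates") as ws:
            await ws.send(json.dumps({"subscribe": resource}))
            async for message in ws:        # one message per update
                print(resource, "changed:", json.loads(message))

    asyncio.run(watch("/chat"))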
> Michael

Sincerely,
Watson

--
Astra mortemque praestare gradatim

Received on Wednesday, 9 October 2024 16:50:50 UTC