Re: Proposal: Adopt State Synchronization into HTTPbis from Josh Cohen on 2024-10-09 (ietf-http-wg@w3.org from October to December 2024)

From: Josh Cohen <joshco@gmail.com>
Date: Wed, 9 Oct 2024 18:29:58 -0400
To: Michael Toomim <toomim@gmail.com>
Cc: Marius Kleidl <marius@transloadit.com>, Watson Ladd <watsonbladd@gmail.com>, ietf-http-wg@w3.org
Message-ID: <CAF3KT4SpDwMGxkCZPZs9ityhgb4CbMVosQGYGW3Djz=sYv=6RA@mail.gmail.com>
I support adoption of this scope into httpbis. However, I think it's useful
to engage in separation of concerns.

At a high level, there are two:
* The Versioning and Updates - the DAVish parts, which also have separable
concerns.
* PubSub

Within PubSub, there are also separable concerns.  In my email comparing
the different pubsub proposals[1], I described them as
* Subscription Setup - The methods used to set up a subscription
* Event Channel - How events are delivered to the subscriber
* Event Payload - What the content of the events is.
* Discovery - Discovery of PubSub features

If the WG adopts the work, then for each of these concerns, we can converge
on a solution (given the existing proposals), and possibly allow for
multiple choices, depending on circumstances.

Multiresponse is just one possibility for the Event Channel concern.   I
share Watson's concerns, and worry that it may be too much of a
fundamental shift given deployed infrastructure. However, it seems like
we're getting wrapped around the axle on that.  There are other options for
the Event Channel.

I'd like to put another on the table, which I will refer to as "Symmetric
HTTP".

The hybi working group, according to datatracker, lived between 2010
and 2015-ish.  Its charter says:

> The BiDirectional or Server-Initiated HTTP (HyBi) working group defines
> the WebSocket Protocol, a technology for bidirectional communication
> between an HTTP client and an HTTP server that
> provides greater efficiency than previous approaches (e.g., use of hanging
> requests or long polling).


The WebSocket Protocol, RFC 6455, published in December 2011, says:

> The WebSocket Protocol is designed to supersede existing bidirectional
> communication technologies that use HTTP as a transport layer to benefit
> from existing infrastructure (proxies, filtering, authentication).  Such
> technologies were implemented as trade-offs between efficiency and
> reliability because HTTP was not initially meant to be used for
> bidirectional communication (see [RFC6202] for further discussion).


The world has evolved, and now with h2/h3, we have the ability for the
server to initiate and open a stream to the client that has connected to
the server. If we assume that the client/browser can have a small web
server engine running, then the Event Channel can just be HTTP requests
(say using NOTIFY method) sent downstream from the server to the client.

Assume:
We are interested in being notified about changes to resource
https://braid.org/@josh
Assume the client's HTTP engine can be named urn:braid:peer/receiver

The client can set up the subscription

SUBSCRIBE /@josh
Callback: urn:braid:peer/receiver

When changes to the resource occur, the server can use h2/h3 to initiate a
stream downstream to the client and send NOTIFY messages to the client

NOTIFY urn:braid:peer/receiver?+https://braid.org/@josh
Content-Type: application/json
Content-Length: 64
-
[{"text": "Hi, everyone!",
  "author": {"link": "/user/tommy"}}]
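
To make the exchange above concrete, here is a rough Python sketch that
builds the SUBSCRIBE and NOTIFY messages and reads the NOTIFY back. The
method names, the Callback header, and the URN are taken from the example
above. The Host header, the helper names, and the HTTP/1.1-style text
serialization are assumptions for readability only; over h2/h3 the same
fields would travel in binary frames.

```python
import json


def build_message(method, target, headers, body=b""):
    """Serialize a request as HTTP/1.1-style text (illustration only)."""
    fields = dict(headers)
    if body:
        # Content-Length counts the bytes of the body, not characters.
        fields["Content-Length"] = str(len(body))
    lines = [f"{method} {target} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in fields.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


def parse_message(raw):
    """Split a serialized request into method, target, headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    request_line, *field_lines = head.decode("ascii").split("\r\n")
    method, target, _version = request_line.split(" ", 2)
    headers = {}
    for line in field_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    length = int(headers.get("content-length", "0"))
    return method, target, headers, body[:length]


# Client -> server: set up the subscription (Host header is an assumption).
subscribe = build_message(
    "SUBSCRIBE",
    "/@josh",
    {"Host": "braid.org", "Callback": "urn:braid:peer/receiver"},
)

# Server -> client: deliver one update as an ordinary request.
update = json.dumps(
    [{"text": "Hi, everyone!", "author": {"link": "/user/tommy"}}]
).encode("utf-8")
notify = build_message(
    "NOTIFY",
    "urn:braid:peer/receiver?+https://braid.org/@josh",
    {"Content-Type": "application/json"},
    update,
)

# The client's HTTP engine handles it like any other incoming request.
method, target, headers, body = parse_message(notify)
```

The point of the sketch is that the receiving side needs nothing beyond
normal request parsing: no application-specific framing inside a streaming
response body or on top of WebSockets.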

This approach avoids the concerns Watson raised. Instead it just "turns
around" HTTP.   Instead of each application trying to define its own
marshaling scheme either in an endless streaming HTTP response, or on top
of WebSockets, it leverages the features of HTTP we have.

There is a naming concern with respect to resources on the client HTTP
engine and how to indicate what resource has changed.  That could be in an
HTTP header in the NOTIFY, something like the URN scheme I've laid out, or
something we come up with.
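
As a sketch of the two naming options just mentioned, the receiver could
resolve the changed resource like this. The "?+" separator comes from the
NOTIFY example earlier in this message; the header name "Changed-Resource"
and the function name are made up for illustration and are not part of any
proposal.

```python
def split_notify_target(target, headers):
    """Return (receiver, changed_resource) for an incoming NOTIFY.

    Option 1: a dedicated header names the changed resource
              ("changed-resource" is a hypothetical header name).
    Option 2: the request target carries it after a "?+" separator,
              as in urn:braid:peer/receiver?+https://braid.org/@josh.
    """
    normalized = {name.lower(): value for name, value in headers.items()}
    if "changed-resource" in normalized:
        receiver = target.partition("?+")[0]
        return receiver, normalized["changed-resource"]

    receiver, separator, resource = target.partition("?+")
    if not separator or not resource:
        raise ValueError("NOTIFY does not say which resource changed")
    return receiver, resource


from_target = split_notify_target(
    "urn:braid:peer/receiver?+https://braid.org/@josh", {}
)
from_header = split_notify_target(
    "urn:braid:peer/receiver",
    {"Changed-Resource": "https://braid.org/@josh"},
)
```

Either way the receiver ends up with the same pair, so the choice is mostly
about which form intermediaries and existing URI parsers handle best.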


[1] https://lists.w3.org/Archives/Public/ietf-http-wg/2024JulSep/0159.html

On Wed, Oct 9, 2024 at 5:39 PM Michael Toomim <toomim@gmail.com> wrote:

> Thank you, Marius! These are good questions about how to format a
> Multiresponse:
> On 10/9/24 12:20 AM, Marius Kleidl wrote:
>
> Regarding your example, Michael: Does the response body, which contains
> the updates, adapt its syntax to the used HTTP protocol? Do you suggest
> that subscriptions over HTTP/2 generate the updates as additional HTTP/2
> responses? If so, this would require subscriptions to be implemented inside
> the HTTP client itself instead of being a feature that a user can implement
> based upon existing and available HTTP clients. In addition, this raises
> questions about how to handle situations where the used protocol changes as
> requests and responses are forwarded through proxies and gateways.
>
> Yes, we would ideally format an H2 Multiresponse with native H2 frames.
> Here's an equivalent H2 version of my last H1 example:
>
>   ┌─────────┐
>   │ HEADERS │ :method = GET
>   │ Frame   │ :path = /chat
>   │         │ subscribe = timeout=10s
>   └─────────┘
>        │
>        ▼
>   ┌─────────┐
>   │ HEADERS │ :status = 104
>   │ Frame   │ subscribe = timeout=10s
>   │         │ current-version = "3"
>   └─────────┘
>        │
>        ▼
>   ┌─────────┐
>   │ HEADERS │ :status = 200
>   │ Frame   │ version = "2"
>   │         │ parents = "1a", "1b"
>   │         │ content-type = application/json
>   └─────────┘
>        │
>        ▼
>   ┌─────────┐
>   │  DATA   │ [{"text": "Hi, everyone!",
>   │ Frame   │   "author": {"link": "/user/tommy"}}]
>   └─────────┘
>        │
>        ▼
>   ┌─────────┐
>   │ HEADERS │ :status = 200
>   │ Frame   │ version = "3"
>   │         │ parents = "2"
>   │         │ content-type = application/json
>   │         │ merge-type = sync9
>   └─────────┘
>        │
>        ▼
>   ┌─────────┐
>   │  DATA   │ [{"text": "Hi, everyone!",
>   │ Frame   │   "author": {"link": "/user/tommy"}},
>   │         │  {"text": "Yo!",
>   │         │   "author": {"link": "/user/yobot"}}]
>   └─────────┘
>
> You're absolutely right that this native version requires the client to be
> upgraded, and for proxies along the path to at least not interfere. If they
> don't, we can always fall back to the H1-style "shove it into the body"
> method. The trick is to know when it's safe to upgrade. We're currently
> extending cache-tests.fyi with some experiments to determine the best way
> to do this.
>
> I also wonder if we could reuse multipart responses for delivering
> updates. The response to a subscription request would be a streamed
> multipart response where each "part" is one update. The update can include
> header fields as well as content, similar to your example. Status codes in
> the update would not be directly possible, but I'm not sure if that's a big
> loss.
>
> Ah, yes this idea comes up frequently. The problem is that multipart
> relies on boundary conditions, which can be spoofed. Imagine an attacker
> learns that client C is getting updates streamed with boundary separator
> "====foo-bar-baz====". He can then try to find a way to mutate the
> resource in such a way to include that boundary separator in an update
> being sent to the client, and thus sneak fake data in.
>
> All in all, I enjoy the idea, but think that we can achieve this already
> with the existing features we have.
>
> I'm glad. I enjoy the idea, too. We're re-using features where possible.
> However, although SSE provides updates, it doesn't provide the semantics of
> "the resource is changing state", and doesn't even support binary, so it
> won't work for updating images. Multipart is tempting to re-use in
> Multiresponses, but has the boundary issue.
>
> Michael
>


-- 

---
Josh Cohen
Received on Wednesday, 9 October 2024 22:30:15 UTC