Re: Publication of the Mercure Protocol

Hi Michael,

Thanks for your interest and for the detailed review!

I stumbled upon the BRAID I-D very recently and I planned to open an issue
on your GitHub repository to bring Mercure to your attention. I'll be happy
to collaborate with you to make both systems interoperable.

BRAID looks very interesting and I'm in the process of reviewing it
thoroughly. I love the idea of using forks and merges.
Indeed, there is a small overlap between Mercure and BRAID, but I think
the two specs complement each other more than they overlap.

If I understand correctly, Mercure and BRAID overlap mainly in the "2.
Subscriptions to GET requests" section, and a bit (but not really, actually)
with "1. Versioning to HTTP resources". Features described in the other
sections (which are awesome) could be used almost directly with Mercure as
transport.

At a glance, I think that the goals of BRAID section 2 and Mercure aren't
exactly the same. However, I'm sure there is room for convergence and for
finding a consensus. Mercure's goals are:

   - To be as close as possible to the WebSub spec
   (https://www.w3.org/TR/websub/), and to be usable alongside WebSub (use
   WebSub for server-to-server communications, use Mercure for
   server-to-client). By the way, the reference Mercure Hub will soon
   support dispatching "updates" using both Mercure and WebSub (as
   explicitly allowed in the I-D). Most terms (but not all) are imported
   from this spec. Some others come from the SSE spec.
   - To be usable right now, by capitalizing on the existing web platform
   as much as possible (it's why it uses EventSource and JWT), and without
   having to modify web browsers (and it's a success so far: there are
   already plenty of tools and frameworks supporting Mercure, and Firefox is
   adding support for it in the dev tools). Another benefit of reusing SSE
   (I hesitated a lot about that) is that good-quality SSE clients are
   already available for almost all major programming languages.
   - Related: to work out of the box in modern browsers without having to
   download an SDK / a JS lib.
   - To be usable with technologies not able to maintain long-lived
   connections such as PHP, CGI and Serverless (it's why it imports the
   concept of "hub" from the WebSub spec).
   - To be compatible with HTTP 1, even when subscribing to updates for
   several resources (it's why you can subscribe to several "topics" at the
   same time, in the same HTTP connection). Actually this approach also has
   benefits when using HTTP/2 and 3 because the client can save some H2/H3
   streams and the server can reduce the number of threads or similar
   constructs it has to run.
   - To be fully-featured, so that it can be used as a replacement for (and
   even as a transport for) GraphQL subscriptions, but also for tools such
   as socket.io. It's why it has an authorization mechanism, the system of
   topic selectors and the presence API (most of these features have been
   designed after having gathered feedback from users, especially from the
   Symfony community as the framework adopted Mercure very early, and later
   from the JS community).
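
To illustrate the "several topics in the same HTTP connection" goal, here
is a minimal Python sketch of how a subscriber could build its subscription
URL. The hub URL and the topic URLs are made-up examples, and the helper
name is hypothetical; the only thing taken from the protocol is that each
topic is passed as a repeated "topic" query parameter.

```python
from urllib.parse import urlencode

# Hypothetical hub URL, used for illustration only.
HUB_URL = "https://hub.example.com/.well-known/mercure"


def subscription_url(hub_url, topics):
    """Build one subscription URL covering several topics.

    Each topic becomes a repeated "topic" query parameter, so a single
    SSE connection can carry the updates for all of them.
    """
    query = urlencode([("topic", topic) for topic in topics])
    return f"{hub_url}?{query}"


url = subscription_url(
    HUB_URL,
    ["https://example.com/books/1", "https://example.com/reviews/2"],
)
print(url)
```

A browser would then open this URL with a single EventSource, instead of
one connection per resource.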

I hope we can find a way to make both I-Ds complement each other, or even
to merge them. My first thought would be to replace or complement BRAID's
section 2 with a reference to the Mercure Draft, WebSub and/or any other
existing or future applicable mechanism, and to recommend using all other
features (especially versions and patches) provided by BRAID in the
Mercure Draft. I need to think more about this, but I have the feeling
that splitting the "transport" layer from the rest of BRAID and the
"version"/Last-Event-ID part of Mercure from the rest of it could be
mutually beneficial.

With that context in place, I'll try to answer all the points you raised:

> 1. Combine Discovery and Subscription into a single request/response

We considered doing this (in a similar fashion to what you propose in
BRAID), but doing so has several drawbacks:

   - It's not compatible with SSE, so it would make adoption harder and
   make the implementation of both clients and servers more difficult. Such
   an approach would require all browsers, proxies, API gateways, etc. to
   implement this protocol change. Regarding browsers, a polyfill could
   maybe be written, but I'm not sure it is worth it: SSE is already widely
   implemented, battle-tested, and in my opinion good enough for most use
   cases.
   - As stated in the goals, many popular technologies (including PHP,
   Serverless...) aren't well suited to maintaining persistent connections.
   The discovery mechanism and the hub make it very easy to use dedicated
   software (a hub) to handle the persistent connections. This makes it
   straightforward to add push capabilities to existing and new
   serverless / PHP / CGI applications.
   - Even when using technologies better suited to handling long-lived
   connections such as Go, Java or Node.js, it's often a better solution to
   use dedicated hardware to handle such connections (the CPU and memory
   consumption will not follow the same patterns as when serving typical,
   short-lived web pages or API responses; the number of open connections
   is also usually higher, etc.). It's also often not possible to properly
   and securely configure the same software to serve both short-lived and
   long-lived responses (for instance Go, as well as many other programming
   languages, libraries, web servers and proxy servers, doesn't allow
   setting timeouts per request; they apply to the whole server).
   - Using a hub to dispatch updates is already a broadly accepted
   practice. It's standardized as part of WebSub, and very common when
   using WebPush for mobile apps.

Also, as stated in the previous section, using a "hub" allows streaming all
updates for all resources in a single HTTP connection, and provides better
compatibility with HTTP 1. This is also a large benefit when using very
granular HTTP APIs (no compound documents, a URL per retrieved resource)
such as what is proposed by the Vulcain (https://github.com/dunglas/vulcain)
and Prefer-Push (https://tools.ietf.org/html/draft-pot-prefer-push-01)
Internet-Drafts, which are more and more popular
(https://apisyouwonthate.com/blog/rest-and-hypermedia-in-2019).

Regarding the concepts (hub, topic, subscriber, publisher...), actually
they aren't new, they have been imported "as-is" from the W3C's WebSub
specification (formerly known as PubSubHubbub), which is already widely
used and part of the web stack. Most definitions are copy/pasted/adapted
from this specification. Both specifications are very consistent and
designed to be used together, as stated in the I-D.

> 2. Unbundle Authorization

As you stated, the authorization mechanism has to be specified because of
the hub concept. Also, the standard EventSource API has some limitations
(it is able to send cookies, but not Authorization headers). We discovered
early that it was very important for interoperability to have a
standard mechanism for authorization, especially for subscribers, and to a
lesser extent for publishers (when the mechanism is specified for
subscribers, there is no additional complexity to allow publishers to use
it too).
That being said, I totally agree that it's important to allow other
authorization mechanisms. That's why the implementation of this part is
optional, and why the spec allows using other authorization mechanisms
(in addition to or in replacement of the specified one). Maybe we could
make this broader, or maybe we could split the authorization into another
RFC, but according to the feedback we gathered early, it's very common and
very important for end users to have a mechanism for dispatching private
updates for resources. I have no strong opinion about this (separate RFC,
optional section...). Making Mercure compatible with BRAID will definitely
be a good exercise to be sure both specs are generic enough, and I'm very
excited about working on this topic.
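
For readers unfamiliar with the JWT-based mechanism, here is a minimal
Python sketch that signs a token using only the standard library. It
assumes HS256 with a shared secret; the function names, the secret and the
topic are hypothetical, and the claim layout (a "mercure" object with
"subscribe" and "publish" lists) should be checked against the current
text of the draft rather than taken from this example.

```python
import base64
import hashlib
import hmac
import json


def b64url(raw):
    """Base64url-encode bytes without padding, as JWT requires."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def sign_mercure_jwt(secret, subscribe, publish):
    """Return an HS256-signed JWT carrying the (assumed) "mercure" claim."""
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"mercure": {"subscribe": subscribe, "publish": publish}}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(
        secret, signing_input.encode("ascii"), hashlib.sha256
    ).digest()
    return f"{signing_input}.{b64url(signature)}"


# Example secret and topic, for illustration only.
token = sign_mercure_jwt(b"example-secret", ["https://example.com/books/1"], [])
```

Because EventSource cannot set an Authorization header, a browser
subscriber would typically receive such a token in a cookie, while other
clients can send it as a bearer token.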

> Details

> So which is it: a hub/server, or a full resource with a path?

Actually, a hub is a server (it can be the same server as the one serving
other resources, or a dedicated one) exposing some URLs (the
"/.well-known/mercure" endpoint but also the resources exposed by the
presence API). The WebSub specification defines the hub like this:

>  The server (URL [URL]) which implements both sides of this protocol. Any
hub MAY implement its own policies on who can use it.
> [...]
> A WebSub Hub is an implementation that handles subscription requests and
distributes the content to subscribers when the corresponding topic URL has
been updated

https://www.w3.org/TR/websub/#hub

We imported this definition, but I'll be glad to clarify (however, I think
it's important to use a consistent terminology with WebSub).

>  But the point of .well-known <https://tools.ietf.org/html/rfc8615> is
that you don't need to specify it anywhere, and you don't need the
discovery step. So I think you can eliminate this header entirely.

While it's totally OK to host the typical resources and the Mercure hub
on the same server/domain, in the wild the hub is very often on another
(sub-)domain (because of the ops-related benefits I explained above). The
typical setup is something like this:

   - https://example.com serves a typical HTTP API; responses have a Link
   header allowing clients to discover the hub(s) to use
   - https://hub.example.com hosts the Mercure hub
   (https://hub.example.com/.well-known/mercure)

That's why the header is necessary even if we use a "well-known" URL. The
"well-known" URL is interesting only when the client already knows the URL
of the Mercure server, for instance, when the web browser is both a
publisher and a subscriber, as in this chat example:
https://demo-chat.mercure.rocks.
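
As an illustration of the discovery step in the typical setup above, here
is a minimal Python sketch that extracts the hub URL(s) from a Link header.
The function name and the sample header are hypothetical, and the parser is
deliberately simplified (for example, it does not handle commas inside
quoted parameter values).

```python
import re

# Matches one link-value such as: <https://hub.example.com/...>; rel="mercure"
LINK_VALUE = re.compile(r'<([^>]*)>\s*((?:;\s*[^,;]+)*)')


def discover_hubs(link_header):
    """Return the target of every link whose rel contains "mercure"."""
    hubs = []
    for target, params in LINK_VALUE.findall(link_header):
        match = re.search(r'\brel\s*=\s*"?([^";]+)"?', params)
        # A rel value may list several space-separated relation types.
        if match and "mercure" in match.group(1).split():
            hubs.append(target)
    return hubs


# Sample header as https://example.com might return it (made-up values).
sample = (
    '<https://hub.example.com/.well-known/mercure>; rel="mercure", '
    '<https://example.com/books/1>; rel="self"'
)
print(discover_hubs(sample))
```

The client then subscribes on the discovered hub, which can live on a
different (sub-)domain than the API itself.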

> Last-event-id / linear time / single writer

We decided not to specify this behavior explicitly. Mercure doesn't allow
modifying resources directly (you should use standard HTTP verbs on the
resource itself to do that). It allows broadcasting new versions of the
modified resource. Two behaviors are allowed by the spec:

   1. The publisher sets the ID of the update. Then the hub just uses it
   and forwards it to clients. It's up to the publisher to design the IDs
   carefully. (Hubs may or may not support setting custom IDs, but most
   existing hubs support this feature, and the publisher can detect whether
   it is supported.)
   2. The hub auto-assigns an ID to the update (because the publisher
   hasn't set an ID itself). In this case, it's up to the implementer to
   deal with ordering (or the lack of it) when two updates for the same
   resource arrive at the exact same time. Some hub implementations use
   mutexes (and similar constructs) to ensure that only one message is
   written at a time. Some others delegate to third-party systems (Kafka,
   Redis Streams, Pulsar and Postgres as far as I know), and then inherit
   the behavior and constraints of these systems.
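
The two behaviors can be sketched with a toy in-memory hub in Python. This
is not how any particular hub is implemented: the class and method names
are hypothetical, and the "urn:uuid:" format for generated IDs is just one
possible choice. It only shows a publisher-supplied ID being kept as-is, a
hub-generated ID otherwise, and a mutex serializing concurrent writes.

```python
import threading
import uuid


class ToyHub:
    """Hypothetical in-memory hub, only meant to show ID assignment."""

    def __init__(self):
        self._lock = threading.Lock()
        self.history = []  # (event_id, topic, data), in write order

    def publish(self, topic, data, event_id=None):
        # Behavior 1: the publisher set an ID, so the hub keeps it as-is.
        # Behavior 2: no ID was set, so the hub generates one.
        if event_id is None:
            event_id = f"urn:uuid:{uuid.uuid4()}"
        # The mutex serializes writes: two simultaneous updates still end
        # up in one well-defined order in the history.
        with self._lock:
            self.history.append((event_id, topic, data))
        return event_id
```

With this approach the order is simply the order in which writers acquire
the lock; a hub backed by a log such as Kafka would instead inherit the
ordering guarantees of that system.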

> SSE stream ID


   - Like all identifiers in the spec, the SSE ID can be a string or an IRI,
   and it's recommended (but not mandatory) to use an IRI. We did that for
   consistency: all identifiers are IRIs or strings.
   - We used "ID" for consistency with the SSE spec (actually the publisher
   can set all fields defined in the SSE spec, including ID but also retry,
   type etc).
   - Same for Mercure then, all IDs can be IRIs or strings.
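
To make the SSE fields mentioned above concrete, here is a minimal Python
sketch that serializes one update as an SSE message. The function name and
the sample values are hypothetical; the field names ("event", "retry",
"id", "data") are the ones defined by the SSE format, with "event"
carrying the type.

```python
def format_sse_event(data, event_id=None, event_type=None, retry=None):
    """Serialize one update as a Server-Sent Events message."""
    lines = []
    if event_type is not None:
        lines.append(f"event: {event_type}")
    if retry is not None:
        lines.append(f"retry: {int(retry)}")
    if event_id is not None:
        # The ID is an opaque string here; it may or may not be an IRI.
        lines.append(f"id: {event_id}")
    # A multi-line payload needs one "data:" field per line.
    lines.extend(f"data: {line}" for line in data.split("\n"))
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"


message = format_sse_event(
    '{"title": "Updated"}',
    event_id="urn:uuid:5e94c686-2c0b-4f9b-958c-92ccc3bbb4eb",
)
print(message, end="")
```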

> Reconnection

   - Yes, it has been requested by users because of operational constraints
   (it allows hubs to flush their history at some point, when the disk is
   full...). But the client can detect if updates have been lost. To be
   honest, I have no strong opinion about that: while it's convenient from
   an ops point of view, it can be annoying for the client-side developer.
   I'm totally open to reconsidering the wording.
   - Supporting a duration of time as you suggest sounds good to me!
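
To show the trade-off, here is a toy Python sketch of a hub history that
forgets its oldest entries, and of how a reconnecting client could learn
that updates were lost. The class, its methods and the "return None when
the ID is unknown" convention are all hypothetical; this is not the
mechanism defined by the spec or used by any particular hub.

```python
from collections import deque


class BoundedHistory:
    """Hypothetical event store that drops its oldest entries when full."""

    def __init__(self, max_events):
        self._events = deque(maxlen=max_events)

    def append(self, event_id, data):
        self._events.append((event_id, data))

    def replay_after(self, last_event_id):
        """Return the events after last_event_id, or None if it was flushed."""
        events = list(self._events)
        for index, (event_id, _) in enumerate(events):
            if event_id == last_event_id:
                return events[index + 1:]
        return None  # unknown ID: some updates may have been lost


history = BoundedHistory(max_events=3)
for number in range(1, 6):
    history.append(f"id-{number}", number)  # id-1 and id-2 get flushed
```

A time-based guarantee would replace the fixed size with a retention
duration, so that a client reconnecting within that window never gets the
"lost" answer.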

> Active Subscriptions

Actually, it's a very recent addition to the spec. It has been the most
requested feature since day 1. Most users use Mercure as a replacement for
things such as socket.io and Pusher (the proprietary service), and being
able to know (client-side and server-side) the list of subscribers to a
given topic is a very common and useful need. You can see how it allows a
"generic" Mercure client to discover the currently connected users in the
chat example I linked previously.
It's also very common for the "publisher" to need to know the list of
subscribers to a given resource.

As for the "Authorization" part, it is entirely optional, and it could be
in a separate/additional RFC, but I think that it's very important to keep
a standard, recommended and interoperable way (albeit optional and
replaceable). It allows libraries (such as the Symfony components) to
implement these features in a way compatible with all compliant hubs.

Regarding JSON-LD, it is also used and required by many new specs of the
web stack (for instance in DID and ActivityPub), and increasingly popular
for HTTP APIs (it's the default format when using the Symfony / API
Platform ecosystem for instance). However, I plan to add a notice that at
least JSON-LD should be supported (for interoperability), but that a hub
can add support for any other formats it needs using content negotiation.
Actually, I would like to do something similar for the authorization
mechanism and the presence API: it's optional, it can be extended, it can
be replaced, but for interoperability reasons a hub SHOULD support **at
least** what is defined in the spec. What do you think?


On Wed, Jul 15, 2020 at 8:22 PM Michael Toomim <toomim@gmail.com> wrote:

> Hi Kévin I'm excited to discover this spec!
>
> A group of us are working on a related draft called Braid, which also adds
> push-updates to HTTP:
>
> Draft:
> https://datatracker.ietf.org/doc/html/draft-toomim-httpbis-braid-http
> Web: https://braid.news
>
> So it seems we are all tackling the same problem, but aren't aware of each
> other! Let's solve that.
> How about we work together to make our systems interoperable, and find
> consensus on a single spec?
>
> To start, I've read the Mercure draft and am leaving a review. I'm also
> cc'ing the braid-http list, in case any Braidly people want to chime in.
> I'd love to see your review of Braid as well.
>
> *# My review of the Mercure draft*
>
> Push-updates is an important, and common problem. A number of IETF specs
> have devised their own solutions: (e.g. Calendar Synchronization
> <https://tools.ietf.org/html/rfc6578>, JMAP Email Synchronization
> <https://tools.ietf.org/html/rfc8620>). Our lives would be much improved
> if we spoke the same push-update (aka "synchronization") language.
>
> I like the general SSE approach. Braid similarly streams updates over a
> long-running GET, although it doesn't use the other parts of SSE.
>
> However, I think the Mercure spec would be simpler, cleaner, and more
> general with two major changes:
>
>    1. Combine the separate *Discovery* and *Subscription* steps into a
>    single request/response.
>    2. Unbundle the *Authorization* method. Let people use their own
>    methods.
>
> These changes would simplify and generalize the spec, and be more
> compatible with Braid. I provide a more detailed analysis below, and then
> conclude with some small suggestions.
>
> *## 1. Combine Discovery and Subscription into a single request/response*
>
> To get a subscription, this spec requires the client to make *two*
>  requests:
>
>    1. A GET to the server to discover how to subscribe
>    2. A separate GET (using SSE) to a "hub" to actually subscribe
>
>
> It is actually possible (and I think simpler) to combine these into a
> single GET request. This is how Braid works. If a GET request includes a
> "Subscribe" header, then the response will return the current version, but
> then also stay open, and stream all new versions as updates like SSE does.
>
> Combining these requests eliminates a round trip of network latency, which
> is great; but it also simplifies the protocol. In particular, there are a
> number of new concepts introduced in this spec (Topic, Hub, Publisher,
> Subscriber) that seem like re-inventions of existing HTTP concepts (Server,
> Resource, Client issuing put, Client issuing get). I presume that you
> didn't re-use the existing HTTP concepts because you needed these to run
> over a separate SSE connection, which doesn't have built-in HTTP semantics.
> However, if you instead stream the updates to any resource within the
> existing HTTP connection, you can re-use all of its existing HTTP
> semantics. Then you won't need to re-specify these concepts in the
> Subscription phase, and the spec will be simpler and easier to digest and
> adopt.
>
> Is it possible to combine these into a single request? Is there any reason
> why these need to be separate in the protocol?
>
> *## 2. Unbundle Authorization*
>
> Another benefit of moving Subscriptions into the existing HTTP requests is
> that we don't need to invent a new authorization method — the subscriptions
> can use whatever the existing HTTP requests use.
>
> However, if I'm missing something, and we do have a need to specify a
> particular style of authorization, can we at least move it into a separate
> spec?  There are many ways to authenticate, and it would be nice if we
> could find consensus on authorization separately from how to push updates.
>
> Ok, that's it for the big items! I'd love to hear what you think about
> these design decisions, and see if we can make these protocols compatible.
>
> *## Details*
>
> - Spec says the hub is a server, and the Link header specifies the hub,
> but the examples provided are Links to full *resources* with a path (e.g.
> https://example.com/.well-known/mercure), not just to a server (e.g.
> https://example.com). So which is it: a hub/server, or a full resource
> with a path?
>
> - Spec says:
>     "The URL of the hub MUST be the "well-known" fixed
> path "/.well-known/mercure"
>   But the point of .well-known <https://tools.ietf.org/html/rfc8615> is
> that you don't need to specify it anywhere, and you don't need the
> discovery step. So I think you can eliminate this header entirely.
>
> - Last-event-id: If I'm not mistaken, this only handles linear time, with
> a single writer. Can this spec support multiple writers modifying a
> resource simultaneously? That would produce two simultaneous
> last-event-ids, neither of which has occurred more recently than the other.
>   - The Braid specification calls this "Parents:".
>
> SSE stream ID:
>    - Why must the ID be an IRI?  Does the IRI reference something?
>    - It'd be more specific to call this a "Version" instead. It's not just
> any ID! It's a Version ID.
>    - Braid calls this "Version", and allows it to be any unique string,
> not just an IRI.
>
> Reconnection
>   - The Mercure spec provides clients with no guarantee that they will
> receive all updates upon reconnection.
>   - Braid servers, on the other hand, can guarantee a duration of time for
> a client to reconnect and receive all updates. I would like to support this
> type of guarantee.
>
> Active Subscriptions
>  - Although I agree that this feature is useful, I'd rather this not be a
> part
>    of the spec.  Do we have any interoperability use-cases where this
> needed?
>    It seems like it would be fine for each web app to have its own method
> of
>    tracking active subscriptions.  This method forces you to use JSON-LD.
>
>
> On Jul 8, 2020, at 7:44 AM, Kévin Dunglas <kevin@dunglas.fr> wrote:
>
> Hi all,
>
> Late 2018, I published an Internet-Draft specifying a protocol called
> Mercure:
>
> Abstract
>
>    Mercure is a protocol enabling the pushing of data updates to web
>    browsers and other HTTP clients in a fast, reliable and battery-
>    efficient way.  It is especially useful for publishing real-time
>    updates of resources served through web APIs to web and mobile apps.
>
> https://datatracker.ietf.org/doc/draft-dunglas-mercure/
>
> I just published the 7th version of the I-D. The protocol can now be
> considered stable and feature complete. It is already widely implemented
> and used, including by popular web frameworks such as Symfony. You can see
> the full list of implementations in the "Implementation Status" section of
> the I-D.
> I would like to go one step further and propose it as a RFC.
>
> I was wondering if the HTTPbis Working Group could host the work on this
> protocol (this looks allowed by the "Other HTTP-Related Work" section of
> the charter)?
>
> Also, I tried to register the link relation (
> https://github.com/protocol-registries/link-relations/issues/21) and the
> "well-known" URI (
> https://github.com/protocol-registries/well-known-uris/issues/4) used by
> the protocol, but it's not possible yet because the I-D isn't on any stream.
> I submitted a new version of the XML file with the following header:
>
> <rfc version="3" ipr="trust200902" docName="draft-dunglas-mercure-07"
> submissionType="IETF" category="std" xml:lang="en" xmlns:xi="
> http://www.w3.org/2001/XInclude" consensus="true">
>
> But it looks like it hasn't been taken into account by the tracker (
> https://datatracker.ietf.org/doc/draft-dunglas-mercure/).
>
> I must admit that the process to propose a RFC is still a bit unclear to
> me. Is this list the right place to propose and discuss this protocol?
> Should I create a new submission for the draft or is it possible to
> "update" the stream on the existing one?
>
> Best regards,
> --
> Kévin Dunglas
>
> https://dunglas.fr <https://dunglas.fr/> / @dunglas
> <https://twitter.com/dunglas>
>
>
>

Received on Thursday, 16 July 2020 10:10:30 UTC