Flows vs Sessions (was Re: Mozilla/Cisco API Proposal)

Rather than trying to respond in detail with an alternative API, I thought I'd up-level a little and give some feedback from the experience gained when developing RTMFP, and its predecessor, MFP (which is still floating around as GPL-licensed source, if you can find it).

The aforementioned protocols were both designed from scratch to securely transport multiple streams of audio, video, and data between a pair of endpoints, in a congestion-controlled and prioritized fashion, with NAT traversal (and some other capabilities, though those are less relevant).

When we originally developed MFP, we had a "session-based" API. (We'll use Alice and Bob for the examples here, as I think everyone's familiar with that.)

"Session-based" Example:

Alice creates an MFP instance (with an associated cryptographic identity) at an address and (UDP) port. Alice then listens for incoming Sessions. Bob creates an MFP instance (also with an associated cryptographic identity) at an address and port. Bob then opens a new Session to Alice (by specifying her cryptographic identity, address, and port). Alice is notified of the opening Session, and if Alice accepts, the Session is up. Once the Session is up, either party may create new (unidirectional) Flows by passing the Session handle to an API that opens Flows, and may then send data on them. At the far end, the API provides a notification of each newly arriving Flow, which may be accepted or rejected.


What we discovered when trying to build on top of this is that in order to create a prioritized, congestion-managed Session between two endpoints that actually works, you need to ensure that there is only ONE such Session. And providing a Session-based API is actually highly inconvenient when there are glare conditions (both ends trying to open a Session to each other at the same time), which in some scenarios (like peer-to-peer presence notification flows that are opened when a new peer comes online) are extremely common. Using the Session as a handle when creating Flows makes it very hard to deal with the case where you want to kill one Session in favor of another when resolving glare conditions.
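
To make the problem concrete, here is a toy in-memory sketch in Python. It is not MFP's actual API: the names Instance, Session, open_session, and open_flow are made up for illustration, and the far end is assumed to accept every Session. It only shows how a glare leaves two Sessions between the same pair, with Flows pinned to whichever handle the caller happened to use.

```python
class Session:
    """Hypothetical Session handle; Flows are opened through it."""

    def __init__(self, initiator, responder):
        self.initiator = initiator
        self.responder = responder
        self.flows = []

    def open_flow(self, metadata):
        # The Flow is tied to this particular Session handle.
        self.flows.append(metadata)
        return metadata


class Instance:
    """Hypothetical endpoint with a cryptographic identity (just a string here)."""

    def __init__(self, identity):
        self.identity = identity
        self.sessions = []  # every Session this endpoint is part of

    def open_session(self, peer):
        # Assumption for the sketch: the peer always accepts.
        session = Session(self.identity, peer.identity)
        self.sessions.append(session)
        peer.sessions.append(session)
        return session


alice, bob = Instance("alice"), Instance("bob")

# Glare: both sides open a Session at about the same time.
a_to_b = alice.open_session(bob)
b_to_a = bob.open_session(alice)

# Each application opens its Flow on the handle it holds.
a_to_b.open_flow("presence")
b_to_a.open_flow("presence")

print(len(alice.sessions))  # prints 2: two Sessions, two sets of congestion state
```

If one of the two Sessions is now killed to resolve the glare, the application holding that handle has to notice and re-open its Flows on the other one.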

Having a single session is also advantageous with regard to the number of UDP ports used, reuse of the NAT traversal work, and reuse of the security mechanism's keying and authentication handshake.

Our solution was to change to a "Flow-based" API. (And again using Alice and Bob...)

"Flow-based" Example:

Alice creates an MFP instance (with an associated cryptographic identity) at an address and UDP port. Alice then listens for incoming FLOWS. Bob creates an MFP instance (also with an associated cryptographic identity) at an address and port. Bob then opens a new Flow to Alice by specifying her cryptographic identity, address, port, and some flow metadata (like a "port number" but more general). Bob's implementation checks to see if there is a Session already open, or in the process of opening, to Alice. If so, Bob binds the Flow to that Session; if not, a new Session is opened for it. If at any time during the opening process Alice is also opening a Session to Bob, there is a tie-breaker that allows only one of the Sessions to live. If there were Flows bound to a Session that was in the process of opening and that is killed because the other Session wins, these Flows are internally re-bound to the Session that survives. In this way the user of the API is assured that only a single Session is created between a pair of endpoints, but does not need to care how that works... they simply open Flows, and Flows either use an existing Session or create a new one (automatically dealing with the glare case) as needed.
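
A rough Python sketch of that bookkeeping follows. Again, this is not the MFP or RTMFP implementation: Endpoint, open_flow, and on_incoming_session are hypothetical names, there is no networking, and the tie-breaker shown (the Session opened by the lexicographically smaller identity survives) is purely an assumption, since the actual rule isn't described here.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Session:
    initiator: str                      # identity of the side that opened it
    state: str = "opening"              # "opening", "open", or "closed"
    flows: list = field(default_factory=list)


@dataclass(eq=False)
class Flow:
    metadata: str
    session: Session                    # internal binding; may change during glare


class Endpoint:
    def __init__(self, identity):
        self.identity = identity
        self.sessions = {}              # peer identity -> the one Session to that peer

    def open_flow(self, peer, metadata):
        # Reuse a Session that is open or opening; otherwise start a new one.
        session = self.sessions.get(peer)
        if session is None or session.state == "closed":
            session = Session(initiator=self.identity)
            self.sessions[peer] = session
        flow = Flow(metadata, session)
        session.flows.append(flow)
        return flow

    def on_incoming_session(self, peer):
        """The peer is opening a Session to us; returns the Session that survives."""
        ours = self.sessions.get(peer)
        if ours is not None and ours.state == "open":
            return ours                 # already have a working Session
        # Assumed tie-breaker: the smaller identity's Session survives.
        if ours is not None and ours.state == "opening" and self.identity < peer:
            return ours                 # ours wins; the incoming one is refused
        winner = Session(initiator=peer, state="open")
        if ours is not None:
            # Our Session lost (or was closed): re-bind its Flows to the survivor.
            for flow in ours.flows:
                flow.session = winner
            winner.flows, ours.flows = ours.flows, []
            ours.state = "closed"
        self.sessions[peer] = winner
        return winner


alice, bob = Endpoint("alice"), Endpoint("bob")
a_flow = alice.open_flow("bob", "presence")   # both sides open Flows at once,
b_flow = bob.open_flow("alice", "presence")   # so both start opening a Session
alice.on_incoming_session("bob")
bob.on_incoming_session("alice")
print(b_flow.session.initiator)  # prints "alice": Bob's Flow was re-bound
```

The caller only ever holds Flow objects, so the Session underneath can be swapped during glare resolution without the application noticing.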

In the case of RTMFP in Flash Player, even these APIs are hidden under an API that allows creation of NetStreams that are "publishing locally" and NetStreams that are "playing from a remote peer"... but if you create a publisher at each end and a player at each end, you'll find that there is only one Session between the two... shared congestion state, shared NAT traversal, shared key negotiation and authentication. Flash Player also hides the address and port from the Flow opening process... you provide ONLY the cryptographic identity (called the "peer ID" inside Flash Player) and the Flow is opened to <peer ID>, <rendezvous address:port>. The rendezvous service then redirects the opening connection to attempt additional candidate addresses *and* forwards the first part of the handshake to the far end via a side channel in order to effect NAT/firewall hole-punching.

(There are additional details about what happens under the hood available in my presentation to TSVAREA at IETF 77 in Anaheim.)

Hopefully this background will provide some food for thought with regard to the API design... if anyone would like to collaborate directly on how this might be mapped to an alternative API proposal for WebRTC, please get in touch.

Matthew Kaufman
matthew.kaufman@skype.net

Received on Saturday, 16 July 2011 20:15:28 UTC