RE: New API surface - inbound outbound streams/tracks


When you say “I am coming to the conclusion that createObjectURL() is no longer an entirely appropriate style of API for this use case; direct assignment is better”,  are you intending that the direct assignment would clone the Track?  However, the object passed to <audio> and<video> is a MediaStream, so we’d have to clone it as well as its tracks.  And if we don’t clone them, I don’t see how this helps with the problem that you have identified.  


However, as a matter of style, I think it’s fishy/confusing if a bunch of our functions implicitly clone Tracks, while not cloning other objects.   My model for this sort of problem is Lisp, where some functions operated on copies of their args, while others destructively modified them.  But at least the Lisp naming conventions assigned different names to these two types of functions, so programmers would eventually figure it out.  


It might be clearer if we always passed the _Source_ as the argument, and then documented that the function created a new Track for the Source.  On the other hand, it wouldn’t be clear what the settings for the new Track should be.    


In short, I think that you have identified the a real problem, but I’m not sure that we have a clean solution yet.


-          Jim


From: Martin Thomson [] 
Sent: Tuesday, November 13, 2012 5:34 PM
To: Adam Bergkvist
Subject: Re: New API surface - inbound outbound streams/tracks


I've been doing some thinking about this problem and I think that I agree with Harald in many respects.  The interaction between the different instances becomes unclear.  At least with a composition style API, this would be clearer.

I have an alternative solution to Harald's underlying problems.  The inheritance thing turned out to be superficial only.  I have come to believe that the source of confusion is the lack of a distinction between the source of a stream and the stream itself.

I think that a clear elucidation of the model could be helpful, to start.


Cameras and microphones are instances of media sources.  RTCPeerConnection is a different type of media source.

Streams (used here as a synonym for MediaStreamTrack) represent a reduction of the current operating mode of the source.  For example, a camera might produce a 1080p capture natively that is down-sampled to produce a 720p stream.  Constraints select an operating mode, settings filter the resulting output to match the requested form.

Streams are inactive unless attached to a sink.  Sinks include <video> and <audio> tags; RTCPeerConnection; or recording and sampling.

Sources can produce multiple streams simultaneously.  Simultaneous streams require compatible camera modes.  A camera that is capable of operating in 16:9 or 4:3 modes might be incapable of producing streams in both those aspect ratios simultaneously.

The first stream created for a given source sets the operating mode of the source.  Subsequent streams can only be added if the operating mode is compatible with the current mode.

The same stream/track can be added to multiple MediaStream instances.  The conclusion thus far is that a stream is implicitly cloned by doing so.  Because the stream has the same configuration (constraints/settings), this is trivially possible.  This allows streams to be independently ended or configured (with constraints/settings).

The first problem is that identification of streams is troublesome.  The assumption thus far is that the cloned stream shares the same identity as its prototype.  This is because the identifier in question is an identifier for the *source* and not the stream.  We should fix that.

This implies that MediaStreamTrack::id should actually be MediaStreamTrack::sourceId, as it is currently used, though the interaction with constraints are unclear.

A better solution would be to have MediaStreamTrack::source and MediaStreamTrack::id.  Where MediaStreamTrack:: source is clearly identified and can be used to correlate different streams from the same basic source.  MediaStreamTrack::id allows two streams from the same source to be distinguished when they have different constraints, which might be useful for cases like simulcast.

   Source -- (Mode) -- (Settings) ------------- (Sink Limits) -- Sink

A stream is able to communicate information about its consumption by sink(s) back to the originating source.  (Real-time streams provide this capability using RTCP; in-browser streams can use internal feedback channels.)  This allows sources to make choices about operating mode that is optimized for actual uses.  If your 1080p camera is only being displayed or transmitted at 480p, it might choose to switch to a more power-efficient mode as long as this remains true.

Information about how a stream is used can traverse the entire media path.  For instance, resizing a video sink down might propagate back so that the source is only required to produce the lower resolution.  Re-constraining the stream might result in a change to the operating mode of the source.  Some sinks require unconstrained access to the source: sampling or recording a stream would negate any optimizations that might otherwise be possible.

  Source (1) -- (0..*) MediaStreamTrack (..) -- (0..*) Sink

This arrangement is less than optimal when it comes to attachment of a single stream to multiple sinks.  If the same stream can be attached to multiple sinks, the implicit constraints applied by those sinks are not made visible in quite the same way.  Any limits applied by a sink must first be merged with those from other sinks on the same stream.  More importantly, it means that sinks cannot end their attached stream without also affecting other users of the same stream.

Adam's proposal effectively creates this clone for RTCPeerConnection.  The stream used by RTCPeerConnection is a clone of the stream that it is given.  This addresses the concern for output to RTCPeerConnection, but it does not address other uses (<audio> and <video> particularly).  URL.createObjectURL() seems like a candidate for this.  I am coming to the conclusion that createObjectURL() is no longer an entirely appropriate style of API for this use case; direct assignment is better.

  Source (1) -- (0..*) MediaStreamTrack (..) -- (0..1) Sink


Now, after far too many words, on a largely tangential topic, back on task...

I believe that composition APIs for stats and DTMF are more likely to be successful than inheritance APIs.  As it stands, going to your RTCPeerConnection instance to get stats is ugly, but it is superior to what Adam proposes.  

What this proposal has over existing APIs is a much-needed measure of transparency.   I think that we need to continue to explore options like this.  I find the accrual of methods on RTCPeerConnection to be problematic, not just from an engineering perspective, but from a usability perspective.

For stats, a separate RTCStatisticsRecorder class would be much easier to manage, even if it had to be created by RTCPeerConnection.  That would be consistent with the chosen direction on DTMF.



On 9 November 2012 04:14, Adam Bergkvist <> wrote:


A while back I sent out a proposal [1] on API additions to represent streams that are sent and received via a PeerConnection. The main goal was to have a natural API surface for the new functionality we're defining (e.g. Stats and DTMF). I didn't get any feedback on the list, but I did get some offline.

I've updated the proposal to match v4 of Travis' settings proposal [2] and would like to run it via the list again.

Summary of the main design goals:
- Have a way to represent a stream instance (witch tracks) that are sent (or received) over a specific PeerConnection. Specifically, if the same stream is sent via several PeerConnection objects, the sent stream is represented by different "outbound streams" to provide fine grained control over the different transmissions.

- Avoid cluttering PeerConnection with a lot of new API that really belongs on stream (and track) level but isn't applicable for the local only case. The representations of sent and received streams and tracks (inbound and outbound) provides the more precise API surface that we need for several of the APIs we're specifying right now as well as future APIs of the same kind.

Here are the object structure (new objects are marked with *new*). Find examples below.

AbstractMediaStream *new*
+- MediaStream
|   * WritableMediaStreamTrackList (audioTracks)
|   * WritableMediaStreamTrackList (videoTracks)
+- PeerConnectionMediaStream *new*
    // represents inbound and outbound streams (we could use
    // separate types if more flexibility is required)
    * MediaStreamTrackList (audioTracks)
    * MediaStreamTrackList (videoTracks)

+- VideoStreamTrack
|  |
|  +- VideoDeviceTrack
|  |   * PictureDevice
|  |
|  +- InboundVideoTrack *new*
|  |   // inbound video stats
|  |
|  +- OutboundVideoTrack *new*
|      // control outgoing bandwidth, priority, ...
|      // outbound video stats
|      // enable/disable outgoing (?)
+- AudioStreamTrack
   +- AudioDeviceTrack
   +- InboundAudioStreamTrack *new*
   |   // receive DTMF (?)
   |   // inbound audio stats
   +- OutboundAudioStreamTrack *new*
       // send DTMF
       // control outgoing bandwidth, priority, ...
       // outbound audio stats
       // enable/disable outgoing (?)

=== Examples ===

// 1. ***** Send DTMF *****

// ...

var outboundStream = pc.localStreams.getStreamById(;
var outboundAudio = outboundStream.audioTracks[0]; // pending syntax

if (outboundAudio.canSendDTMF)
    outboundAudio.insertTones("123", ...);

// 2. ***** Control outgoing media with constraints *****

// the way of setting constraints in this example is based on Travis'
// proposal (v4) combined with some points from Randell's bug 18561 [3]

var speakerStream; // speaker audio and video
var slidesStream; // video of slides

// ...

var outboundSpeakerStream = pc.localStreams
var speakerAudio = outboundSpeakerStream.audioTracks[0];
var speakerVideo = outboundSpeakerStream.videoTracks[0];

speakerAudio.bitrate.request({ "min": 30, "max": 120,
                               "thresholdToNotify": 10 });
speakerAudio.bitrate.onchange = speakerAudioBitrateChanged;
speakerAudio.onconstraintserror = failureToComply;

speakerVideo.bitrate.request({ "min": 500, "max": 1000,
                               "thresholdToNotify": 100 });
speakerAudio.bitrate.onchange = speakerVideoBitrateChanged;
speakerVideo.onconstraintserror = failureToComply;

var outboundSlidesStream = pc.localStreams
var slidesVideo = outboundSlidesStream.videoTracks[0];

slidesVideo.bitrate.request({ "min": 600, "max": 800,
                              "thresholdToNotify": 50 });
slidesVideo.bitrate.onchange = slidesVideoBitrateChanged;
slidesVideo.onconstraintserror = failureToComply;

// 3. ***** Enable/disable on outbound tracks *****

// send same stream to two different peers
// ...

// retrieve the *different* outbound streams
var streamToA = pcA.localStreams.getStreamById(;
var streamToB = pcB.localStreams.getStreamById(;

// disable video to A and disable audio to B
streamToA.videoTracks[0].enabled = false;
streamToA.audioTracks[0].enabled = false;


Please comment and don't hesitate to ask if things are unclear.






Received on Tuesday, 13 November 2012 22:57:12 UTC