Re: New API surface - inbound outbound streams/tracks

Thanks for taking the time to think about this, Martin.

On 2012-11-13 23:33, Martin Thomson wrote:
> I've been doing some thinking about this problem and I think that I
> agree with Harald in many respects.  The interaction between the
> different instances becomes unclear.  At least with a composition-style
> API, this would be clearer.

I'm not married to the inheritance model at all. My intention was to 
align with the proposal for controlling local devices. 
OutboundVideoTrack exposes API surface related to sending video over the 
network similar to how VideoDeviceTrack (from the settings proposal) 
exposes API surface to control a local video device. I'm open to other 
approaches.

(Thinking out loud here) I recall a previous version (v3) of the 
settings proposal [set-v3] that used an approach where a VideoDevice 
(which exposed the settings API surface) had a reference to the track it 
controlled instead of being a new derived track type. Aligning 
inbound/outbound streams to that would look something like:

*  : composition
+- : inheritance

AbstractMediaStream
|
+- LocalMediaStream
|   * AudioDevice
|      * Track
|   * VideoDevice
|      * Track
|
...
|
+- PeerConnectionMediaStream *new*
|   * MediaStreamTransportList (audioTransports)
|      * OutboundAudioTransport...
|         * Track
|   * MediaStreamTransportList (videoTransports)
|      * OutboundVideoTransport...
|         * Track
|
...

Then we could have single track instances for each media source and how 
a track is consumed depends on the Device or Transport that holds it and 
imposes settings on it.

pc.addStream(localStream);
var outboundStream = pc.localStreams.getStreamById(localStream.id);

localStream.audioDevice.track === outboundStream.audioTransports[0].track

would be true.
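To make the composition model concrete, here is a rough sketch in JavaScript. Every class in it (Track, AudioDevice, OutboundAudioTransport, MediaStreamList, FakePeerConnection, and so on) is a hypothetical stand-in, not an existing WebRTC API; `pc.localStreams.getStreamById()` is mirrored with a minimal list object. The only point it demonstrates is that the Device and the Transport hold the same Track instance instead of a derived or cloned one.

```javascript
// Hypothetical classes illustrating the composition model; none are real WebRTC APIs.

// One Track instance per media source.
class Track {
  constructor(kind) {
    this.kind = kind; // "audio" or "video"
  }
}

// Holds the track and would expose local capture settings.
class AudioDevice {
  constructor(track) {
    this.track = track;
  }
}

// Holds the same track and would expose sending settings for one PeerConnection.
class OutboundAudioTransport {
  constructor(track) {
    this.track = track;
  }
}

class LocalMediaStream {
  constructor(id) {
    this.id = id;
    this.audioDevice = new AudioDevice(new Track("audio"));
  }
}

class PeerConnectionMediaStream {
  constructor(localStream) {
    this.id = localStream.id;
    // Reuse the existing Track instance rather than cloning or subclassing it.
    this.audioTransports = [new OutboundAudioTransport(localStream.audioDevice.track)];
  }
}

// Minimal stand-in for a stream list with lookup by id.
class MediaStreamList {
  constructor() {
    this.byId = new Map();
  }
  add(stream) {
    this.byId.set(stream.id, stream);
  }
  getStreamById(id) {
    return this.byId.has(id) ? this.byId.get(id) : null;
  }
}

class FakePeerConnection {
  constructor() {
    this.localStreams = new MediaStreamList();
  }
  addStream(stream) {
    // Wrap the local stream in a PeerConnection-specific composition object.
    this.localStreams.add(new PeerConnectionMediaStream(stream));
  }
}

const localStream = new LocalMediaStream("stream-1");
const pc = new FakePeerConnection();
pc.addStream(localStream);
const outboundStream = pc.localStreams.getStreamById(localStream.id);

console.log(localStream.audioDevice.track === outboundStream.audioTransports[0].track); // true
```

Sending settings such as bandwidth or priority would then hang off the transport object, and capture settings off the device, without a new derived track type.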

[set-v3] 
http://lists.w3.org/Archives/Public/public-media-capture/2012Aug/0143.html

> I have an alternative solution to Harald's underlying problems.  The
> inheritance thing turned out to be superficial only.  I have come to
> believe that the source of confusion is the lack of a distinction
> between the source of a stream and the stream itself.
>
> I think that a clear elucidation of the model could be helpful, to start.
>
> --
>
> Cameras and microphones are instances of media sources.
>   RTCPeerConnection is a different type of media source.
>
> Streams (used here as a synonym for MediaStreamTrack) represent a
> reduction of the current operating mode of the source.  For example, a
> camera might produce a 1080p capture natively that is down-sampled to
> produce a 720p stream.  Constraints select an operating mode; settings
> filter the resulting output to match the requested form.
>
> Streams are inactive unless attached to a sink.  Sinks include <video>
> and <audio> tags; RTCPeerConnection; or recording and sampling.
>
> Sources can produce multiple streams simultaneously.  Simultaneous
> streams require compatible camera modes.  A camera that is capable of
> operating in 16:9 or 4:3 modes might be incapable of producing streams
> in both those aspect ratios simultaneously.
>
> The first stream created for a given source sets the operating mode of
> the source.  Subsequent streams can only be added if the operating mode
> is compatible with the current mode.
>
> The same stream/track can be added to multiple MediaStream instances.
>   The conclusion thus far is that a stream is implicitly cloned by doing
> so.  Because the stream has the same configuration
> (constraints/settings), this is trivially possible.  This allows streams
> to be independently ended or configured (with constraints/settings).
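A minimal sketch of the operating-mode rules described above. It assumes that "compatible" simply means the same aspect ratio and a resolution no larger than the current mode, and that the first track picks the smallest native mode it can be down-sampled from. Source and createTrack() are hypothetical names; real devices have far richer mode tables.

```javascript
// Hypothetical capture source with a single operating mode.
// Assumption: "compatible" = same aspect ratio and no larger than the current mode.
class Source {
  constructor(name, nativeModes) {
    this.name = name;
    this.nativeModes = nativeModes; // e.g. [{ width: 1920, height: 1080 }, ...]
    this.mode = null;               // fixed by the first track
    this.tracks = [];
  }

  // Compare aspect ratios without floating point: w1/h1 === w2/h2.
  static sameAspect(a, b) {
    return a.width * b.height === b.width * a.height;
  }

  createTrack(request) {
    if (this.mode === null) {
      // First track: pick the smallest native mode that can be down-sampled to the request.
      const candidates = this.nativeModes
        .filter(m => Source.sameAspect(m, request) && m.width >= request.width)
        .sort((a, b) => a.width - b.width);
      if (candidates.length === 0) {
        throw new Error("no native mode satisfies the request");
      }
      this.mode = candidates[0];
    } else if (!Source.sameAspect(this.mode, request) || request.width > this.mode.width) {
      // Later tracks cannot change the operating mode chosen by the first one.
      throw new Error("request is incompatible with the current operating mode");
    }
    const track = { source: this, width: request.width, height: request.height };
    this.tracks.push(track);
    return track;
  }
}

const camera = new Source("camera", [
  { width: 1920, height: 1080 },
  { width: 1280, height: 720 },
  { width: 640, height: 480 },
]);

camera.createTrack({ width: 1280, height: 720 }); // sets the mode to 1280x720 (16:9)
camera.createTrack({ width: 640, height: 360 });  // fine: same aspect ratio, smaller

let rejected = false;
try {
  camera.createTrack({ width: 640, height: 480 }); // 4:3 while running a 16:9 mode
} catch (err) {
  rejected = true;
}
console.log(camera.mode, camera.tracks.length, rejected); // { width: 1280, height: 720 } 2 true
```

Had the first request been 640x480, the same camera would instead have been locked into its 4:3 mode, and the 16:9 requests would be the ones rejected.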

I guess it's an open question whether you should be able to apply 
constraints to a track in a cloned MediaStream, or whether that should be 
exclusive to the "local" track you got from getUserMedia().

> The first problem is that identification of streams is troublesome.  The
> assumption thus far is that the cloned stream shares the same identity
> as its prototype.  This is because the identifier in question is an
> identifier for the *source* and not the stream.  We should fix that.
>
> This implies that MediaStreamTrack::id should actually be
> MediaStreamTrack::sourceId, as it is currently used, though the
> interaction with constraints is unclear.

I've also thought about this in a similar way. It's really a source id.
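Splitting the identifier along these lines could look roughly like the sketch below. The MediaTrack class, its clone() method and the groupBySource() helper are hypothetical; the sketch assumes that a clone gets a fresh id but keeps the source and a copy of the current constraints, so it can be ended or re-constrained independently.

```javascript
// Hypothetical track with a per-track `id` and a shared `source`.
let nextTrackId = 0;

class MediaTrack {
  constructor(source, constraints) {
    this.id = `track-${nextTrackId++}`; // unique per track instance
    this.source = source;               // shared by all tracks from one source
    this.constraints = { ...constraints };
    this.ended = false;
  }

  // Adding a track to another stream implicitly clones it: same source and
  // configuration, fresh id, independent lifetime and constraints.
  clone() {
    return new MediaTrack(this.source, this.constraints);
  }
}

// Correlate tracks that originate from the same basic source (e.g. simulcast layers).
function groupBySource(tracks) {
  const groups = new Map();
  for (const track of tracks) {
    if (!groups.has(track.source)) groups.set(track.source, []);
    groups.get(track.source).push(track);
  }
  return groups;
}

const original = new MediaTrack("camera-0", { width: 1280, height: 720 });
const copy = original.clone();
copy.constraints.width = 640; // re-constrain only the clone
copy.ended = true;            // end only the clone

console.log(original.ended, original.constraints.width); // false 1280
console.log(copy.id !== original.id, copy.source === original.source); // true true
console.log(groupBySource([original, copy]).get("camera-0").length); // 2
```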

> A better solution would be to have both MediaStreamTrack::source and
> MediaStreamTrack::id.  MediaStreamTrack::source clearly identifies the
> source and can be used to correlate different streams from the same
> basic source.  MediaStreamTrack::id allows two streams from the same
> source to be distinguished when they have different constraints, which
> might be useful for cases like simulcast.
>
>
>     Source -- (Mode) -- (Settings) ------------- (Sink Limits) -- Sink
>
> A stream is able to communicate information about its consumption by
> sink(s) back to the originating source.  (Real-time streams provide this
> capability using RTCP; in-browser streams can use internal feedback
> channels.)  This allows sources to choose an operating mode that is
> optimized for actual use.  If your 1080p camera is only being
> displayed or transmitted at 480p, it might choose to switch to a more
> power-efficient mode as long as this remains true.
>
> Information about how a stream is used can traverse the entire media
> path.  For instance, resizing a video sink down might propagate back so
> that the source is only required to produce the lower resolution.
>   Re-constraining the stream might result in a change to the operating
> mode of the source.  Some sinks require unconstrained access to the
> source: sampling or recording a stream would negate any optimizations
> that might otherwise be possible.
>
>
>    Source (1) -- (0..*) MediaStreamTrack (..) -- (0..*) Sink
>
> This arrangement is less than optimal when it comes to attachment of a
> single stream to multiple sinks.  If the same stream can be attached to
> multiple sinks, the implicit constraints applied by those sinks are not
> made visible in quite the same way.  Any limits applied by a sink must
> first be merged with those from other sinks on the same stream.  More
> importantly, it means that sinks cannot end their attached stream
> without also affecting other users of the same stream.
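The feedback path and the multi-sink merge can be sketched as two steps: combine the limits reported by every sink attached to a stream, then let the source pick the cheapest native mode that still satisfies them. mergeSinkLimits() and chooseMode() are hypothetical helpers; the sketch assumes each sink reports a maximum useful resolution, and that a recorder or sampler reports null, meaning it needs unconstrained output.

```javascript
// Hypothetical sink-to-source feedback. Each sink reports the largest
// resolution it can use; null means the sink (e.g. a recorder) needs
// unconstrained output, which cancels any downscaling optimization.
function mergeSinkLimits(sinkLimits) {
  let width = 0;
  let height = 0;
  for (const limit of sinkLimits) {
    if (limit === null) return null; // one unconstrained sink overrides the rest
    width = Math.max(width, limit.width);
    height = Math.max(height, limit.height);
  }
  return { width, height };
}

// Pick the cheapest native mode that still covers the merged requirement;
// fall back to the largest mode if unconstrained or if nothing covers it.
function chooseMode(nativeModes, merged) {
  const bySize = [...nativeModes].sort((a, b) => a.width * a.height - b.width * b.height);
  const largest = bySize[bySize.length - 1];
  if (merged === null) return largest;
  return bySize.find(m => m.width >= merged.width && m.height >= merged.height) ?? largest;
}

const modes = [
  { width: 1920, height: 1080 },
  { width: 1280, height: 720 },
  { width: 854, height: 480 },
];

// A 1080p camera displayed at 480p in a <video> element and sent at 640x360.
const displayed = chooseMode(modes, mergeSinkLimits([
  { width: 854, height: 480 },
  { width: 640, height: 360 },
]));

// The same stream once a recorder is attached as an extra sink.
const recorded = chooseMode(modes, mergeSinkLimits([{ width: 854, height: 480 }, null]));

console.log(displayed, recorded); // { width: 854, height: 480 } { width: 1920, height: 1080 }
```

Note that the source only ever sees the merged requirement, which is exactly why a single sink's limits stop being visible once several sinks share one stream.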

I don't think that sinks should be allowed to end the streams they're 
consuming. That should be up to the source and the API.

> Adam's proposal effectively creates this clone for RTCPeerConnection.

It's not merely a clone. It's an object that describes the association 
between the stream and the PeerConnection it was added to. It uses the 
same media sources, though, so in that sense it's a clone. But anything 
you do on the "outbound" stream only affects what's sent on the single 
related PeerConnection instance.

>   The stream used by RTCPeerConnection is a clone of the stream that it
> is given.  This addresses the concern for output to RTCPeerConnection,
> but it does not address other uses (<audio> and <video> particularly).

A media element has its own track lists that describe the media it's 
playing. They offer some control; e.g., you can choose which tracks 
should be enabled.

http://dev.w3.org/html5/spec/media-elements.html#audiotracklist-and-videotracklist-objects

>   URL.createObjectURL() seems like a candidate for this.  I am coming to
> the conclusion that createObjectURL() is no longer an entirely
> appropriate style of API for this use case; direct assignment is better.
>
>
>    Source (1) -- (0..*) MediaStreamTrack (..) -- (0..1) Sink
>
>
> Now, after far too many words, on a largely tangential topic, back on
> task...
>
> I believe that composition APIs for stats and DTMF are more likely to be
> successful than inheritance APIs.  As it stands, going to your
> RTCPeerConnection instance to get stats is ugly, but it is superior to
> what Adam proposes.

I would say that the difference between (1) pc.sendDTMF(targetTrack, 
...) and (2) outboundAudio.sendDTMF(...) is that in (1) the association 
between the track and the PeerConnection is kept internal to the 
PeerConnection. The association is then looked up with the targetTrack 
argument when pc.sendDTMF() is called. In (2), by contrast, the 
association is exposed in the form of the outbound track. 
Implementation-wise they wouldn't have to be that different.
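A rough sketch of why the two styles need not differ much: style (1) can simply look up the object that style (2) exposes. DtmfPeerConnection, OutboundAudioTrack, addAudioTrack() and _emitTones() are hypothetical names, and the tones are recorded in an array rather than actually sent on an RTP stream.

```javascript
// Hypothetical classes comparing the two DTMF styles; not real WebRTC APIs.

// Style (2): the track-PeerConnection association is an object the app can hold.
class OutboundAudioTrack {
  constructor(pc, track) {
    this.pc = pc;
    this.track = track;
    // Future per-association settings (bandwidth, priority) could live here.
  }
  sendDTMF(tones) {
    this.pc._emitTones(this.track, tones);
  }
}

class DtmfPeerConnection {
  constructor() {
    this._outbound = new Map(); // track -> OutboundAudioTrack, kept internal
    this.sent = [];             // stand-in for tones actually put on the wire
  }
  addAudioTrack(track) {
    const outbound = new OutboundAudioTrack(this, track);
    this._outbound.set(track, outbound);
    return outbound;
  }
  // Style (1): the same association, looked up from the targetTrack argument.
  sendDTMF(targetTrack, tones) {
    const outbound = this._outbound.get(targetTrack);
    if (outbound === undefined) {
      throw new Error("track was not added to this PeerConnection");
    }
    outbound.sendDTMF(tones);
  }
  _emitTones(track, tones) {
    this.sent.push({ track: track.label, tones });
  }
}

const mic = { label: "mic" };
const pc = new DtmfPeerConnection();
const outboundAudio = pc.addAudioTrack(mic);

pc.sendDTMF(mic, "123");       // style (1)
outboundAudio.sendDTMF("456"); // style (2)
console.log(pc.sent);
```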

The reason why I think it could be beneficial to expose the 
track-PeerConnection association is that we will probably want to do 
more than DTMF and stats in the future (e.g., bandwidth and priority).

> What this proposal has over existing APIs is a much-needed measure of
> transparency.   I think that we need to continue to explore options like
> this.  I find the accrual of methods on RTCPeerConnection to be
> problematic, not just from an engineering perspective, but from a
> usability perspective.
>
> For stats, a separate RTCStatisticsRecorder class would be much easier
> to manage, even if it had to be created by RTCPeerConnection.  That
> would be consistent with the chosen direction on DTMF.
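A composition-style stats object could follow the same factory pattern as DTMF (the drafts of the time created an RTCDTMFSender via RTCPeerConnection.createDTMFSender(track)). RTCStatisticsRecorder is only the name suggested above; createStatisticsRecorder(), record() and latest() are hypothetical, and the samples are placeholders rather than real RTCP-derived statistics.

```javascript
// Hypothetical stats recorder: the PeerConnection only acts as a factory,
// and all stats-related methods live on the recorder instead of accruing on the PC.
class RTCStatisticsRecorder {
  constructor(pc, track) {
    this.pc = pc;       // the connection whose traffic is measured
    this.track = track; // optionally scope the recorder to one track
    this.samples = [];
  }
  // In a real implementation samples would come from RTCP or internal counters.
  record(sample) {
    this.samples.push({ timestamp: sample.timestamp, bytesSent: sample.bytesSent });
  }
  latest() {
    return this.samples.length > 0 ? this.samples[this.samples.length - 1] : null;
  }
}

class StatsPeerConnection {
  createStatisticsRecorder(track) {
    return new RTCStatisticsRecorder(this, track);
  }
}

const pc = new StatsPeerConnection();
const mic = { kind: "audio" };
const recorder = pc.createStatisticsRecorder(mic);
recorder.record({ timestamp: 1000, bytesSent: 1200 });
recorder.record({ timestamp: 2000, bytesSent: 2600 });
console.log(recorder.latest()); // { timestamp: 2000, bytesSent: 2600 }
```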

As long as we're consistent I could pretty much live with any solution.

/Adam

Received on Wednesday, 14 November 2012 12:02:10 UTC