Re: New API surface - inbound outbound streams/tracks

On 2012-11-14 15:31, Harald Alvestrand wrote:
> On 11/14/2012 01:01 PM, Adam Bergkvist wrote:
>> Thanks for taking the time to think about this, Martin.
>>
>> On 2012-11-13 23:33, Martin Thomson wrote:
>>> I've been doing some thinking about this problem and I think that I
>>> agree with Harald in many respects.  The interaction between the
>>> different instances becomes unclear.  At least with a composition style
>>> API, this would be clearer.
>>
>> I'm not married to the inheritance model at all. My intention was to
>> align with the proposal for controlling local devices.
>> OutboundVideoTrack exposes API surface related to sending video over
>> the network similar to how VideoDeviceTrack (from the settings
>> proposal) exposes API surface to control a local video device. I'm
>> open to other approaches.
>>
>> (Thinking out loud here) I recall a previous version (v3) of the
>> settings proposal [set-v3] that used an approach where a VideoDevice
>> (which exposed the settings API surface) had a reference to the track
>> it controlled instead of being a new derived track type. Aligning
>> inbound/outbound streams to that would look something like:
>>
>> *  : composition
>> +- : inheritance
> hm. I think I'm not parsing.
> I use the term "composition" in terms of "bringing multiple dissimilar
> things together inside an object".
> Since you're using * below between track and stream, it looks as if
> you're also using it for containment - bringing multiple things of the
> same type together inside an object.

I'm using composition as a stronger "has a" relationship. For example, 
(from below) a LocalMediaStream has an AudioDevice.

http://en.wikipedia.org/wiki/Class_diagram#Composition

>>
>> AbstractMediaStream
>> |
>> +- LocalMediaStream
>> |   * AudioDevice
>> |      * Track
>> |   * VideoDevice
>> |      * Track
> This doesn't compute, since devices can be on multiple tracks.

This comes unmodified from [set-v3]. I think the idea was to have a device 
tied to a track in the LocalMediaStream. The track could then be used to 
create regular MediaStreams and would then exist without the device 
(that accompanied it in the LocalMediaStream).

>> |
>> ...
>> |
>> +- PeerConnectionMediaStream *new*
>> |   * MediaStreamTransportList (audioTransports)
>> |      * OutboundAudioTransport...
>> |         * Track
>> |   * MediaStreamTransportList (videoTransports)
>> |      * OutboundVideoTransport...
>> |         * Track
>
> This doesn't compute, since transports are not hierarchical with
> mediastreams.

That's not the intention either. The PeerConnectionMediaStream has a 
list of audio and video transports. Each transport has a corresponding 
track (similar to the track-device relationship above).
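
A minimal sketch of that relationship in plain JavaScript. The class
shapes are illustrative assumptions, not a proposed API; only the names
PeerConnectionMediaStream, audioTransports, videoTransports and track
come from the diagram above.

```javascript
// Hypothetical shapes, for illustration only.
class Track {
  constructor(kind) {
    this.kind = kind; // "audio" or "video"
  }
}

// A transport "has a" track: it holds a reference to it rather than
// deriving from it.
class OutboundTransport {
  constructor(track) {
    this.track = track;
  }
}

class PeerConnectionMediaStream {
  constructor(tracks) {
    // One transport per track, grouped by kind.
    this.audioTransports = tracks
      .filter((t) => t.kind === "audio")
      .map((t) => new OutboundTransport(t));
    this.videoTransports = tracks
      .filter((t) => t.kind === "video")
      .map((t) => new OutboundTransport(t));
  }
}

const mic = new Track("audio");
const cam = new Track("video");
const outbound = new PeerConnectionMediaStream([mic, cam]);

// The transport wraps the very same track instance.
console.log(outbound.audioTransports[0].track === mic); // true
```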

> Alternate, to take the other extreme:
>
> MediaStream contains (composition)
>     TrackContainer (containment) (one or two)
>         MediaStreamTrack contains (composition)
>            DeviceReference? (present when linked to a device source)
>            OutgoingTransportReference? (present when linked to a PC's
> outgoing stream list)
>            IncomingTransportReference? (present when linked to a PC's
> incoming stream list)
>

I interpret this as a regular MediaStream that's not created in 
association with a PeerConnection. One issue is that it can only be sent 
with one PeerConnection unless it has a list of 
OutgoingTransportReferences. It also introduces the inbound/outbound 
stuff to the local-only use case.
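
To make the first issue concrete, here is a rough sketch of a track in
that model with the single optional outgoing reference widened to a
list. Every name here (MediaStreamTrackSketch, linkOutgoing, the field
names) is made up for illustration.

```javascript
// Hypothetical track shape following the "other extreme" outline.
class MediaStreamTrackSketch {
  constructor() {
    this.deviceReference = null;            // set when linked to a device source
    this.outgoingTransportReferences = [];  // one entry per PeerConnection sending it
    this.incomingTransportReference = null; // set when received from a PeerConnection
  }

  // Record that this track is being sent over the given connection.
  linkOutgoing(pcLabel) {
    const ref = { pc: pcLabel };
    this.outgoingTransportReferences.push(ref);
    return ref;
  }
}

const track = new MediaStreamTrackSketch();
track.linkOutgoing("pcA");
track.linkOutgoing("pcB");
console.log(track.outgoingTransportReferences.length); // 2

// A purely local track still carries the (empty) transport fields.
const localOnly = new MediaStreamTrackSketch();
console.log(localOnly.outgoingTransportReferences.length); // 0
```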

>
> Inheritance hierarchy:
>
> Device
>    + AudioDevice
>    + VideoDevice
>       + PictureDevice
>
> Perhaps we don't need so much inheritance....
>
>> |
>> ...
>>
>> Then we could have single track instances for each media source and
>> how a track is consumed depends on the Device or Transport that holds
>> it and imposes settings on it.
>>
>> pc.addStream(localStream);
>> var outboundStream = pc.localStreams.getStreamById(localStream.id);
>>
>> localStream.audioDevice.track === outboundStream.audioTransports[0].track
>>
>> would be true.
>>
>> [set-v3]
>> http://lists.w3.org/Archives/Public/public-media-capture/2012Aug/0143.html
>>
>>
>>> I have an alternative solution to Harald's underlying problems.  The
>>> inheritance thing turned out to be superficial only.  I have come to
>>> believe that the source of confusion is the lack of a distinction
>>> between the source of a stream and the stream itself.
>>>
>>> I think that a clear elucidation of the model could be helpful, to
>>> start.
>>>
>>> --
>>>
>>> Cameras and microphones are instances of media sources.
>>>   RTCPeerConnection is a different type of media source.
>>>
>>> Streams (used here as a synonym for MediaStreamTrack) represent a
>>> reduction of the current operating mode of the source.  For example, a
>>> camera might produce a 1080p capture natively that is down-sampled to
>>> produce a 720p stream.  Constraints select an operating mode, settings
>>> filter the resulting output to match the requested form.
>>>
>>> Streams are inactive unless attached to a sink.  Sinks include <video>
>>> and <audio> tags; RTCPeerConnection; or recording and sampling.
>>>
>>> Sources can produce multiple streams simultaneously. Simultaneous
>>> streams require compatible camera modes.  A camera that is capable of
>>> operating in 16:9 or 4:3 modes might be incapable of producing streams
>>> in both those aspect ratios simultaneously.
>>>
>>> The first stream created for a given source sets the operating mode of
>>> the source.  Subsequent streams can only be added if the operating mode
>>> is compatible with the current mode.
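
A toy model of the rule just described, assuming for simplicity that
"compatible" means "same aspect ratio". SourceSketch and createStream
are hypothetical names, not part of any proposal.

```javascript
// Hypothetical source whose first stream fixes the operating mode.
class SourceSketch {
  constructor() {
    this.mode = null; // no operating mode until the first stream exists
    this.streams = [];
  }

  createStream(requested) {
    if (this.mode === null) {
      // The first stream sets the operating mode of the source.
      this.mode = { aspect: requested.aspect };
    } else if (this.mode.aspect !== requested.aspect) {
      // Later streams must fit the mode that is already active.
      return null;
    }
    const stream = { aspect: requested.aspect, height: requested.height };
    this.streams.push(stream);
    return stream;
  }
}

const camera = new SourceSketch();
const hd = camera.createStream({ aspect: "16:9", height: 720 });
const small = camera.createStream({ aspect: "16:9", height: 360 });
const sd = camera.createStream({ aspect: "4:3", height: 480 });

console.log(hd !== null);    // true
console.log(small !== null); // true
console.log(sd === null);    // true
```
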
>>>
>>> The same stream/track can be added to multiple MediaStream instances.
>>>   The conclusion thus far is that a stream is implicitly cloned by doing
>>> so.  Because the stream has the same configuration
>>> (constraints/settings), this is trivially possible.  This allows streams
>>> to be independently ended or configured (with constraints/settings).
>>
>> I guess it's an open question if you should be able to apply
>> constraints to a track in a cloned MediaStream or if that should be
>> exclusive to the "local" track you got from gUM().
>>
>>> The first problem is that identification of streams is troublesome.  The
>>> assumption thus far is that the cloned stream shares the same identity
>>> as its prototype.  This is because the identifier in question is an
>>> identifier for the *source* and not the stream.  We should fix that.
>>>
>>> This implies that MediaStreamTrack::id should actually be
>>> MediaStreamTrack::sourceId, as it is currently used, though the
>>> interaction with constraints is unclear.
>>
>> I've also thought about this in a similar way. It's really a source id.
>>
>>> A better solution would be to have MediaStreamTrack::source and
>>> MediaStreamTrack::id.  Where MediaStreamTrack::source is clearly
>>> identified and can be used to correlate different streams from the same
>>> basic source.  MediaStreamTrack::id allows two streams from the same
>>> source to be distinguished when they have different constraints, which
>>> might be useful for cases like simulcast.
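
As a sketch of the source/id split, using plain objects rather than
real MediaStreamTrack instances. makeTrack and the id format are
assumptions for illustration only.

```javascript
// Hypothetical helper: every track gets its own id but keeps a
// reference to the source it was derived from.
let nextTrackId = 0;
function makeTrack(source, constraints) {
  return { id: "track-" + nextTrackId++, source, constraints };
}

const cameraSource = { label: "camera-0" };

// Two streams from the same camera with different constraints,
// e.g. for simulcast.
const high = makeTrack(cameraSource, { height: 720 });
const low = makeTrack(cameraSource, { height: 180 });

console.log(high.source === low.source); // true  (same source)
console.log(high.id === low.id);         // false (distinct streams)
```
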
>>>
>>>
>>>     Source -- (Mode) -- (Settings) ------------- (Sink Limits) -- Sink
>>>
>>> A stream is able to communicate information about its consumption by
>>> sink(s) back to the originating source.  (Real-time streams provide this
>>> capability using RTCP; in-browser streams can use internal feedback
>>> channels.)  This allows sources to make choices about operating mode
>>> that are optimized for actual use.  If your 1080p camera is only being
>>> displayed or transmitted at 480p, it might choose to switch to a more
>>> power-efficient mode as long as this remains true.
>>>
>>> Information about how a stream is used can traverse the entire media
>>> path.  For instance, resizing a video sink down might propagate back so
>>> that the source is only required to produce the lower resolution.
>>>   Re-constraining the stream might result in a change to the operating
>>> mode of the source.  Some sinks require unconstrained access to the
>>> source: sampling or recording a stream would negate any optimizations
>>> that might otherwise be possible.
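
One way to picture this feedback, as a sketch only: each sink reports
the height it needs, or null when it needs unconstrained access (for
example recording). chooseSourceHeight is a made-up function, and
reducing an operating mode to a single height is a simplification.

```javascript
// Hypothetical: pick the lowest source height that still satisfies
// every attached sink.
function chooseSourceHeight(nativeHeight, sinkHeights) {
  // No sinks attached: the stream is inactive, nothing to produce.
  if (sinkHeights.length === 0) return 0;
  // Any unconstrained sink negates the optimization.
  if (sinkHeights.some((h) => h === null)) return nativeHeight;
  // Otherwise the largest demand wins, capped at what the source can do.
  return Math.min(nativeHeight, Math.max(...sinkHeights));
}

console.log(chooseSourceHeight(1080, [480, 360]));  // 480
console.log(chooseSourceHeight(1080, [480, null])); // 1080
console.log(chooseSourceHeight(1080, []));          // 0
```
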
>>>
>>>
>>>    Source (1) -- (0..*) MediaStreamTrack (..) -- (0..*) Sink
>>>
>>> This arrangement is less than optimal when it comes to attachment of a
>>> single stream to multiple sinks.  If the same stream can be attached to
>>> multiple sinks, the implicit constraints applied by those sinks are not
>>> made visible in quite the same way.  Any limits applied by a sink must
>>> first be merged with those from other sinks on the same stream. More
>>> importantly, it means that sinks cannot end their attached stream
>>> without also affecting other users of the same stream.
>>
>> I don't think that sinks should be allowed to end the streams they're
>> consuming. It should be up to the source and API.
>>
>>> Adam's proposal effectively creates this clone for RTCPeerConnection.
>>
>> It's not merely a clone. It's an object that describes the association
>> between the stream and the PeerConnection it was added to. It uses the
>> same media sources though so in that sense it's a clone. But anything
>> you do on the "outbound" stream only affects what's sent on the single
>> related PeerConnection instance.
>>
>>>   The stream used by RTCPeerConnection is a clone of the stream that it
>>> is given.  This addresses the concern for output to RTCPeerConnection,
>>> but it does not address other uses (<audio> and <video> particularly).
>>
>> A media element has its own track lists that describe the media it's
>> playing. It offers some control where you can, e.g., set which tracks
>> should be enabled.
>>
>> http://dev.w3.org/html5/spec/media-elements.html#audiotracklist-and-videotracklist-objects
>>
>>
>>>   URL.createObjectURL() seems like a candidate for this.  I am coming to
>>> the conclusion that createObjectURL() is no longer an entirely
>>> appropriate style of API for this use case; direct assignment is better.
>>>
>>>
>>>    Source (1) -- (0..*) MediaStreamTrack (..) -- (0..1) Sink
>>>
>>> Now, after far too many words, on a largely tangential topic, back on
>>> task...
>>>
>>> I believe that composition APIs for stats and DTMF are more likely to be
>>> successful than inheritance APIs.  As it stands, going to your
>>> RTCPeerConnection instance to get stats is ugly, but it is superior to
>>> what Adam proposes.
>>
>> I would say that the difference between (1) pc.sendDTMF(targetTrack,
>> ...) and (2) outboundAudio.sendDTMF(...) is that the association
>> between the track and the PeerConnection is kept internal in the
>> PeerConnection in (1). The association is then looked up with the
>> targetTrack argument when pc.sendDTMF() is called. In (2), the
>> association is exposed in the form of the outbound track.
>> Implementation-wise they wouldn't have to be that different.
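
A rough sketch of how close the two styles can be underneath. The
classes below are stand-ins invented for this example, not the real
RTCPeerConnection; sendDTMF here only records the tones.

```javascript
// Hypothetical object describing one track-PeerConnection association.
class OutboundAudioSketch {
  constructor(track) {
    this.track = track;
    this.sent = [];
  }
  // Style (2): call the method on the exposed association.
  sendDTMF(tones) {
    this.sent.push(tones);
  }
}

class PeerConnectionSketch {
  constructor() {
    this.associations = new Map(); // track -> OutboundAudioSketch
  }
  addTrack(track) {
    const outbound = new OutboundAudioSketch(track);
    this.associations.set(track, outbound);
    return outbound;
  }
  // Style (1): the association stays internal and is looked up
  // from the targetTrack argument.
  sendDTMF(targetTrack, tones) {
    const outbound = this.associations.get(targetTrack);
    if (!outbound) throw new Error("track not added to this connection");
    outbound.sendDTMF(tones);
  }
}

const pcSketch = new PeerConnectionSketch();
const audioTrack = { kind: "audio" };
const outboundAudio = pcSketch.addTrack(audioTrack);

pcSketch.sendDTMF(audioTrack, "123"); // style (1)
outboundAudio.sendDTMF("#");          // style (2)

console.log(outboundAudio.sent.join(",")); // 123,#
```
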
>>
>> The reason why I think it could be beneficial to expose the
>> track-PeerConnection association is that we probably want to do
>> more than DTMF and Stats in the future (e.g., bandwidth and
>> priority).
>>
>>> What this proposal has over existing APIs is a much-needed measure of
>>> transparency.   I think that we need to continue to explore options like
>>> this.  I find the accrual of methods on RTCPeerConnection to be
>>> problematic, not just from an engineering perspective, but from a
>>> usability perspective.
>>>
>>> For stats, a separate RTCStatisticsRecorder class would be much easier
>>> to manage, even if it had to be created by RTCPeerConnection. That
>>> would be consistent with the chosen direction on DTMF.
>>
>> As long as we're consistent I could pretty much live with any solution.
>>
>> /Adam
>>
>
>

Received on Wednesday, 14 November 2012 15:48:47 UTC