Re: New API surface - inbound outbound streams/tracks from Harald Alvestrand on 2012-11-14 (public-media-capture@w3.org from November 2012)

From: Harald Alvestrand <harald@alvestrand.no>
Date: Wed, 14 Nov 2012 15:31:28 +0100
To: public-media-capture@w3.org
Message-ID: <50A3AB40.5020203@alvestrand.no>

On 11/14/2012 01:01 PM, Adam Bergkvist wrote:
> Thanks for taking the time to think about this, Martin.
>
> On 2012-11-13 23:33, Martin Thomson wrote:
>> I've been doing some thinking about this problem and I think that I
>> agree with Harald in many respects.  The interaction between the
>> different instances becomes unclear.  At least with a composition style
>> API, this would be clearer.
>
> I'm not married to the inheritance model at all. My intention was to 
> align with the proposal for controlling local devices. 
> OutboundVideoTrack exposes API surface related to sending video over 
> the network similar to how VideoDeviceTrack (from the settings 
> proposal) exposes API surface to control a local video device. I'm 
> open to other approaches.
>
> (Thinking out loud here) I recall a previous version (v3) of the 
> settings proposal [set-v3] that used an approach where a VideoDevice 
> (which exposed the settings API surface) had a reference to the track 
> it controlled instead of being a new derived track type. Aligning 
> inbound/outbound streams to that would look something like:
>
> *  : composition
> +- : inheritance
Hm. I don't think I'm parsing this.
I use the term "composition" to mean "bringing multiple dissimilar
things together inside an object".
Since you're using * below between track and stream, it looks as if
you're also using it for containment: bringing multiple things of the
same type together inside an object.
>
> AbstractMediaStream
> |
> +- LocalMediaStream
> |   * AudioDevice
> |      * Track
> |   * VideoDevice
> |      * Track
This doesn't compute, since devices can be on multiple tracks.
> |
> ...
> |
> +- PeerConnectionMediaStream *new*
> |   * MediaStreamTransportList (audioTransports)
> |      * OutboundAudioTransport...
> |         * Track
> |   * MediaStreamTransportList (videoTransports)
> |      * OutboundVideoTransport...
> |         * Track

This doesn't compute, since transports are not hierarchical with 
mediastreams.

Alternate, to take the other extreme:

MediaStream contains (composition)
    TrackContainer (containment) (one or two)
        MediaStreamTrack contains (composition)
           DeviceReference? (present when linked to a device source)
           OutgoingTransportReference? (present when linked to a PC's
               outgoing stream list)
           IncomingTransportReference? (present when linked to a PC's
               incoming stream list)
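
As a rough sketch of the containment above, and only that: every class and field name below is invented for illustration and is not proposed API. The point it tries to show is that two tracks can hold a reference to the same device, while the transport references stay optional and empty until a track is linked to a PeerConnection.

```javascript
// Hypothetical sketch of the composition model; none of these names are spec API.
class TrackSketch {
  constructor(kind) {
    this.kind = kind;
    this.deviceRef = null;    // DeviceReference? (set when linked to a device source)
    this.outgoingRef = null;  // OutgoingTransportReference? (set when sent by a PC)
    this.incomingRef = null;  // IncomingTransportReference? (set when received by a PC)
  }
}

class StreamSketch {
  constructor() {
    // The "one or two" track containers: one per media kind.
    this.audioTracks = [];
    this.videoTracks = [];
  }
  addTrack(track) {
    const list = track.kind === "audio" ? this.audioTracks : this.videoTracks;
    list.push(track);
  }
}

// One device referenced by two tracks; only one of them is being sent.
const camera = { label: "camera-0" };
const trackA = new TrackSketch("video");
const trackB = new TrackSketch("video");
trackA.deviceRef = camera;
trackB.deviceRef = camera;
trackA.outgoingRef = { peerConnection: "pc-1" };

const stream = new StreamSketch();
stream.addTrack(trackA);
stream.addTrack(trackB);
```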


Inheritance hierarchy:

Device
   + AudioDevice
   + VideoDevice
      + PictureDevice

Perhaps we don't need so much inheritance....

> |
> ...
>
> Then we could have single track instances for each media source and 
> how a track is consumed depends on the Device or Transport that holds 
> it and imposes settings on it.
>
> pc.addStream(localStream);
> var outboundStream = pc.localStreams.getStreamById(localStream.id);
>
> localStream.audioDevice.track === outboundStream.audioTransports[0].track
>
> would be true.
>
> [set-v3] 
> http://lists.w3.org/Archives/Public/public-media-capture/2012Aug/0143.html
>
>> I have an alternative solution to Harald's underlying problems.  The
>> inheritance thing turned out to be superficial only.  I have come to
>> believe that the source of confusion is the lack of a distinction
>> between the source of a stream and the stream itself.
>>
>> I think that a clear elucidation of the model could be helpful, to 
>> start.
>>
>> -- 
>>
>> Cameras and microphones are instances of media sources.
>>   RTCPeerConnection is a different type of media source.
>>
>> Streams (used here as a synonym for MediaStreamTrack) represent a
>> reduction of the current operating mode of the source.  For example, a
>> camera might produce a 1080p capture natively that is down-sampled to
>> produce a 720p stream.  Constraints select an operating mode; settings
>> filter the resulting output to match the requested form.
>>
>> Streams are inactive unless attached to a sink.  Sinks include <video>
>> and <audio> tags; RTCPeerConnection; or recording and sampling.
>>
>> Sources can produce multiple streams simultaneously. Simultaneous
>> streams require compatible camera modes.  A camera that is capable of
>> operating in 16:9 or 4:3 modes might be incapable of producing streams
>> in both those aspect ratios simultaneously.
>>
>> The first stream created for a given source sets the operating mode of
>> the source.  Subsequent streams can only be added if the operating mode
>> is compatible with the current mode.
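
A minimal sketch of that rule, under loud assumptions: SourceSketch and createTrack are hypothetical names, and "compatible" is simplified here to "same mode string", which a real camera would certainly not be limited to.

```javascript
// Hypothetical model, not a real API: a source remembers its operating mode.
class SourceSketch {
  constructor(label) {
    this.label = label;
    this.mode = null;   // current operating mode, e.g. "16:9"; null until first use
    this.tracks = [];
  }

  createTrack(requestedMode) {
    if (this.mode === null) {
      // The first track sets the operating mode of the source.
      this.mode = requestedMode;
    } else if (this.mode !== requestedMode) {
      // Simplification: any different mode counts as incompatible.
      throw new Error(`mode ${requestedMode} is incompatible with ${this.mode}`);
    }
    const track = { source: this, mode: requestedMode };
    this.tracks.push(track);
    return track;
  }
}

const cameraSource = new SourceSketch("camera-0");
cameraSource.createTrack("16:9");  // sets the mode
cameraSource.createTrack("16:9");  // compatible, accepted
let rejected = false;
try {
  cameraSource.createTrack("4:3"); // incompatible with the current mode
} catch (e) {
  rejected = true;
}
```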
>>
>> The same stream/track can be added to multiple MediaStream instances.
>>   The conclusion thus far is that a stream is implicitly cloned by doing
>> so.  Because the stream has the same configuration
>> (constraints/settings), this is trivially possible.  This allows streams
>> to be independently ended or configured (with constraints/settings).
>
> I guess it's an open question if you should be able to apply 
> constraints to a track in a cloned MediaStream or if that should be 
> exclusive to the "local" track you got from gUM().
>
>> The first problem is that identification of streams is troublesome.  The
>> assumption thus far is that the cloned stream shares the same identity
>> as its prototype.  This is because the identifier in question is an
>> identifier for the *source* and not the stream.  We should fix that.
>>
>> This implies that MediaStreamTrack::id should actually be
>> MediaStreamTrack::sourceId, as it is currently used, though the
>> interaction with constraints is unclear.
>
> I've also thought about this in a similar way. It's really a source id.
>
>> A better solution would be to have both MediaStreamTrack::source and
>> MediaStreamTrack::id, where MediaStreamTrack::source is clearly
>> identified and can be used to correlate different streams from the same
>> basic source.  MediaStreamTrack::id allows two streams from the same
>> source to be distinguished when they have different constraints, which
>> might be useful for cases like simulcast.
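
To illustrate the split between a source identity and a track identity, here is a small sketch. The makeTrack and sameSource helpers and the shape of the track object are assumptions made for this example, not anything defined in the drafts.

```javascript
// Hypothetical helpers: each track gets its own id but keeps a reference
// to the source it was derived from.
let nextTrackId = 0;

function makeTrack(source, constraints) {
  return { id: `track-${nextTrackId++}`, source, constraints };
}

// Correlate two tracks by the source they share.
function sameSource(a, b) {
  return a.source === b.source;
}

// Simulcast-like case: one camera, two tracks with different constraints.
const cam = { label: "camera-0" };
const highRes = makeTrack(cam, { height: 720 });
const lowRes = makeTrack(cam, { height: 360 });
```

With this shape, sameSource(highRes, lowRes) holds while highRes.id and lowRes.id differ, which is the distinction the paragraph above asks for.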
>>
>>
>>     Source -- (Mode) -- (Settings) ------------- (Sink Limits) -- Sink
>>
>> A stream is able to communicate information about its consumption by
>> sink(s) back to the originating source.  (Real-time streams provide this
>> capability using RTCP; in-browser streams can use internal feedback
>> channels.)  This allows sources to make choices about operating mode
>> that are optimized for actual uses.  If your 1080p camera is only being
>> displayed or transmitted at 480p, it might choose to switch to a more
>> power-efficient mode as long as this remains true.
>>
>> Information about how a stream is used can traverse the entire media
>> path.  For instance, resizing a video sink down might propagate back so
>> that the source is only required to produce the lower resolution.
>>   Re-constraining the stream might result in a change to the operating
>> mode of the source.  Some sinks require unconstrained access to the
>> source: sampling or recording a stream would negate any optimizations
>> that might otherwise be possible.
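
One way to picture that feedback loop is the sketch below. It assumes a hypothetical chooseMode function, sinks that report either a display height or an "unconstrained" flag, and a source whose modes are described only by their height; none of this comes from an actual API.

```javascript
// Hypothetical: pick the cheapest native mode that still satisfies every sink.
function chooseMode(nativeHeights, sinks) {
  const sorted = [...nativeHeights].sort((a, b) => a - b);
  const highest = sorted[sorted.length - 1];

  // Recording or sampling sinks need the unconstrained source output.
  if (sinks.some((sink) => sink.unconstrained)) {
    return highest;
  }

  // Otherwise the largest sink demand decides how much the source must produce.
  const needed = Math.max(0, ...sinks.map((sink) => sink.height));
  return sorted.find((height) => height >= needed) || highest;
}

const modes = [480, 720, 1080];
const displayOnly = chooseMode(modes, [{ height: 480 }]);
const withRecorder = chooseMode(modes, [{ height: 480 }, { unconstrained: true }]);
```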
>>
>>
>>    Source (1) -- (0..*) MediaStreamTrack (..) -- (0..*) Sink
>>
>> This arrangement is less than optimal when it comes to attachment of a
>> single stream to multiple sinks.  If the same stream can be attached to
>> multiple sinks, the implicit constraints applied by those sinks are not
>> made visible in quite the same way.  Any limits applied by a sink must
>> first be merged with those from other sinks on the same stream. More
>> importantly, it means that sinks cannot end their attached stream
>> without also affecting other users of the same stream.
>
> I don't think that sinks should be allowed to end the streams they're 
> consuming. It should be up to the source and API.
>
>> Adam's proposal effectively creates this clone for RTCPeerConnection.
>
> It's not merely a clone. It's an object that describes the association 
> between the stream and the PeerConnection it was added to. It uses the 
> same media sources though so in that sense it's a clone. But anything 
> you do on the "outbound" stream only affects what's sent on the single 
> related PeerConnection instance.
>
>>   The stream used by RTCPeerConnection is a clone of the stream that it
>> is given.  This addresses the concern for output to RTCPeerConnection,
>> but it does not address other uses (<audio> and <video> particularly).
>
> A media element has its own track lists that describe the media it's
> playing. It offers some control where you can, e.g., set which tracks
> should be enabled.
>
> http://dev.w3.org/html5/spec/media-elements.html#audiotracklist-and-videotracklist-objects 
>
>
>>   URL.createObjectURL() seems like a candidate for this.  I am coming to
>> the conclusion that createObjectURL() is no longer an entirely
>> appropriate style of API for this use case; direct assignment is better.
>>
>>
>>    Source (1) -- (0..*) MediaStreamTrack (..) -- (0..1) Sink
>>
>> Now, after far too many words, on a largely tangential topic, back on
>> task...
>>
>> I believe that composition APIs for stats and DTMF are more likely to be
>> successful than inheritance APIs.  As it stands, going to your
>> RTCPeerConnection instance to get stats is ugly, but it is superior to
>> what Adam proposes.
>
> I would say that the difference between (1) pc.sendDTMF(targetTrack,
> ...) and (2) outboundAudio.sendDTMF(...) is that the association
> between the track and the PeerConnection is kept internal to the
> PeerConnection in (1). The association is then looked up with the
> targetTrack argument when pc.sendDTMF() is called. In (2), the
> association is exposed in the form of the outbound track.
> Implementation-wise they wouldn't have to be that different.
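
The two styles can be sketched side by side. PeerConnectionSketch and OutboundTrackSketch are hypothetical stand-ins, and the "sent" array merely records calls instead of sending anything; the sketch only shows that style (1) can be a lookup that delegates to the object style (2) exposes.

```javascript
// Hypothetical stand-ins; not RTCPeerConnection and not a proposed interface.
class OutboundTrackSketch {
  constructor(pc, track) {
    this.pc = pc;
    this.track = track;
  }
  // Style (2): the association object itself offers the method.
  sendDTMF(tones) {
    this.pc.sent.push({ track: this.track, tones });
  }
}

class PeerConnectionSketch {
  constructor() {
    this.associations = new Map(); // track -> OutboundTrackSketch
    this.sent = [];                // records calls for this example only
  }
  addTrack(track) {
    const outbound = new OutboundTrackSketch(this, track);
    this.associations.set(track, outbound);
    return outbound;
  }
  // Style (1): the association stays internal and is looked up by track.
  sendDTMF(track, tones) {
    const outbound = this.associations.get(track);
    if (!outbound) {
      throw new Error("track was not added to this connection");
    }
    outbound.sendDTMF(tones);
  }
}

const pc = new PeerConnectionSketch();
const mic = { kind: "audio" };
const outboundAudio = pc.addTrack(mic);
pc.sendDTMF(mic, "12");         // style (1)
outboundAudio.sendDTMF("34");   // style (2)
```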
>
> The reason why I think it could be beneficial to expose the
> track-PeerConnection association is that we probably want to do
> more than DTMF and Stats in the future (e.g., bandwidth and
> priority).
>
>> What this proposal has over existing APIs is a much-needed measure of
>> transparency.   I think that we need to continue to explore options like
>> this.  I find the accrual of methods on RTCPeerConnection to be
>> problematic, not just from an engineering perspective, but from a
>> usability perspective.
>>
>> For stats, a separate RTCStatisticsRecorder class would be much easier
>> to manage, even if it had to be created by RTCPeerConnection. That
>> would be consistent with the chosen direction on DTMF.
>
> As long as we're consistent I could pretty much live with any solution.
>
> /Adam
>
Received on Wednesday, 14 November 2012 14:32:03 UTC