From: Timothy B. Terriberry <tterriberry@mozilla.com>
Date: Wed, 05 Oct 2011 11:59:43 -0700
CC: "public-webrtc@w3.org" <public-webrtc@w3.org>
Adam Bergkvist wrote:
> empty (no tracks), and tracks would have to be added later. I think it
> would simplify things (e.g. MediaStream playback and sending with
> PeerConnection) if a MediaStream is immutable with regards to its track
> list.

I'm not sure this is really a problem: the request indicates whether you asked for audio and/or video, and tracks can be pre-created that simply don't reach the appropriate ready state until the user gives consent (if ever). You can still argue about whether you want the user to be able to consent to "just audio" or "just video" when you asked for both, and what should be done in that case. I'll let Anant tackle that issue.

The issue of track-list mutability, however, is one I've brought up before, and it was discussed a little bit on the W3C call today, without reaching any conclusions. Let me try to summarize things so we can move towards a resolution.

In attempting to define exactly how a MediaStream and a MediaStreamTrack relate to the underlying RTP concepts, it has been proposed that each MediaStreamTrack corresponds to a single SSRC. The SSRC namespace only guarantees uniqueness within an RTP session, but for the sake of argument I'm going to assume any use of the same SSRC in different sessions is intentional, for things like FEC or layered codecs, which I expect would still map to a single track. It has also been proposed that all the MediaStreamTracks in a MediaStream correspond to the same CNAME, but not necessarily that all MediaStreamTracks with the same CNAME belong to the same MediaStream.

For the purposes of this discussion, when I say "synchronization", I mean the actual presentation of timestamped audio and video at the proper times. I am assuming that things like clock drift, time stretching, and shrinking (i.e., the jitter-buffer part) are handled internally by the browser, which can see the CNAME for all tracks.
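To make the "pre-created tracks" idea concrete, here is a minimal sketch (plain objects, not the real API; `requestUserMedia`, `grantConsent`, and the ready-state constants are invented for illustration). The point is that the stream's track list is fixed at request time, so it can stay immutable, while consent only flips the tracks' ready state:

```javascript
// Hypothetical ready states, invented for this sketch.
const MUTED = 0, LIVE = 1, ENDED = 2;

// Pre-create one track per requested kind, before any consent is given.
function requestUserMedia({ audio, video }) {
  const tracks = [];
  if (audio) tracks.push({ kind: 'audio', readyState: MUTED });
  if (video) tracks.push({ kind: 'video', readyState: MUTED });
  // The track list is complete up front; it never mutates afterwards.
  return { tracks };
}

// When the user consents to some subset of kinds, only those tracks go live.
function grantConsent(stream, kinds) {
  for (const t of stream.tracks) {
    if (kinds.includes(t.kind)) t.readyState = LIVE;
  }
}
```

Under this model, consenting to "just audio" when both were requested simply leaves the video track stuck in its initial state, rather than changing the track list.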
So, I'll rephrase my original question (from 9/22), which I don't think was ever answered, in slightly more concrete terms: what happens when a remote participant, currently sending only audio, adds a video track with the same CNAME? I see a few possibilities:

1) Add it as a new MediaStreamTrack to the existing MediaStream containing the audio. We don't have any API for callbacks to indicate this has happened. As Adam pointed out on the call, this also complicates things if, for example, that MediaStream is being fed into another PeerConnection (where the far end may not support that media type), or even another local consumer (e.g., MediaStream.record(): the container in use may not allow a stream of a new type to be added partway through the recording).

2) Add it as a new MediaStream containing just the new MediaStreamTrack corresponding to the video. This leaves the receiving side with two MediaStream objects containing different tracks with the same CNAME, which must be synchronized manually, e.g., by feeding them, as blocking inputs, into a single ProcessedMediaStream from roc's MediaStream Processing API, or, depending on how you want to define the semantics, possibly just by creating a new MediaStream containing both tracks. We haven't really talked about how mixing tracks from different MediaStreams affects synchronization, but I strongly recommend looking at the MediaStream Processing API, and its attempts to prevent the same media source from playing out at two different rates. In either case, this means that your local processing graph is now different depending on how you set up the call. There is also currently no API that indicates that these two MediaStreams share the same CNAME, so you don't have any way of knowing you need to do this.

3) Remove the old MediaStream and add a new MediaStream containing both the old MediaStreamTrack corresponding to the audio and the new MediaStreamTrack corresponding to the video.
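The manual regrouping that option 2 forces on the application can be sketched like this (a toy model, not the real API: the `cname` field on a stream is invented precisely because, as noted above, no current API exposes it):

```javascript
// Hypothetical sketch of the application-side burden in option 2: the
// remote video shows up as a second MediaStream, and the app must merge
// tracks sharing a CNAME back into one stream to get synchronized playback.
function groupByCname(remoteStreams) {
  const groups = new Map(); // cname -> combined stream
  for (const stream of remoteStreams) {
    for (const track of stream.tracks) {
      if (!groups.has(stream.cname)) groups.set(stream.cname, { tracks: [] });
      groups.get(stream.cname).tracks.push(track);
    }
  }
  return [...groups.values()];
}
```

Without an exposed CNAME, the application cannot even write this function; it has to guess, from signaling or convention, which streams belong together.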
When you do this, you can either a) literally use the same MediaStreamTrack object used in the old MediaStream, or b) create a new MediaStreamTrack object for the old audio track in the new MediaStream.

I think 3a would mean, for example, that if you ignored the callback and continued to use the old MediaStream object, then the audio would continue playing through it. That leaves you with an object that _acts_ as if it were one of the currently active remote streams, but is not actually in the PeerConnection's list of remote streams. It also means it may not be synchronized with the new track, unless you do something to enforce that synchronization (e.g., switch to using the new MediaStream object).

3b, on the other hand, leaves you with the problem of synchronizing the transition from the old track to the new track. Unless you can respond to the callback and reshuffle your media graph _immediately_ (the next stable state may be too late), you may introduce gaps after the media stops flowing from the old track and before it starts flowing from the new track. Unless you (and the browser implementation) are very careful, you may also lose any internal buffered state (e.g., packets that were received and decoded, but only partially played out). Keep in mind that it's sometimes necessary for the browser to rewind and re-process an internal buffer (e.g., to reduce the latency of volume changes taking effect, or any other effects processing you can imagine). That doesn't make these hand-off issues any easier.

This synchronization/gap problem applies at the application layer to both options 2 and 3 equally. I.e., if you're doing any non-trivial processing (in a ProcessedMediaStream or otherwise), you'll have to be very careful not to introduce these problems when swapping in a new MediaStream object, whether constructed by the user to enforce synchronization in 2 or constructed by the API to enforce same-CNAME semantics in 3. In 3b you'll have them at the browser layer as well.
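The 3b gap can be demonstrated with a toy model (every name here is invented for illustration; a "sink" stands in for whatever consumes the track, e.g. a media element or recorder, and only accepts samples from the track it is currently attached to):

```javascript
// Hypothetical sketch of the 3b hand-off gap: if the app re-attaches its
// sink only at the next stable state, samples delivered in between are lost.
function makeSink() {
  return {
    track: null,
    received: [],
    attach(track) { this.track = track; },
    deliver(track, sample) {
      if (track === this.track) this.received.push(sample);
    },
  };
}

const sink = makeSink();
const oldTrack = { id: 'audio-old' };
const newTrack = { id: 'audio-new' }; // 3b: a distinct object for the same audio

sink.attach(oldTrack);
sink.deliver(oldTrack, 's1');  // before the swap: delivered normally
// The stream swap happens here; media now flows on newTrack.
sink.deliver(newTrack, 's2');  // sink still attached to oldTrack: dropped
sink.attach(newTrack);         // re-attached at the next stable state
sink.deliver(newTrack, 's3');  // delivered again, but 's2' fell into the gap
```

In 3a the old and new track are the same object, so this particular gap disappears, at the cost of the stale-MediaStream and synchronization problems described above.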
These problems are compounded in both 3a and 3b by the fact that the remove callback is separate from the add callback. If you don't know an add callback is coming, you may continue processing things right after the remove, introducing these gaps. 3a may be slightly better in this regard, as the media will keep playing if you can somehow divine that you should ignore the remove callback, but it is still not without issues.

Option 3 also doesn't side-step the "no API to indicate CNAME" problem entirely, as we may still run into that issue if audio and video have to be part of separate RTP sessions.

So, that's as far as I've thought through these things right now. What do others think?
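One workaround for the separate remove/add callbacks would be for the application to defer tearing down its media graph until it knows whether a replacement stream is coming. A hypothetical sketch (the `cname` field and all handler names are invented; as noted above, nothing exposes the CNAME today, which is exactly why this can't currently be written):

```javascript
// Hypothetical sketch: treat a remove immediately followed by an add with
// the same CNAME as a swap (option 3), and only tear down streams that are
// still pending when the app decides the replacement isn't coming.
function makeStreamWatcher(onReplace, onReallyRemoved) {
  const pending = new Map(); // cname -> removed stream awaiting a replacement
  return {
    onRemoveStream(stream) { pending.set(stream.cname, stream); },
    onAddStream(stream) {
      if (pending.has(stream.cname)) {
        onReplace(pending.get(stream.cname), stream); // same CNAME: a swap
        pending.delete(stream.cname);
      }
    },
    flush() { // e.g. at the next stable state: anything left really went away
      for (const s of pending.values()) onReallyRemoved(s);
      pending.clear();
    },
  };
}
```

Even this depends on the add arriving before the flush point, so it narrows the gap rather than eliminating it.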
Received on Wednesday, 5 October 2011 19:00:09 UTC