- From: Harald Alvestrand <harald@alvestrand.no>
- Date: Wed, 24 Oct 2012 16:49:52 +0200
- To: "public-webrtc@w3.org" <public-webrtc@w3.org>
Cullen, Justin and I have been working on a proposal for this issue, which is one of the things we have to settle in order to know what we're designing our signalling for. One proposal is outlined below - it depends on defining a new attribute of a MediaStreamTrack called "content", and using that to direct tracks onto different m= lines when negotiating SDP. Details below.

==Proposal for controlling the allocation of multiple media sources to RTP sessions==

===Problem description===

There are a number of applications that can be envisioned using WebRTC. The applications where one audio and one video stream are connected between two participants are trivial; there is no real controversy there. But other styles are more difficult. Two important cases are:

- A single PeerConnection is used to connect an end-user to a central non-mixing MCU (an "RTCP-terminating MCU" in RFC 5117 terminology), and the connection between the MCU and the user has a large number of audio and/or video tracks (for example, a "thumbnail strip" plus one or more large video images).

- A single PeerConnection is used to connect an end-user to a non-RTCWEB SIP system, through a signalling gateway but not through a media gateway, using multiple video sessions that are distinguished by the "a=content" attribute (for example, a main video feed plus a presentation video feed).

In the first case, we definitely want all the video sources in the same RTP session, which lets us add or remove video sources with minimal overhead (no new ICE ports and NAT pinholes). In the second case, we want specific video sources on different RTP sessions, and we want exact control over which video streams get assigned to which RTP sessions.

===Solution description===

The basic idea is to expose a new "content" property on MediaStreamTracks, mirroring the SDP "content" attribute defined in RFC 4796, which would indicate the "usage" of the media in that particular track.

When createOffer is called to create a session description, it will include an m= line for each [media, content] tuple that exists within the list of attached MediaStreamTracks. Since m= lines are normally omitted for tuples that have no associated MediaStreamTracks, the application can also include an empty m= line for a given tuple by specifying a constraint to createOffer, similar to how the existing OfferToReceiveAudio and OfferToReceiveVideo constraints can be used to add empty m= lines for audio and video. The suggested form for this constraint is to use the existing OfferToReceiveAudio and OfferToReceiveVideo keys, but with the content property as the value, e.g. { "OfferToReceiveVideo": "slides" }.

Individual MediaStreamTracks are represented via a=ssrc attributes on the appropriate m= lines; the msid attribute on the a=ssrc line identifies the MediaStreamTrack. There can be an arbitrary number of MediaStreamTracks associated with a given m= line, including zero; demuxing of these MediaStreamTracks is performed according to the SSRC specified in the a=ssrc attribute.

By default, the content property of a MediaStreamTrack is left empty. This means that MediaStreamTracks are by default associated with m= lines that have no a=content attribute.

createAnswer works the same way as createOffer, using its attached MediaStreamTracks and the constraints supplied; note that it will always include m= lines as needed to match the offer, even if no MediaStreamTracks are attached.
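For example (an illustrative sketch, not normative SDP): if the answerer has no MediaStreamTracks attached and receives an offer containing an audio m= line, a plain video m= line and a slides video m= line, createAnswer still produces all three m= lines, just with no a=ssrc attributes:

<blah>
m=audio   // no local tracks attached, so no a=ssrc lines anywhere
m=video
m=video
a=content:slides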
Through this mechanism, applications that want to make use of multiple media streams can generate SDP that best matches what existing videoconferencing equipment expects, but this usage is not required; sophisticated applications can use the content property to assign their own grouping of MediaStreamTracks to m= lines, including the creation of individual m= lines for each MediaStreamTrack, or the combination of all video MediaStreamTracks into a single m= line. Of course, these applications could also do so by generating the SDP themselves and passing this SDP into setLocalDescription.

===Examples===

MediaStream ms1 contains an audio track (denoted a0 in msid lines) and a video track (v0), as obtained from getUserMedia(). The label of ms1 is <ms1.label>. MediaStream ms2 contains a single video track (also denoted v0, since it's the first video track in its MediaStream; its track object is referred to as v2 in the code below), taken from the desktop. The label of ms2 is <ms2.label>. PeerConnection pc exists, with no streams attached.

pc.addStream(ms1, null);
pc.createOffer(null);

produces:

<blah>
m=audio
a=ssrc:1234 msid:<ms1.label> a0
m=video   // nothing fancy
a=ssrc:5678 msid:<ms1.label> v0

pc.addStream(ms1, null);
pc.addStream(ms2, null);
pc.createOffer(null);

produces:

<blah>
m=audio
a=ssrc:1234 msid:<ms1.label> a0
m=video   // both tracks associated with one m= line
a=ssrc:5678 msid:<ms1.label> v0
a=ssrc:6789 msid:<ms2.label> v0

pc.addStream(ms1, null);
pc.createOffer({mandatory: {"OfferToReceiveVideo": "slides"}});

produces:

<blah>
m=audio
a=ssrc:1234 msid:<ms1.label> a0
m=video
a=ssrc:5678 msid:<ms1.label> v0
m=video   // note this empty m= line
a=content:slides

pc.addStream(ms1, null);
v2.content = "slides";   // v2 is ms2's video track object
pc.addStream(ms2, null);
pc.createOffer(null);

produces:

<blah>
m=audio
a=ssrc:1234 msid:<ms1.label> a0
m=video
a=ssrc:5678 msid:<ms1.label> v0
m=video   // this m= line has an a=content attribute and a track
a=content:slides
a=ssrc:6789 msid:<ms2.label> v0
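For completeness, a sketch of the answering side of the last example (the stream ms3, its video track object v3, and the SSRC value are hypothetical names, and the calls are schematic in the same simplified style as above): the answerer marks its own presentation track as "slides" content, so createAnswer places it on the m= line matching the offer's a=content:slides line, while the other m= lines are mirrored empty.

pc.setRemoteDescription(offer);   // the offer produced by the last example
v3.content = "slides";            // v3 is ms3's video track object
pc.addStream(ms3, null);          // ms3 holds the answerer's desktop capture
pc.createAnswer(null);

produces:

<blah>
m=audio   // matches the offer; no local audio track attached
m=video   // matches the offer; no local main video track attached
m=video   // the answerer's slides track lands on the matching m= line
a=content:slides
a=ssrc:9876 msid:<ms3.label> v0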