
Mapping multiple media sources to few or many M-lines

From: Harald Alvestrand <harald@alvestrand.no>
Date: Wed, 24 Oct 2012 16:49:52 +0200
Message-ID: <50880010.2070503@alvestrand.no>
To: "public-webrtc@w3.org" <public-webrtc@w3.org>
Cullen, Justin and I have been working on a proposal for this issue, 
which is one of the things we have to settle in order to know what we're 
designing our signalling for.

One proposal is outlined below - it depends on defining a new attribute 
of a MediaStreamTrack called "content", and using that to direct tracks 
onto different M-lines when negotiating SDP.

Details below.

==Proposal for controlling the allocation of multiple media sources to 
RTP sessions==

===Problem description===

A number of applications can be envisioned using WebRTC. The applications 
where one audio and one video stream are connected between two 
participants are trivial; there is no real controversy there. But other 
styles are more difficult.
Two important cases are:

1. A single PeerConnection is used to connect an end-user to a central 
non-mixing MCU (an "RTCP-terminating MCU" in RFC 5117 terminology), and 
the connection between the MCU and the user carries a large number of 
audio and/or video tracks (for example, a "thumbnail strip" plus one or 
more large video images).

2. A single PeerConnection is used to connect an end-user to a non-RTCWEB 
SIP system, through a signalling gateway but not through a media gateway, 
using multiple video sessions that are distinguished by the "a=content" 
attribute (for example, a main video feed plus a presentation video feed).

In the first case, we definitely want all the video sources in the same 
RTP session, which makes it possible to add or remove video sources with 
minimal overhead (no new ICE ports or NAT pinholes).

In the second case, we want specific video sources on different RTP 
sessions, and we want exact control over which video streams get assigned 
to which RTP sessions.
===Solution description===

The basic idea is to expose a new "content" property on 
MediaStreamTracks, mirroring the SDP "a=content" attribute defined in 
RFC 4796, which indicates the "usage" of the media in that particular 
track. When createOffer is called to create a session description, it 
will include an m= line for each [media, content] tuple that exists 
within the list of attached MediaStreamTracks.
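As a rough illustration of that grouping rule, here is a sketch in plain JavaScript. The function name and the plain-object "track" stand-ins are hypothetical, not part of the WebRTC API; the point is only to show one m= line per [media, content] tuple:

```javascript
// Hypothetical sketch: group attached tracks into m= line descriptors
// keyed by their [kind, content] tuple, as the proposal suggests
// createOffer would do. Tracks are plain objects, not real
// MediaStreamTracks.
function groupTracksByMLine(tracks) {
  const byTuple = new Map(); // "kind|content" -> list of tracks
  for (const t of tracks) {
    const key = `${t.kind}|${t.content || ""}`; // content empty by default
    if (!byTuple.has(key)) byTuple.set(key, []);
    byTuple.get(key).push(t);
  }
  // One m= line descriptor per distinct tuple, in insertion order.
  return [...byTuple.entries()].map(([key, ts]) => {
    const [kind, content] = key.split("|");
    return { kind, content, tracks: ts };
  });
}
```

With an audio track, a plain video track, and a video track whose content is "slides", this yields three m= line descriptors, matching the tuple rule above.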

Since normally m= lines are omitted for tuples that have no associated 
MediaStreamTracks, the application can also include an empty m= line for 
a given tuple by specifying a constraint to createOffer, similar to how 
the existing OfferToReceiveAudio and OfferToReceiveVideo can be used to 
add empty m= lines for audio and video. The suggested form for this 
constraint is to use the existing OfferToReceiveAudio and 
OfferToReceiveVideo keys, but use the content property as the value, 
e.g. "OfferToReceiveVideo:slides".
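Continuing the sketch above, the effect of such a constraint could look like the following. Again the function name and data shapes are illustrative assumptions, not a proposed API:

```javascript
// Hypothetical sketch: make sure an m= line exists for the tuple named by
// an OfferToReceive* constraint (e.g. "OfferToReceiveVideo:slides"),
// adding an empty one if no attached track produced it.
function ensureMLineForConstraint(mLines, kind, content) {
  const exists = mLines.some(m => m.kind === kind && m.content === content);
  if (!exists) mLines.push({ kind, content, tracks: [] }); // empty m= line
  return mLines;
}
```

Calling this twice for the same tuple is a no-op, so a constraint never duplicates an m= line that attached tracks already generated.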

Individual MediaStreamTracks are represented via a=ssrc attributes on 
the appropriate m= lines; the MSID attribute on the a=ssrc line 
identifies the MediaStreamTrack. There can be an arbitrary number of 
MediaStreamTracks associated with a given m= line, including zero; 
demuxing of these MediaStreamTracks is performed according to the SSRC 
specified with the a=ssrc attribute.
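A minimal sketch of that demux step, assuming a=ssrc lines of the form "a=ssrc:&lt;ssrc&gt; msid:&lt;stream label&gt; &lt;track id&gt;" as in the examples below (the parser and its names are illustrative, not a specified API):

```javascript
// Hypothetical sketch: build an SSRC -> track mapping from a=ssrc lines,
// so incoming RTP can be demuxed to the right MediaStreamTrack by SSRC.
function parseSsrcMsidLines(sdpLines) {
  const bySsrc = new Map();
  for (const line of sdpLines) {
    const m = /^a=ssrc:(\d+) msid:(\S+) (\S+)$/.exec(line);
    if (m) bySsrc.set(Number(m[1]), { streamLabel: m[2], trackId: m[3] });
  }
  return bySsrc;
}
```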

By default, the content property for MediaStreamTracks is left empty. 
This means that MediaStreamTracks are by default associated with m= 
lines that have no a=content attribute.

createAnswer works the same way as createOffer, using its attached 
MediaStreamTracks, and the constraints supplied; note that it will 
always include m= lines as needed to match the offer, even if no 
MediaStreamTracks are attached.
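The "answer matches the offer line-for-line" rule can be sketched like this, reusing the hypothetical m= line descriptors from the earlier sketches:

```javascript
// Hypothetical sketch: pad the answerer's m= line list so it matches the
// offer one-for-one, emitting an empty m= line for any offered tuple that
// has no local MediaStreamTrack attached.
function matchAnswerToOffer(offerMLines, localMLines) {
  return offerMLines.map(om => {
    const local = localMLines.find(
      lm => lm.kind === om.kind && lm.content === om.content);
    return local || { kind: om.kind, content: om.content, tracks: [] };
  });
}
```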

Through this mechanism, applications that want to make use of multiple 
media streams can generate SDP that best matches what existing 
videoconferencing equipment expects. This usage is not required: 
sophisticated applications can use the content property to choose their 
own grouping of MediaStreamTracks onto m= lines, including creating an 
individual m= line for each MediaStreamTrack, or combining all video 
MediaStreamTracks into a single m= line. Of course, such applications 
could also achieve this by generating the SDP themselves and passing it 
to setLocalDescription.


===Examples===

In the examples below:
- MediaStream ms1 contains an audio track (denoted a0 in msid lines) and 
a video track (v0), as obtained from getUserMedia(). The label of ms1 is 
<ms1.label>.
- MediaStream ms2 contains a single video track (also denoted v0, since 
it is the first video track in its MediaStream), taken from the desktop. 
The label of ms2 is <ms2.label>.
- A PeerConnection pc exists, with no streams attached.
Example 1: one stream with audio and video:

pc.addStream(ms1, null);

Resulting SDP (relevant lines):

m=audio
a=ssrc:1234 msid:<ms1.label> a0
m=video // nothing fancy
a=ssrc:5678 msid:<ms1.label> v0
Example 2: two streams, both video tracks on one m= line:

pc.addStream(ms1, null);
pc.addStream(ms2, null);

Resulting SDP (relevant lines):

m=audio
a=ssrc:1234 msid:<ms1.label> a0
m=video // both tracks associated with one m= line
a=ssrc:5678 msid:<ms1.label> v0
a=ssrc:6789 msid:<ms2.label> v0
Example 3: one stream, with an extra empty video m= line requested:

pc.addStream(ms1, null);
// plus a constraint to createOffer requesting an additional video
// m= line, as described above

Resulting SDP (relevant lines):

m=audio
a=ssrc:1234 msid:<ms1.label> a0
m=video
a=ssrc:5678 msid:<ms1.label> v0
m=video // note this empty m= line
Example 4: two streams, the second video track marked as "slides":

pc.addStream(ms1, null);
v2.content = "slides"; // v2 refers to the video track of ms2
pc.addStream(ms2, null);

Resulting SDP (relevant lines):

m=audio
a=ssrc:1234 msid:<ms1.label> a0
m=video
a=ssrc:5678 msid:<ms1.label> v0
m=video // this m= line has an a=content attribute and a track
a=content:slides
a=ssrc:6789 msid:<ms2.label> v0
Received on Wednesday, 24 October 2012 14:50:27 UTC
