- From: Harald Alvestrand <harald@alvestrand.no>
- Date: Wed, 24 Oct 2012 16:49:52 +0200
- To: "public-webrtc@w3.org" <public-webrtc@w3.org>
Cullen, Justin and I have been working on a proposal for this issue,
which is one of the things we have to settle in order to know what we're
designing our signalling for.
One proposal is outlined below - it depends on defining a new attribute
of a MediaStreamTrack called "content", and using that to direct tracks
onto different m= lines when negotiating SDP.
Details below.
==Proposal for controlling the allocation of multiple media sources to
RTP sessions==
===Problem description===
There are a number of applications that can be envisioned using WebRTC.
The applications where one audio and one video stream are connected
between two participants are trivial; there is no real controversy
there. But other styles are more difficult.
Two important cases are:
1. A single PeerConnection is used to connect an end-user to a central
non-mixing MCU (an "RTCP-terminating MCU" in RFC 5117 terminology), and
the connection between the MCU and the user has a large number of audio
and/or video tracks (for example, a “thumbnail strip” + one or more
large video images).
2. A single PeerConnection is used to connect an end-user to a non-RTCWEB
SIP system, through a signalling gateway but not through a media
gateway, using multiple video sessions that are distinguished by use of
the “a=content” attribute (for example, a main video feed plus a
presentation video feed).
In the first case, we definitely want all the video sources in the same
RTP session, which lets us add or remove video sources with minimal
overhead (no new ICE ports and NAT pinholes).
In the second case, we want to have specific video sources on different
RTP sessions, and we want exact control over which video streams are
assigned to which RTP sessions.
===Solution description===
The basic idea is to expose a new "content" property on
MediaStreamTracks, mirroring the "a=content" SDP attribute defined in
RFC 4796, which would indicate the "usage" of the media in that
particular track. When createOffer is called to create a session
description, it will include an m= line for each [media, content] tuple
that exists within the list of attached MediaStreamTracks.
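As an illustration only (groupTracks is a hypothetical helper, not part
of the proposed API), the grouping step inside createOffer might look
like this:

// Sketch: bucket the attached tracks by [media, content]; one m= line
// is then emitted per bucket. "content" is the proposed track property.
function groupTracks(tracks) {
  var groups = {};
  tracks.forEach(function (track) {
    var key = track.kind + "|" + (track.content || "");
    (groups[key] = groups[key] || []).push(track);
  });
  return groups;
}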
Since m= lines are normally omitted for tuples that have no associated
MediaStreamTracks, the application can also include an empty m= line for
a given tuple by specifying a constraint to createOffer, similar to how
the existing OfferToReceiveAudio and OfferToReceiveVideo can be used to
add empty m= lines for audio and video. The suggested form for this
constraint is to use the existing OfferToReceiveAudio and
OfferToReceiveVideo keys, with the content property as the value,
e.g. "OfferToReceiveVideo": "slides".
Individual MediaStreamTracks are represented via a=ssrc attributes on
the appropriate m= lines; the MSID attribute on the a=ssrc line
identifies the MediaStreamTrack. There can be an arbitrary number of
MediaStreamTracks associated with a given m= line, including zero;
demuxing of these MediaStreamTracks is performed according to the
SSRC specified with the a=ssrc attribute.
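As a minimal sketch (an illustrative parser, not part of the proposal),
a receiver could derive the SSRC-to-track mapping from such SDP like
this:

// Maps SSRC -> {streamLabel, trackId} from lines of the form
// "a=ssrc:<ssrc> msid:<stream label> <track id>".
function parseSsrcMsids(sdp) {
  var map = {};
  sdp.split(/\r?\n/).forEach(function (line) {
    var m = /^a=ssrc:(\d+) msid:(\S+) (\S+)/.exec(line);
    if (m) map[m[1]] = { streamLabel: m[2], trackId: m[3] };
  });
  return map;
}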
By default, the content property for MediaStreamTracks is left empty.
This means that MediaStreamTracks are by default associated with m=
lines that have no a=content attribute.
createAnswer works the same way as createOffer, using its attached
MediaStreamTracks and the supplied constraints; note that it will
always include m= lines as needed to match the offer, even if no
MediaStreamTracks are attached.
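A hedged sketch of the answering side, in the same abbreviated API style
as the examples below (a real implementation also involves callbacks and
error handling):

pc.setRemoteDescription(offer);   // apply the received offer first
pc.addStream(localStream, null);  // attach whatever local media exists
pc.createAnswer(null);            // mirrors every m= line in the offer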
Through this mechanism, applications that want to make use of multiple
media streams can generate SDP that best matches what existing
videoconferencing equipment expects, but this usage is not required;
sophisticated applications can use the content property to assign their
own grouping of MediaStreamTracks to m= lines, including the creation of
individual m= lines for each MediaStreamTrack, or the combination of all
video MediaStreamTracks into a single m= line. Of course, these
applications could also do so by generating the SDP themselves and
passing this SDP into setLocalDescription.
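For instance (illustrative string munging only, in the abbreviated style
of the examples below; real m= lines also carry ports and codec lists):

var offer = pc.createOffer(null);
// Hypothetical edit: tag the first video m= line as slides content.
offer.sdp = offer.sdp.replace("m=video\r\n",
                              "m=video\r\na=content:slides\r\n");
pc.setLocalDescription(offer);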
===Examples===
MediaStream ms1 contains an audio track (denoted a0 in msid lines) and
a video track (v0), as obtained from getUserMedia(). The label of ms1 is
<ms1.label>.
MediaStream ms2 contains a single video track (also denoted v0, since
it’s the first video track in its MediaStream), taken from the desktop.
The label of ms2 is <ms2.label>.
PeerConnection pc exists, with no streams attached.

pc.addStream(ms1, null);
pc.createOffer(null);
produces:
<blah>
m=audio
a=ssrc:1234 msid:<ms1.label> a0
m=video // nothing fancy
a=ssrc:5678 msid:<ms1.label> v0

pc.addStream(ms1, null);
pc.addStream(ms2, null);
pc.createOffer(null);
produces:
<blah>
m=audio
a=ssrc:1234 msid:<ms1.label> a0
m=video // both tracks associated with one m= line
a=ssrc:5678 msid:<ms1.label> v0
a=ssrc:6789 msid:<ms2.label> v0

pc.addStream(ms1, null);
pc.createOffer({mandatory: {"OfferToReceiveVideo": "slides"}});
produces:
<blah>
m=audio
a=ssrc:1234 msid:<ms1.label> a0
m=video
a=ssrc:5678 msid:<ms1.label> v0
m=video // note this empty m= line
a=content:slides

pc.addStream(ms1, null);
ms2.getVideoTracks()[0].content = "slides"; // set content on ms2's video track (v0)
pc.addStream(ms2, null);
pc.createOffer(null);
produces:
<blah>
m=audio
a=ssrc:1234 msid:<ms1.label> a0
m=video
a=ssrc:5678 msid:<ms1.label> v0
m=video // this m= line has an a=content attribute and a track
a=content:slides
a=ssrc:6789 msid:<ms2.label> v0