Re: use cases around CSRC, client to mixer, mixer to client from Roman Shpount on 2015-09-04 (public-webrtc@w3.org from September 2015)

From: Roman Shpount <roman@telurix.com>
Date: Fri, 4 Sep 2015 13:35:59 -0400
To: "Cullen Jennings (fluffy)" <fluffy@cisco.com>
Cc: public-webrtc <public-webrtc@w3.org>, Bernard Aboba <Bernard.Aboba@microsoft.com>
Message-ID: <CAD5OKxt7VWyrzrEzaMCuxxZ1bOZpW0vOersnEeQMhJyuLsmpYw@mail.gmail.com>

On Fri, Sep 4, 2015 at 1:09 PM, Cullen Jennings (fluffy) <fluffy@cisco.com>
wrote:

>
> Use Case for JS to Read CSRC:
>
> So consider the case of an audio conference where the audio bridge or MCU
> receives the audio from all participants but then selects some subset of
> the active speakers and mixes them into a single audio stream that is sent
> out to the non-active speakers. This is the most common form of
> conferencing today and reduces bandwidth over solutions that send the
> unmixed audio for each active speaker. Say Alice and Bob are the active
> speakers: the conference bridge takes their audio, mixes it, and sends it
> out, indicating in the sent RTP packets the SSRCs of Alice and Bob by
> putting those two SSRCs into the CSRC list of the outbound RTP packet.
>
> The JS app out of band gets the SSRC and name of each user as they join.
> When it receives this RTP packet, it can look at the CSRC (if we have an
> API for that) and visually show in the roster list for the app that Alice
> and Bob are both currently speaking.
>
> The changes in the roster list need to be synchronized with the audio. So
> if three people say in sequence Yes, No, Yes, the roster should be
> displaying the name of the correct person as each person speaks. This
> allows people that don't recognize the voices to see who said yes and who
> said no. That implies UI and audio synchronization timing requirements on
> the order of 100 ms. Solutions that work by having the MCU tell the web
> server who the active speaker is, with the web server then telling the GUI
> over websockets or similar, have not been able to reliably achieve a good
> user experience. Solutions that look at the CSRC lists of the RTP being
> received easily meet that type of timing requirement.
>
> I think this use case is implementable with the GUI simply polling the
> current list of CSRC periodically (say every 50 ms) and updating the GUI if
> things have changed.
>

This can also be achieved using data channels, with the mixer sending the
speaker change notifications over a data channel connection. In my
experience this works quite well and satisfies the timing requirements
for this use case.
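
As a rough sketch of the client side of that approach: the JSON message
shape ({"speakers": [...]}) and the function names below are assumptions
made up for illustration, not anything standardized. The mixer is assumed
to send the full set of active-speaker SSRCs on every change, and the
client works out who started and who stopped talking.

```typescript
// Hypothetical message format (an assumption, not a standard):
//   {"speakers": [<ssrc>, <ssrc>, ...]}
// sent by the mixer over a data channel whenever the active set changes.

interface SpeakerChange {
  started: number[]; // SSRCs that were not speaking before and are now
  stopped: number[]; // SSRCs that were speaking before and are not now
}

// Parse one data channel message into a list of SSRCs.
// Returns null if the message is not valid JSON or lacks a "speakers" array.
function parseSpeakerMessage(data: string): number[] | null {
  let msg: any;
  try {
    msg = JSON.parse(data);
  } catch (e) {
    return null;
  }
  if (!msg || !Array.isArray(msg.speakers)) {
    return null;
  }
  // Keep only integer entries; anything else is silently dropped.
  return msg.speakers.filter(
    (x: unknown) => typeof x === "number" && Math.floor(x) === x
  );
}

// Compare the previous and the new active-speaker lists so the roster
// only has to redraw the entries that actually changed.
function diffSpeakers(prev: number[], next: number[]): SpeakerChange {
  return {
    started: next.filter((id) => prev.indexOf(id) === -1),
    stopped: prev.filter((id) => next.indexOf(id) === -1),
  };
}
```

In a browser this would presumably be driven from the data channel's
onmessage handler, passing the message's string data to
parseSpeakerMessage and then highlighting or clearing roster entries
according to the result of diffSpeakers.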

> Use Case for receiving Mixer to Client audio levels:
>
> Here is one related use case - you have a multi-user voice/video chat
> app for say 3 to 7 people in the same conference. It uses isolated media
> for privacy reasons and also for privacy reasons does not have a central
> mixer but instead creates a full mesh of connections so each participant
> sends media to all other participants. Each participant plays all the audio
> of all participants but only displays the video of the most recent person
> that started talking. The JS could look at the received client to mixer
> level of the audio to decide which video to show. However this is looking
> at the client to mixer value not the mixer to client value.
>

This can also be implemented in the client JS application by analyzing the
audio that is being sent for playback. I do not think there is much benefit
in looking at the client-to-mixer audio levels versus the audio itself.
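
A minimal sketch of that kind of analysis, assuming the application is
able to read blocks of decoded samples for each participant (for example
through a Web Audio AnalyserNode). The class name, the string participant
ids and the 0.05 threshold are all hypothetical choices for illustration.

```typescript
// Root-mean-square level of one block of PCM samples in the range [-1, 1].
function rmsLevel(samples: Float32Array): number {
  if (samples.length === 0) {
    return 0;
  }
  let sum = 0;
  for (let i = 0; i < samples.length; i++) {
    sum += samples[i] * samples[i];
  }
  return Math.sqrt(sum / samples.length);
}

// Remembers which participant most recently *started* talking, i.e. went
// from below the threshold to at or above it. The threshold is an
// arbitrary illustrative value, not a recommended setting.
class RecentSpeakerTracker {
  private talking: { [id: string]: boolean } = {};
  private current: string | null = null;

  constructor(private threshold: number = 0.05) {}

  // Feed one block of samples for one participant; returns the id whose
  // video should currently be shown (or null if nobody has spoken yet).
  update(id: string, samples: Float32Array): string | null {
    const nowTalking = rmsLevel(samples) >= this.threshold;
    if (nowTalking && !this.talking[id]) {
      this.current = id; // silence-to-speech transition
    }
    this.talking[id] = nowTalking;
    return this.current;
  }
}
```

Calling update() for every participant at a regular interval and switching
the displayed video whenever the returned id changes would give the "most
recent person that started talking" behavior. A real application would
likely want some smoothing so that short noises do not switch the video.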
_____________
Roman Shpount

Received on Friday, 4 September 2015 17:36:29 UTC