Re: use cases around CSRC, client to mixer, mixer to client

On 09/04/2015 07:09 PM, Cullen Jennings (fluffy) wrote:
> Let me just hit the high level use cases for each of these:
>
>
>
> Use Case for Client to Mixer audio levels:
>
> We want to build a conference where the central bridge decides what media to forward to which clients but does not decrypt or decode the media. It might not decrypt because it does not have the keys for privacy reasons or it might not decrypt just for scalability reasons of reducing CPU usage. To work it needs the the browser to the the 

I'm assuming that this means "To work, it needs the browser to send and
not encrypt the RTP client to mixer extension". I don't think we have an
RTP model for "encrypt the data to this key and encrypt this header
extension to that other key".
> RTP client to mixer header extension. Note this use case does not mean we need an API to control this at the JS level if the default is to try and negotiate it and send it if the other side accepts it. 

Actually I'm a bit worried about this, because it effectively means
audio level info goes in the clear for this case to work.

>
>
>
> Use Case for JS to Read CSRC :
>
> So consider the case of an audio conference where the audio bridge or MCU receives the audio from all participants but then selects some subset of the active speakers and mixes them into a single audio stream that is sent out to the non-active speakers. This is the most common form of conferencing today and reduces bandwidth over solutions that send the unmixed audio for each active speaker. Say Alice and Bob are the active speakers: the conference bridge takes their audio, mixes it, and sends it, but it indicates in the sent RTP packets the SSRCs of Alice and Bob by putting those two SSRCs into the CSRC list of the outbound RTP packet. 
>
> The JS app learns, out of band, the SSRC and name of each user as they join. When it receives this RTP packet, it can look at the CSRCs (if we have an API for that) and visually show in the app's roster list that Alice and Bob are both currently speaking. 
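For illustration, that lookup is just a map from CSRC back to display name. A minimal sketch, assuming some API hands us the CSRC list and the ssrcToName map was built out of band as users joined (all names here are hypothetical):

```javascript
// Sketch only: map each CSRC in a received packet back to a display
// name using the out-of-band SSRC -> name table. Unknown CSRCs (users
// we have no roster entry for) are simply skipped.
function activeSpeakers(csrcs, ssrcToName) {
  return csrcs
    .map((csrc) => ssrcToName.get(csrc))
    .filter((name) => name !== undefined);
}

// Example: Alice (SSRC 0x1111) and Bob (SSRC 0x2222) are in the mix.
const ssrcToName = new Map([
  [0x1111, "Alice"],
  [0x2222, "Bob"],
  [0x3333, "Carol"],
]);
console.log(activeSpeakers([0x1111, 0x2222], ssrcToName)); // ["Alice", "Bob"]
```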
>
> The changes in the roster list need to be synchronized with the audio. So if three people say, in sequence, Yes, No, Yes, the roster should display the name of the correct person as each person speaks. This allows people who don't recognize the voices to see who said yes and who said no. That implies UI and audio synchronization timing requirements on the order of 100 ms. Solutions that work by having the MCU tell the web server who the active speaker is, with the web server then telling the GUI over WebSockets or similar, have not been able to reliably achieve a good user experience here. Solutions that look at the CSRC lists of the received RTP easily meet that type of timing requirement. 
What's the timing impact of letting the MCU tell the JS about the change
over the data channel?
>
> I think this use case is implementable with the GUI simply polling the current list of CSRCs periodically (say every 50 ms) and updating the GUI if things have changed. 
That's 20 polls per second.
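Though the GUI only needs to redraw when the list actually changes, so the per-poll work is tiny. A sketch of polling-with-diffing, assuming a hypothetical getCSRCs() accessor for whatever CSRC API we end up with:

```javascript
// Sketch only: detect changes in the CSRC list between polls, so the
// roster is redrawn only when the set of active speakers changes.
function makeChangeDetector() {
  let last = "";
  return (csrcs) => {
    // Sort so [A, B] and [B, A] compare as the same speaker set.
    const key = csrcs.slice().sort().join(",");
    if (key === last) return false; // no change since last poll
    last = key;
    return true;
  };
}

// Poll the (hypothetical) getCSRCs() accessor every 50 ms; call
// onChange with the new list only when the detector reports a change.
function startCsrcPolling(getCSRCs, onChange, intervalMs = 50) {
  const changed = makeChangeDetector();
  return setInterval(() => {
    const csrcs = getCSRCs();
    if (changed(csrcs)) onChange(csrcs);
  }, intervalMs);
}
```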
>
>
>
>
> Use Case for receiving Mixer to Client audio levels:
>
> I'm not really sure what the good use case for this is, and I don't care if we have it in 1.0 - perhaps someone else has a more compelling use case. 
>
> Here is one related use case - you have a multi-user voice/video chat app for, say, 3 to 7 people in the same conference. It uses isolated media for privacy reasons, and also for privacy reasons does not have a central mixer but instead creates a full mesh of connections so each participant sends media to all other participants. Each participant plays the audio of all participants but only displays the video of the most recent person that started talking. The JS could look at the received client-to-mixer level of the audio to decide which video to show. However, this is looking at the client-to-mixer value, not the mixer-to-client value. 
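If per-peer audio levels were exposed, the "show the most recent speaker" selection could be a sketch like this (the threshold, the 0-to-1 level scale, and how the level reaches JS are all assumptions):

```javascript
// Sketch only: pick which peer's video to display in a full-mesh call.
// Each poll receives a Map of peerId -> recent audio level in [0, 1]
// (e.g. derived from the client-to-mixer level header extension).
// The displayed peer switches only when someone *newly* starts talking.
function makeSpeakerSelector(threshold = 0.3) {
  let speaking = new Set(); // peers above threshold on the previous poll
  let current = null;       // peer whose video is currently shown
  return (levels) => {
    const now = new Set();
    for (const [peer, level] of levels) {
      if (level >= threshold) {
        now.add(peer);
        if (!speaking.has(peer)) current = peer; // newly started talking
      }
    }
    speaking = now;
    return current; // unchanged if nobody new started talking
  };
}
```

During silence the selector keeps showing the last speaker rather than blanking the video, which matches the "most recent person that started talking" behavior described above.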
>


-- 
Surveillance is pervasive. Go Dark.

Received on Sunday, 6 September 2015 16:58:47 UTC