Re: active speaker information in mixed streams

Yes, it's pretty low-level.  For this particular use case, what you
have is better, although I'm not sure I like the name "MixerInfo".
How about just calling them "contributing sources"?

dictionary RtpContributingSource {
  unsigned long csrc;  // 32-bit CSRC identifier
  long audioLevel;
};

partial interface RtpReceiver {
  sequence<RtpContributingSource> getContributingSources();
};
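
To make the polling model concrete, here is a minimal sketch of how an
application might pick the active speaker from getContributingSources().
The helper names (pickActiveSpeaker, pollActiveSpeaker) are made up for
illustration, and it assumes audioLevel follows RFC 6465 (0..127 in -dBov,
so smaller means louder), which the dictionary above does not yet specify.

```javascript
// Assumption: audioLevel is an RFC 6465 level, 0..127 in -dBov.
// A smaller number is a louder source; 127 is treated as silence.
function pickActiveSpeaker(sources) {
  let active = null;
  for (const source of sources) {
    if (source.audioLevel >= 127) continue; // skip silent sources
    if (active === null || source.audioLevel < active.audioLevel) {
      active = source;
    }
  }
  return active;
}

// Hypothetical polling loop. `receiver` stands for an RtpReceiver that
// implements the proposed getContributingSources(). onChange is called
// with the CSRC of the new active speaker, or null when nobody speaks.
function pollActiveSpeaker(receiver, onChange, intervalMs) {
  let lastCsrc = null;
  return setInterval(function () {
    const active = pickActiveSpeaker(receiver.getContributingSources());
    const csrc = active === null ? null : active.csrc;
    if (csrc !== lastCsrc) {
      lastCsrc = csrc;
      onChange(csrc);
    }
  }, intervalMs);
}
```

The timer id returned by pollActiveSpeaker can be passed to clearInterval
to stop polling.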


Also, is it enough to require JS to poll?  Why not have an event for
when the values change?

partial interface RtpReceiver {
  // Handler is called with a sequence<RtpContributingSource>
  attribute EventHandler? oncontributingsources;
};
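
As a rough sketch of what "when the values change" could mean, the code
below compares two snapshots and calls the handler only when they differ.
It is a shim on top of polling, not a proposal for how a browser would
implement the event. sourcesChanged and makeChangeNotifier are hypothetical
names, and passing the sequence directly to the handler is an assumption.

```javascript
// True when two snapshots differ in their CSRC set or in any audio level.
// Assumes each CSRC appears at most once per snapshot.
function sourcesChanged(previous, current) {
  if (previous.length !== current.length) return true;
  const levels = new Map(previous.map(function (s) {
    return [s.csrc, s.audioLevel];
  }));
  return current.some(function (s) {
    return !levels.has(s.csrc) || levels.get(s.csrc) !== s.audioLevel;
  });
}

// Hypothetical shim: returns a check() function. Each call takes a new
// snapshot and invokes receiver.oncontributingsources with it if the
// snapshot differs from the one seen on the previous call.
function makeChangeNotifier(receiver) {
  let previous = [];
  return function check() {
    const current = receiver.getContributingSources();
    if (sourcesChanged(previous, current) &&
        typeof receiver.oncontributingsources === "function") {
      receiver.oncontributingsources(current);
    }
    previous = current;
  };
}
```

An application would then only set receiver.oncontributingsources and
never call getContributingSources() itself.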


Even so, would it still be worth it to have low-level header extension
access?  It might be handy when an application wants a proprietary
header extension sent from their "mixer".  On the other hand, one
could probably just use the data channel, like I suggested earlier :).
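
For comparison, this is roughly what an application would have to do
itself if it only had raw header extension access. It is a sketch under
assumptions: it takes the csrcs list and the ArrayBuffer value from the
RtpPacket / RtpHeaderExtension shapes in my earlier mail, and it assumes
the payload is the RFC 6465 mixer-to-client format (one byte per CSRC, in
CSRC-list order, level in the low 7 bits). decodeMixerAudioLevels is a
made-up name.

```javascript
// csrcs: array of CSRC identifiers from the packet.
// value: ArrayBuffer with the extension payload, one byte per CSRC.
// Returns objects shaped like the proposed RtpContributingSource.
function decodeMixerAudioLevels(csrcs, value) {
  const bytes = new Uint8Array(value);
  // Ignore trailing bytes (e.g. padding) and CSRCs without a level byte.
  const count = Math.min(csrcs.length, bytes.length);
  const sources = [];
  for (let i = 0; i < count; i++) {
    // The top bit is reserved in RFC 6465; the low 7 bits are the
    // level in -dBov (0 is loudest, 127 is silence).
    sources.push({ csrc: csrcs[i], audioLevel: bytes[i] & 0x7f });
  }
  return sources;
}
```

A proprietary extension would need its own decoder like this one, which
is the main argument for keeping some form of low-level access.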

By the way, the ease with which you put this on the RtpReceiver does
show what an advantage it is to have that object.


On Tue, Jan 28, 2014 at 4:21 PM, Justin Uberti <juberti@google.com> wrote:
> Having to mine through the raw packets feels like a pretty low-level API to
> me.
>
> I was thinking that one could interrogate the RtpReceiver object to get data
> on the most recently seen CSRCs and their corresponding energy levels.
> Something like
>
> dictionary RtpCsrcInfo {
>   unsigned int csrc;
>   int audioLevel;
> }
>
> dictionary RtpMixerInfo {
>   sequence<RtpCsrcInfo> csrcs;
> }
>
> partial interface RtpReceiver {
>   RtpMixerInfo getMixerInfo();
> }
>
> or maybe just return a dictionary with CSRC as keys and energy levels as
> values.
>
>
> On Tue, Jan 28, 2014 at 3:27 PM, Peter Thatcher <pthatcher@google.com>
> wrote:
>>
>> I think it would be reasonable to add some access to header extensions
>> and CSRCs in the RtpReceiver object.
>>
>>
>> Would it make sense to have a general access to such things by having
>> general access to receive packets?  It could be used like so:
>>
>> var receiver = new RtpReceiver(...);
>> receiver.onpackets = function(packets) {
>>   for (var i = 0; i < packets.length; i++) {
>>     var packet = packets[i];
>>     // Here you have access to
>>     // packet.csrcs
>>     // packet.headerExtensions
>>   }
>> }
>>
>> And defined like so:
>>
>> partial interface RtpReceiver {
>>   // Gives a sequence of RtpPacket
>>   // Fired in "batches" of packets.
>>   attribute EventHandler? onpackets;
>> }
>>
>> dictionary RtpPacket {
>>   sequence<unsigned int> csrcs;
>>   sequence<RtpHeaderExtension> headerExtensions;
>> }
>>
>> dictionary RtpHeaderExtension {
>>   unsigned short id;
>>   ArrayBuffer value;
>> }
>>
>>
>> That might leave a bit of work for you to build on top of, but it
>> would solve the "can I access header extension" issue once and for
>> all.
>>
>> Would this meet your needs?
>>
>>
>>
>> On Tue, Jan 28, 2014 at 2:51 PM, Emil Ivov <emcho@jitsi.org> wrote:
>> > On Tue, Jan 28, 2014 at 11:41 PM, Peter Thatcher <pthatcher@google.com>
>> > wrote:
>> >> I guess it could continue in both.  The ORCA  CG might be quicker to
>> >> integrate something into the API than the WebRTC WG.
>> >>
>> >> My question is the same: exactly what info do you want available in
>> >> the JS?  The CSRCs?
>> >
>> > Same answer then: That would be CSRCs and/or audio level header
>> > extensions as per RFC6465.
>> >
>> > Emil
>> >
>> > --
>> > https://jitsi.org
>> >
>> >> On Tue, Jan 28, 2014 at 2:38 PM, Emil Ivov <emcho@jitsi.org> wrote:
>> >>> I am not sure whether this discussion should only continue on one of
>> >>> the lists but until we figure that out I am going to answer here as
>> >>> well
>> >>>
>> >>> Sync isn't really the issue here. It's mostly about the fact that the
>> >>> mixer is not a WebRTC entity. This means that it most likely doesn't
>> >>> even know what SCTP is, it doesn't necessarily have access to
>> >>> signalling and above all, the mix is likely to also contain audio from
>> >>> non-webrtc endpoints. Using DataChannels in such situations would
>> >>> likely turn out to be quite convoluted.
>> >>>
>> >>> Emil
>> >>>
>> >>> On Tue, Jan 28, 2014 at 10:18 PM, Peter Thatcher
>> >>> <pthatcher@google.com> wrote:
>> >>>> Over there, I suggested that you could simply send the audio levels
>> >>>> over an unordered data channel.  If you're using one
>> >>>> IceTransport/DtlsTransport pair for both your RTP and SCTP, it would
>> >>>> probably stay very closely in sync.
>> >>>>
>> >>>> On Tue, Jan 28, 2014 at 5:44 AM, Emil Ivov <emcho@jitsi.org> wrote:
>> >>>>> Hey all,
>> >>>>>
>> >>>>> I just posted this to the WebRTC list here:
>> >>>>>
>> >>>>> http://lists.w3.org/Archives/Public/public-webrtc/2014Jan/0256.html
>> >>>>>
>> >>>>> But I believe it's a question that is also very much worth resolving
>> >>>>> for ORTC, so I am also asking it here:
>> >>>>>
>> >>>>> One requirement that we often bump against is the possibility to
>> >>>>> extract active speaker information from an incoming *mixed* audio
>> >>>>> stream. Acquiring the CSRC list from RTP would be a good start.
>> >>>>> Audio levels as per RFC6465 would be even better.
>> >>>>>
>> >>>>> Thoughts?
>> >>>>>
>> >>>>> Emil
>> >>>
>> >>> --
>> >>> https://jitsi.org
>> >
>> >
>> >
>> > --
>> > Emil Ivov, Ph.D.                       67000 Strasbourg,
>> > Project Lead                           France
>> > Jitsi
>> > emcho@jitsi.org                        PHONE: +33.1.77.62.43.30
>> > https://jitsi.org                       FAX:   +33.1.77.62.47.31
>>
>

Received on Wednesday, 29 January 2014 00:57:25 UTC