Re: active speaker information in mixed streams from Peter Thatcher on 2014-01-29 (public-orca@w3.org from January 2014)

From: Peter Thatcher <pthatcher@google.com>
Date: Tue, 28 Jan 2014 17:26:37 -0800
To: Roman Shpount <rshpount@turbobridge.com>
Cc: Justin Uberti <juberti@google.com>, Emil Ivov <emcho@jitsi.org>, "public-orca@w3.org" <public-orca@w3.org>
Message-ID: <CAJrXDUFfkHGTFu2dLYe+rPOmjYpOPGEBh4SH4O0a38MTmcpKwQ@mail.gmail.com>
On Tue, Jan 28, 2014 at 5:21 PM, Roman Shpount <rshpount@turbobridge.com> wrote:
> Would it make more sense to generalize a RtpContributingSource to define a
> list of RTP header extensions and trigger an event every time the value set
> changes:
>
> dictionary RtpHeaderExtension {
>   unsigned short id;
>   ArrayBuffer value;
> };
>
> dictionary RtpContributingSource {
>   unsigned long csrc;
>   sequence<RtpHeaderExtension> headerExtensions;
> };
>
> This way it is not limited to audio level only.
>

Like Justin said, it's getting quite low-level at that point.  It's
not much different than my "give JS access to every packet" event.

> This being said, the only problem I see with all of this is that there are
> scenarios (like audio level) when this event will be triggered for every
> packet. This will not scale for server side applications of orca.
>

Since we only care about the latest values, can't we just throttle how
often the event is fired?  Say, every 200ms?
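
A minimal sketch of that kind of throttling in JavaScript, under stated
assumptions: createContributingSourceThrottle, update, flush and the
injectable clock are hypothetical names for illustration, not part of
any ORCA/ORTC draft. Per-packet updates only overwrite the latest value
per CSRC, and the callback fires at most once per interval:

```javascript
// Hypothetical helper (not from any ORCA/ORTC draft): coalesces per-packet
// audio-level updates and reports at most one snapshot per interval.
function createContributingSourceThrottle(onSources, intervalMs, now) {
  if (intervalMs === undefined) { intervalMs = 200; }
  now = now || Date.now;          // injectable clock, handy for testing
  var latest = new Map();         // csrc -> most recent audioLevel
  var lastFired = -Infinity;      // so the first update fires immediately

  // Deliver the pending snapshot, if any, and start a new window.
  function flush() {
    if (latest.size === 0) { return; }
    var sources = Array.from(latest, function (entry) {
      return { csrc: entry[0], audioLevel: entry[1] };
    });
    latest.clear();
    lastFired = now();
    onSources(sources);
  }

  return {
    // Call once per received packet and contributing source.
    update: function (csrc, audioLevel) {
      latest.set(csrc, audioLevel);   // only the latest value is kept
      if (now() - lastFired >= intervalMs) { flush(); }
    },
    // Call from a timer or at end of stream so pending values are not lost.
    flush: flush
  };
}
```

In this sketch each snapshot lists only the sources seen since the
previous one, so a source that stops contributing simply drops out of
the next event.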

> _____________
> Roman Shpount
>
>
> On Tue, Jan 28, 2014 at 7:56 PM, Peter Thatcher <pthatcher@google.com>
> wrote:
>>
>> Yes, it's pretty low-level.  For this particular use case, what you
>> have is better, although I'm not sure I'd like calling it "MixerInfo".
>> How about just calling them "contributing sources"?
>>
>> dictionary RtpContributingSource {
>>   unsigned long csrc;
>>   long audioLevel;
>> };
>>
>> partial interface RtpReceiver {
>>   sequence<RtpContributingSource> getContributingSources();
>> };
>>
>>
>> Also, is it enough to require JS to poll?  Why not have an event for
>> when the values change?
>>
>> partial interface RtpReceiver {
>>   // Gets sequence<RtpContributingSource>
>>   attribute EventHandler? oncontributingsources;
>> };
>>
>>
>> Even so, would it still be worth it to have low-level header extension
>> access?  It might be handy when an application wants a proprietary
>> header extension sent from their "mixer".  On the other hand, one
>> could probably just use the data channel, like I suggested earlier :).
>>
>> By the way, the ease with which you put this on the RtpReceiver does
>> show what an advantage it is to have that object.
>>
>>
>> On Tue, Jan 28, 2014 at 4:21 PM, Justin Uberti <juberti@google.com> wrote:
>> > Having to mine through the raw packets feels like a pretty low-level
>> > API to me.
>> >
>> > I was thinking that one could interrogate the RtpReceiver object to get
>> > data on the most recently seen CSRCs and their corresponding energy
>> > levels. Something like
>> >
>> > dictionary RtpCsrcInfo {
>> >   unsigned long csrc;
>> >   long audioLevel;
>> > };
>> >
>> > dictionary RtpMixerInfo {
>> >   sequence<RtpCsrcInfo> csrcs;
>> > };
>> >
>> > partial interface RtpReceiver {
>> >   RtpMixerInfo getMixerInfo();
>> > };
>> >
>> > or maybe just return a dictionary with CSRC as keys and energy levels as
>> > values.
>> >
>> >
>> > On Tue, Jan 28, 2014 at 3:27 PM, Peter Thatcher <pthatcher@google.com>
>> > wrote:
>> >>
>> >> I think it would be reasonable to add some access to header extensions
>> >> and CSRCs in the RtpReceiver object.
>> >>
>> >>
>> >> Would it make sense to have a general access to such things by having
>> >> general access to receive packets?  It could be used like so:
>> >>
>> >> var receiver = new RtpReceiver(...);
>> >> receiver.onpackets = function(packets) {
>> >>   for (var i = 0; i < packets.length; i++) {
>> >>     var packet = packets[i];
>> >>     // Here you have access to
>> >>     // packet.csrcs
>> >>     // packet.headerExtensions
>> >>   }
>> >> };
>> >>
>> >> And defined like so:
>> >>
>> >> partial interface RtpReceiver {
>> >>   // Gives a sequence of RtpPacket
>> >>   // Fired in "batches" of packets.
>> >>   attribute EventHandler? onpackets;
>> >> };
>> >>
>> >> dictionary RtpPacket {
>> >>   sequence<unsigned long> csrcs;
>> >>   sequence<RtpHeaderExtension> headerExtensions;
>> >> };
>> >>
>> >> dictionary RtpHeaderExtension {
>> >>   unsigned short id;
>> >>   ArrayBuffer value;
>> >> };
>> >>
>> >>
>> >> That might leave a bit of work for you to build on top of, but it
>> >> would solve the "can I access header extension" issue once and for
>> >> all.
>> >>
>> >> Would this meet your needs?
>> >>
>> >>
>> >>
>> >> On Tue, Jan 28, 2014 at 2:51 PM, Emil Ivov <emcho@jitsi.org> wrote:
>> >> > On Tue, Jan 28, 2014 at 11:41 PM, Peter Thatcher
>> >> > <pthatcher@google.com> wrote:
>> >> >> I guess it could continue in both.  The ORCA CG might be quicker to
>> >> >> integrate something into the API than the WebRTC WG.
>> >> >>
>> >> >> My question is the same: exactly what info do you want available in
>> >> >> the JS?  The CSRCs?
>> >> >
>> >> > Same answer then: That would be CSRCs and/or audio level header
>> >> > extensions as per RFC6465.
>> >> >
>> >> > Emil
>> >> >
>> >> > --
>> >> > https://jitsi.org
>> >> >
>> >> >> On Tue, Jan 28, 2014 at 2:38 PM, Emil Ivov <emcho@jitsi.org> wrote:
>> >> >>> I am not sure whether this discussion should only continue on one
>> >> >>> of the lists, but until we figure that out I am going to answer
>> >> >>> here as well.
>> >> >>>
>> >> >>> Sync isn't really the issue here. It's mostly about the fact that
>> >> >>> the mixer is not a WebRTC entity. This means that it most likely
>> >> >>> doesn't even know what SCTP is, it doesn't necessarily have access
>> >> >>> to signalling and, above all, the mix is likely to also contain
>> >> >>> audio from non-WebRTC endpoints. Using DataChannels in such
>> >> >>> situations would likely turn out to be quite convoluted.
>> >> >>>
>> >> >>> Emil
>> >> >>>
>> >> >>> On Tue, Jan 28, 2014 at 10:18 PM, Peter Thatcher
>> >> >>> <pthatcher@google.com> wrote:
>> >> >>>> Over there, I suggested that you could simply send the audio
>> >> >>>> levels over an unordered data channel.  If you're using one
>> >> >>>> IceTransport/DtlsTransport pair for both your RTP and SCTP, it
>> >> >>>> would probably stay very closely in sync.
>> >> >>>>
>> >> >>>> On Tue, Jan 28, 2014 at 5:44 AM, Emil Ivov <emcho@jitsi.org>
>> >> >>>> wrote:
>> >> >>>>> Hey all,
>> >> >>>>>
>> >> >>>>> I just posted this to the WebRTC list here:
>> >> >>>>>
>> >> >>>>>
>> >> >>>>> http://lists.w3.org/Archives/Public/public-webrtc/2014Jan/0256.html
>> >> >>>>>
>> >> >>>>> But I believe it's a question that is also very much worth
>> >> >>>>> resolving for ORTC, so I am also asking it here:
>> >> >>>>>
>> >> >>>>> One requirement that we often bump against is the possibility to
>> >> >>>>> extract active speaker information from an incoming *mixed* audio
>> >> >>>>> stream. Acquiring the CSRC list from RTP would be a good start.
>> >> >>>>> Audio levels as per RFC6465 would be even better.
>> >> >>>>>
>> >> >>>>> Thoughts?
>> >> >>>>>
>> >> >>>>> Emil
>> >> >>>
>> >> >>> --
>> >> >>> https://jitsi.org
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Emil Ivov, Ph.D.                       67000 Strasbourg,
>> >> > Project Lead                           France
>> >> > Jitsi
>> >> > emcho@jitsi.org                        PHONE: +33.1.77.62.43.30
>> >> > https://jitsi.org                       FAX:   +33.1.77.62.47.31
>> >>
>> >
>>
>
Received on Wednesday, 29 January 2014 01:27:45 UTC