Re: active speaker information in mixed streams from Justin Uberti on 2014-01-29 (public-orca@w3.org from January 2014)

From: Justin Uberti <juberti@google.com>
Date: Tue, 28 Jan 2014 17:53:32 -0800
To: Roman Shpount <rshpount@turbobridge.com>
Cc: Peter Thatcher <pthatcher@google.com>, Emil Ivov <emcho@jitsi.org>, "public-orca@w3.org" <public-orca@w3.org>
Message-ID: <CAOJ7v-1yVNWxe6FU8TGMADHqnXda6Z3FbUv9h0Kegygripzd5A@mail.gmail.com>
Emil, does Peter's suggestion work for you?

I don't want ORCA to have to solve *all* the things we couldn't solve in
1.0, so if we can't find a simple solution, I am inclined to leave this for
later.


On Tue, Jan 28, 2014 at 5:51 PM, Roman Shpount <rshpount@turbobridge.com> wrote:

> First of all, the latest value of audio level is almost useless. You need
> to apply some sort of averaging function to the audio level values you
> receive to get something that makes sense (see section 5 of RFC 6464). For
> instance, returning the max audio level for a specified interval, which
> should be much longer than an individual packet duration, makes much more
> sense.
>
> Second, since scenarios where received audio is not decoded would be
> very uncommon for ORCA clients, the savings from exposing the audio level
> from RTP packets are not significant in comparison with calculating this
> value directly from decoded audio.
>
> As far as SSRCs are concerned, it would make sense to expose the latest
> list of contributing sources with some sort of timestamp indicating the last
> time each SSRC was seen. You can also expire and remove SSRCs from the list
> after some period of time.
>
> _____________
> Roman Shpount
>
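The two suggestions above (report a maximum over an interval rather than the
latest value, and expire sources that have not been seen for a while) could be
sketched roughly as follows. This is only an illustration: the class name
ContributingSourceTracker, its methods, and the windowMs/expiryMs parameters
are hypothetical and do not come from any proposal in this thread. It also
assumes a level where higher means louder; RFC 6465 carries the level as a
-dBov magnitude (0 is loudest), so a caller would negate that value first.

```javascript
// Hypothetical helper, not part of any proposed API: keeps a sliding window
// of audio-level samples per CSRC, reports the maximum over that window, and
// expires sources that have not been seen recently.
class ContributingSourceTracker {
  constructor(windowMs, expiryMs) {
    this.windowMs = windowMs; // how far back the maximum is computed
    this.expiryMs = expiryMs; // how long an unseen source is kept
    this.sources = new Map(); // csrc -> { lastSeen, samples: [{ time, level }] }
  }

  // Record one observation, e.g. taken from one received RTP packet.
  // Assumption: a higher `level` means louder.
  record(csrc, level, now) {
    let entry = this.sources.get(csrc);
    if (!entry) {
      entry = { lastSeen: now, samples: [] };
      this.sources.set(csrc, entry);
    }
    entry.lastSeen = now;
    entry.samples.push({ time: now, level });
  }

  // Drop stale samples and expired sources, then return the current list.
  snapshot(now) {
    const result = [];
    for (const [csrc, entry] of this.sources) {
      if (now - entry.lastSeen > this.expiryMs) {
        this.sources.delete(csrc); // deleting during Map iteration is safe
        continue;
      }
      entry.samples = entry.samples.filter(s => now - s.time <= this.windowMs);
      const maxLevel = entry.samples.length
        ? Math.max(...entry.samples.map(s => s.level))
        : null; // seen recently enough to list, but no sample in the window
      result.push({ csrc, lastSeen: entry.lastSeen, maxLevel });
    }
    return result;
  }
}
```

Timestamps are passed in explicitly instead of being read from a clock, so the
window and expiry behaviour can be checked deterministically.
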
>
> On Tue, Jan 28, 2014 at 8:32 PM, Peter Thatcher <pthatcher@google.com> wrote:
>
>> Polling is fine with me.  What about calling it RtpContributingSource?
>> Do you prefer that or MixerInfo?
>>
>> On Tue, Jan 28, 2014 at 5:29 PM, Justin Uberti <juberti@google.com>
>> wrote:
>> > I don't think it needs to be an event. Just poll it at the frequency you
>> > care about.
>> >
>> >
>> > On Tue, Jan 28, 2014 at 5:26 PM, Peter Thatcher <pthatcher@google.com>
>> > wrote:
>> >>
>> >> On Tue, Jan 28, 2014 at 5:21 PM, Roman Shpount <rshpount@turbobridge.com>
>> >> wrote:
>> >> > Would it make more sense to generalize a RtpContributingSource to
>> >> > define a list of RTP header extensions and trigger an event every time
>> >> > the value set changes:
>> >> >
>> >> > dictionary RtpHeaderExtension {
>> >> >   unsigned short id;
>> >> >   ArrayBuffer value;
>> >> > }
>> >> >
>> >> > dictionary RtpContributingSource {
>> >> >   unsigned int csrc;
>> >> >   sequence<RtpHeaderExtension> headerExtensions;
>> >> > }
>> >> >
>> >> > This way it is not limited to audio level only.
>> >> >
>> >>
>> >> Like Justin said, it's getting quite low-level at that point.  It's
>> >> not much different than my "give JS access to every packet" event.
>> >>
>> >> > This being said, the only problem I see with all of this is that there
>> >> > are scenarios (like audio level) when this event will be triggered for
>> >> > every packet. This will not scale for server side applications of orca.
>> >> >
>> >>
>> >> Since we only care about the latest values, can't we just throttle how
>> >> often the event is fired?  Say, every 200ms?
>> >>
>> >> > _____________
>> >> > Roman Shpount
>> >> >
>> >> >
>> >> > On Tue, Jan 28, 2014 at 7:56 PM, Peter Thatcher <pthatcher@google.com>
>> >> > wrote:
>> >> >>
>> >> >> Yes, it's pretty low-level.  For this particular use case, what you
>> >> >> have is better, although I'm not sure I'd like calling it "MixerInfo".
>> >> >> How about just calling them "contributing source"s?
>> >> >>
>> >> >> dictionary RtpContributingSource {
>> >> >>   unsigned int csrc;
>> >> >>   int audioLevel;
>> >> >> }
>> >> >>
>> >> >> partial interface RtpReceiver {
>> >> >>   sequence<RtpContributingSource> getContributingSources();
>> >> >> }
>> >> >>
>> >> >>
>> >> >> Also, is it enough to require JS to poll?  Why not have an event for
>> >> >> when the values change?
>> >> >>
>> >> >> partial interface RtpReceiver {
>> >> >>    // Gets sequence<RtpContributingSource>
>> >> >>    attribute EventHandler? oncontributingsources;
>> >> >> }
>> >> >>
>> >> >>
>> >> >> Even so, would it still be worth it to have low-level header extension
>> >> >> access?  It might be handy when an application wants a proprietary
>> >> >> header extension sent from their "mixer".  On the other hand, one
>> >> >> could probably just use the data channel, like I suggested earlier :).
>> >> >>
>> >> >> By the way, the ease with which you put this on the RtpReceiver does
>> >> >> show what an advantage it is to have it.
>> >> >>
>> >> >>
>> >> >> On Tue, Jan 28, 2014 at 4:21 PM, Justin Uberti <juberti@google.com>
>> >> >> wrote:
>> >> >> > Having to mine through the raw packets feels like a pretty low-level
>> >> >> > API to me.
>> >> >> >
>> >> >> > I was thinking that one could interrogate the RtpReceiver object to
>> >> >> > get data on the most recently seen CSRCs and their corresponding
>> >> >> > energy levels. Something like
>> >> >> >
>> >> >> > dictionary RtpCsrcInfo {
>> >> >> >   unsigned int csrc;
>> >> >> >   int audioLevel;
>> >> >> > }
>> >> >> >
>> >> >> > dictionary RtpMixerInfo {
>> >> >> >   sequence<RtpCsrcInfo> csrcs;
>> >> >> > }
>> >> >> >
>> >> >> > partial interface RtpReceiver {
>> >> >> >   RtpMixerInfo getMixerInfo();
>> >> >> > }
>> >> >> >
>> >> >> > or maybe just return a dictionary with CSRC as keys and energy levels
>> >> >> > as values.
>> >> >> >
>> >> >> >
>> >> >> > On Tue, Jan 28, 2014 at 3:27 PM, Peter Thatcher
>> >> >> > <pthatcher@google.com>
>> >> >> > wrote:
>> >> >> >>
>> >> >> >> I think it would be reasonable to add some access to header
>> >> >> >> extensions and CSRCs in the RtpReceiver object.
>> >> >> >>
>> >> >> >>
>> >> >> >> Would it make sense to have general access to such things by
>> >> >> >> having general access to received packets?  It could be used like so:
>> >> >> >>
>> >> >> >> var receiver = new RtpReceiver(...);
>> >> >> >> receiver.onpackets = function(packets) {
>> >> >> >>   for (var i = 0; i < packets.length; i++) {
>> >> >> >>     var packet = packets[i];
>> >> >> >>     // Here you have access to
>> >> >> >>     // packet.csrcs
>> >> >> >>     // packet.headerExtensions
>> >> >> >>   }
>> >> >> >> }
>> >> >> >>
>> >> >> >> And defined like so:
>> >> >> >>
>> >> >> >> partial interface RtpReceiver {
>> >> >> >>   // Gives a sequence of RtpPacket
>> >> >> >>   // Fired in "batches" of packets.
>> >> >> >>   attribute EventHandler? onpackets;
>> >> >> >> }
>> >> >> >>
>> >> >> >> dictionary RtpPacket {
>> >> >> >>   sequence<unsigned int> csrcs;
>> >> >> >>   sequence<RtpHeaderExtension> headerExtensions;
>> >> >> >> }
>> >> >> >>
>> >> >> >> dictionary RtpHeaderExtension {
>> >> >> >>   unsigned short id;
>> >> >> >>   ArrayBuffer value;
>> >> >> >> }
>> >> >> >>
>> >> >> >>
>> >> >> >> That might leave a bit of work for you to build on top of, but it
>> >> >> >> would solve the "can I access header extension" issue once and for
>> >> >> >> all.
>> >> >> >>
>> >> >> >> Would this meet your needs?
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >> On Tue, Jan 28, 2014 at 2:51 PM, Emil Ivov <emcho@jitsi.org> wrote:
>> >> >> >> > On Tue, Jan 28, 2014 at 11:41 PM, Peter Thatcher
>> >> >> >> > <pthatcher@google.com>
>> >> >> >> > wrote:
>> >> >> >> >> I guess it could continue in both.  The ORCA CG might be quicker
>> >> >> >> >> to integrate something into the API than the WebRTC WG.
>> >> >> >> >>
>> >> >> >> >> My question is the same: exactly what info do you want available
>> >> >> >> >> in the JS?  The CSRCs?
>> >> >> >> >
>> >> >> >> > Same answer then: That would be CSRCs and/or audio level header
>> >> >> >> > extensions as per RFC6465.
>> >> >> >> >
>> >> >> >> > Emil
>> >> >> >> >
>> >> >> >> > --
>> >> >> >> > https://jitsi.org
>> >> >> >> >
>> >> >> >> >> On Tue, Jan 28, 2014 at 2:38 PM, Emil Ivov <emcho@jitsi.org>
>> >> >> >> >> wrote:
>> >> >> >> >>> I am not sure whether this discussion should only continue on
>> >> >> >> >>> one of the lists but until we figure that out I am going to
>> >> >> >> >>> answer here as well.
>> >> >> >> >>>
>> >> >> >> >>> Sync isn't really the issue here. It's mostly about the fact
>> >> >> >> >>> that the mixer is not a WebRTC entity. This means that it most
>> >> >> >> >>> likely doesn't even know what SCTP is, it doesn't necessarily
>> >> >> >> >>> have access to signalling and above all, the mix is likely to
>> >> >> >> >>> also contain audio from non-webrtc endpoints. Using DataChannels
>> >> >> >> >>> in such situations would likely turn out to be quite convoluted.
>> >> >> >> >>>
>> >> >> >> >>> Emil
>> >> >> >> >>>
>> >> >> >> >>> On Tue, Jan 28, 2014 at 10:18 PM, Peter Thatcher
>> >> >> >> >>> <pthatcher@google.com> wrote:
>> >> >> >> >>>> Over there, I suggested that you could simply send the audio
>> >> >> >> >>>> levels over an unordered data channel.  If you're using one
>> >> >> >> >>>> IceTransport/DtlsTransport pair for both your RTP and SCTP, it
>> >> >> >> >>>> would probably stay very closely in sync.
>> >> >> >> >>>>
>> >> >> >> >>>> On Tue, Jan 28, 2014 at 5:44 AM, Emil Ivov <emcho@jitsi.org>
>> >> >> >> >>>> wrote:
>> >> >> >> >>>>> Hey all,
>> >> >> >> >>>>>
>> >> >> >> >>>>> I just posted this to the WebRTC list here:
>> >> >> >> >>>>>
>> >> >> >> >>>>> http://lists.w3.org/Archives/Public/public-webrtc/2014Jan/0256.html
>> >> >> >> >>>>>
>> >> >> >> >>>>> But I believe it's a question that is also very much worth
>> >> >> >> >>>>> resolving for ORTC, so I am also asking it here:
>> >> >> >> >>>>>
>> >> >> >> >>>>> One requirement that we often bump against is the possibility
>> >> >> >> >>>>> to extract active speaker information from an incoming *mixed*
>> >> >> >> >>>>> audio stream. Acquiring the CSRC list from RTP would be a good
>> >> >> >> >>>>> start. Audio levels as per RFC6465 would be even better.
>> >> >> >> >>>>>
>> >> >> >> >>>>> Thoughts?
>> >> >> >> >>>>>
>> >> >> >> >>>>> Emil
>> >> >> >> >>>
>> >> >> >> >>> --
>> >> >> >> >>> https://jitsi.org
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > --
>> >> >> >> > Emil Ivov, Ph.D.                       67000 Strasbourg,
>> >> >> >> > Project Lead                           France
>> >> >> >> > Jitsi
>> >> >> >> > emcho@jitsi.org                        PHONE: +33.1.77.62.43.30
>> >> >> >> > https://jitsi.org                      FAX:   +33.1.77.62.47.31
>> >> >> >>
>> >> >> >
>> >> >>
>> >> >
>> >
>> >
>>
>
>
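
For illustration, the polling approach discussed in this thread might be used
along these lines. This is a minimal sketch, assuming the shape Peter proposed
(getContributingSources() returning a list of { csrc, audioLevel } entries).
The helper names pickLoudest and pollActiveSpeaker, the fakeReceiver stand-in,
and the "higher audioLevel means louder" convention are assumptions made for
the example, not part of any agreed API.

```javascript
// Assumes the shape proposed in this thread: getContributingSources()
// returns an array of { csrc, audioLevel }. Nothing here is a shipped API.

// Returns the entry with the highest audioLevel, or null for an empty list.
// Assumption: higher means louder. Flip the comparison for RFC 6465-style
// values, where 0 is the loudest.
function pickLoudest(sources) {
  let loudest = null;
  for (const source of sources) {
    if (loudest === null || source.audioLevel > loudest.audioLevel) {
      loudest = source;
    }
  }
  return loudest;
}

// Polls the receiver every intervalMs and calls onChange(csrc) whenever the
// loudest contributing source changes. Returns a function that stops polling.
function pollActiveSpeaker(receiver, intervalMs, onChange) {
  let currentCsrc = null;
  const timer = setInterval(() => {
    const loudest = pickLoudest(receiver.getContributingSources());
    const csrc = loudest ? loudest.csrc : null;
    if (csrc !== currentCsrc) {
      currentCsrc = csrc;
      onChange(csrc);
    }
  }, intervalMs);
  return () => clearInterval(timer);
}

// Stand-in receiver so the sketch runs without a real RTP stack.
const fakeReceiver = {
  getContributingSources() {
    return [
      { csrc: 0x1111, audioLevel: 12 },
      { csrc: 0x2222, audioLevel: 57 },
    ];
  },
};
```

An application would call something like
pollActiveSpeaker(receiver, 200, updateSpeakerUi), where updateSpeakerUi is its
own callback, and invoke the returned function to stop. The polling frequency
is entirely the application's choice, which is the point made above about not
needing an event.
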
Received on Wednesday, 29 January 2014 01:54:19 UTC