Re: active speaker information in mixed streams

First of all, the latest audio level value on its own is almost useless.
You need to apply some sort of averaging function to the audio level values
you receive to get something that makes sense (see section 5 of RFC 6464).
For instance, returning the maximum audio level over a specified interval,
which should be much longer than an individual packet duration, makes much
more sense.
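
To illustrate the kind of interval-based value I mean, here is a rough
sketch, not a proposed API. The name createLevelWindow and its methods are
made up for this example. It assumes levels are the RFC 6464 values, where 0
is the loudest and 127 is silence, so the "maximum" level over the interval
is the numerically smallest value seen.

```javascript
// Hypothetical helper for illustration only; not part of any proposed API.
// Levels follow RFC 6464: expressed in -dBov, 0 is loudest, 127 is silence.
function createLevelWindow(windowMs) {
  const samples = []; // { timeMs, level }, oldest first

  return {
    // Record the level carried by one received packet.
    add(timeMs, level) {
      samples.push({ timeMs, level });
    },

    // Loudest level seen in the last windowMs, or 127 if nothing was seen.
    loudest(nowMs) {
      // Drop samples that have fallen out of the window.
      while (samples.length > 0 && samples[0].timeMs <= nowMs - windowMs) {
        samples.shift();
      }
      let best = 127;
      for (const s of samples) {
        if (s.level < best) best = s.level; // smaller value means louder
      }
      return best;
    },
  };
}
```

With 20ms packets and a 200ms window, a single quiet packet no longer makes
an active speaker look silent.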

Second, since scenarios where received audio is not decoded would be very
uncommon for ORCA clients, the savings from exposing the audio level from
RTP packets are not significant in comparison with calculating this value
directly from the decoded audio.

As far as ssrcs are concerned, it would make sense to expose the latest list
of contributing sources with some sort of time stamp indicating the last
time each ssrc was seen. You can also expire and remove ssrcs from the list
after some period of time.
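
A minimal sketch of such a list, assuming the implementation is handed the
contributing sources of each received packet. The names createSourceTracker,
observe and list, and the expiry period, are illustrative assumptions and
not a proposed API.

```javascript
// Hypothetical tracker for illustration only; not a proposed API.
function createSourceTracker(expiryMs) {
  const lastSeen = new Map(); // ssrc -> time (ms) it was last seen

  return {
    // Call for every received packet with its contributing sources.
    observe(ssrcs, nowMs) {
      for (const ssrc of ssrcs) {
        lastSeen.set(ssrc, nowMs);
      }
    },

    // Remove expired entries and return the rest, most recently seen first.
    list(nowMs) {
      for (const [ssrc, seenAt] of lastSeen) {
        if (nowMs - seenAt > expiryMs) lastSeen.delete(ssrc);
      }
      return [...lastSeen]
        .map(([ssrc, timestamp]) => ({ ssrc, timestamp }))
        .sort((a, b) => b.timestamp - a.timestamp);
    },
  };
}
```

The application polls list() at whatever frequency it cares about and gets
each ssrc together with the time it was last seen.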

_____________
Roman Shpount


On Tue, Jan 28, 2014 at 8:32 PM, Peter Thatcher <pthatcher@google.com> wrote:

> Polling is fine with me.  What about calling it RtpContributingSource?
>  Do you prefer that or MixerInfo?
>
> On Tue, Jan 28, 2014 at 5:29 PM, Justin Uberti <juberti@google.com> wrote:
> > I don't think it needs to be an event. Just poll it at the frequency you
> > care about.
> >
> >
> > On Tue, Jan 28, 2014 at 5:26 PM, Peter Thatcher <pthatcher@google.com>
> > wrote:
> >>
> >> On Tue, Jan 28, 2014 at 5:21 PM, Roman Shpount <rshpount@turbobridge.com>
> >> wrote:
> >> > Would it make more sense to generalize a RtpContributingSource to define
> >> > a list of RTP header extensions and trigger an event every time the value
> >> > set changes:
> >> >
> >> > dictionary RtpHeaderExtension {
> >> >   unsigned short id;
> >> >   ArrayBuffer value;
> >> > }
> >> >
> >> > dictionary RtpContributingSource {
> >> >   unsigned int csrc;
> >> >   sequence<RtpHeaderExtension> headerExtensions;
> >> > }
> >> >
> >> > This way it is not limited to audio level only.
> >> >
> >>
> >> Like Justin said, it's getting quite low-level at that point.  It's
> >> not much different than my "give JS access to every packet" event.
> >>
> >> > This being said, the only problem I see with all of this is that there
> >> > are scenarios (like audio level) when this event will be triggered for
> >> > every packet. This will not scale for server side applications of orca.
> >> >
> >>
> >> Since we only care about the latest values, can't we just throttle how
> >> often the event is fired?  Say, every 200ms?
> >>
> >> > _____________
> >> > Roman Shpount
> >> >
> >> >
> >> > On Tue, Jan 28, 2014 at 7:56 PM, Peter Thatcher <pthatcher@google.com>
> >> > wrote:
> >> >>
> >> >> Yes, it's pretty low-level.  For this particular use case, what you
> >> >> have is better, although I'm not sure I'd like calling it "MixerInfo".
> >> >>  How about just calling them "contributing source"s?
> >> >>
> >> >> dictionary RtpContributingSource {
> >> >>   unsigned int csrc;
> >> >>   int audioLevel;
> >> >> }
> >> >>
> >> >> partial interface RtpReceiver {
> >> >>   sequence<RtpContributingSource> getContributingSources();
> >> >> }
> >> >>
> >> >>
> >> >> Also, is it enough to require JS to poll?  Why not have an event for
> >> >> when the values change?
> >> >>
> >> >> partial interface RtpReceiver {
> >> >>    // Gets sequence<RtpContributingSource>
> >> >>    attribute EventHandler? oncontributingsources;
> >> >> }
> >> >>
> >> >>
> >> >> Even so, would it still be worth it to have low-level header extension
> >> >> access?  It might be handy when an application wants a proprietary
> >> >> header extension sent from their "mixer".  On the other hand, one
> >> >> could probably just use the data channel, like I suggested earlier :).
> >> >>
> >> >> By the way, the ease at which you put this on the RtpReceiver does
> >> >> show what an advantage it is to have it.
> >> >>
> >> >>
> >> >> On Tue, Jan 28, 2014 at 4:21 PM, Justin Uberti <juberti@google.com>
> >> >> wrote:
> >> >> > Having to mine through the raw packets feels like a pretty low-level
> >> >> > API to me.
> >> >> >
> >> >> > I was thinking that one could interrogate the RtpReceiver object to
> >> >> > get data on the most recently seen CSRCs and their corresponding
> >> >> > energy levels. Something like
> >> >> >
> >> >> > dictionary RtpCsrcInfo {
> >> >> >   unsigned int csrc;
> >> >> >   int audioLevel;
> >> >> > }
> >> >> >
> >> >> > dictionary RtpMixerInfo {
> >> >> >   sequence<RtpCsrcInfo> csrcs;
> >> >> > }
> >> >> >
> >> >> > partial interface RtpReceiver {
> >> >> >   RtpMixerInfo getMixerInfo();
> >> >> > }
> >> >> >
> >> >> > or maybe just return a dictionary with CSRC as keys and energy levels
> >> >> > as values.
> >> >> >
> >> >> >
> >> >> > On Tue, Jan 28, 2014 at 3:27 PM, Peter Thatcher
> >> >> > <pthatcher@google.com>
> >> >> > wrote:
> >> >> >>
> >> >> >> I think it would be reasonable to add some access to header
> >> >> >> extensions
> >> >> >> and CSRCs in the RtpReceiver object.
> >> >> >>
> >> >> >>
> >> >> >> Would it make sense to have a general access to such things by
> >> >> >> having
> >> >> >> general access to receive packets?  It could be used like so:
> >> >> >>
> >> >> >> var receiver = new RtpReceiver(...);
> >> >> >> receiver.onpackets = function(packets) {
> >> >> >>   for (var i = 0; i < packets.length; i++) {
> >> >> >>     var packet = packets[i];
> >> >> >>     // Here you have access to
> >> >> >>     // packet.csrcs
> >> >> >>     // packet.headerExtensions
> >> >> >>   }
> >> >> >> }
> >> >> >>
> >> >> >> And defined like so:
> >> >> >>
> >> >> >> partial interface RtpReceiver {
> >> >> >>   // Gives a sequence of RtpPacket
> >> >> >>   // Fired in "batches" of packets.
> >> >> >>   attribute EventHandler? onpackets;
> >> >> >> }
> >> >> >>
> >> >> >> dictionary RtpPacket {
> >> >> >>   sequence<unsigned int> csrcs;
> >> >> >>   sequence<RtpHeaderExtension> headerExtensions;
> >> >> >> }
> >> >> >>
> >> >> >> dictionary RtpHeaderExtension {
> >> >> >>   unsigned short id;
> >> >> >>   ArrayBuffer value;
> >> >> >> }
> >> >> >>
> >> >> >>
> >> >> >> That might leave a bit of work for you to build on top of, but it
> >> >> >> would solve the "can I access header extension" issue once and for
> >> >> >> all.
> >> >> >>
> >> >> >> Would this meet your needs?
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> On Tue, Jan 28, 2014 at 2:51 PM, Emil Ivov <emcho@jitsi.org> wrote:
> >> >> >> > On Tue, Jan 28, 2014 at 11:41 PM, Peter Thatcher
> >> >> >> > <pthatcher@google.com>
> >> >> >> > wrote:
> >> >> >> >> I guess it could continue in both.  The ORCA CG might be quicker
> >> >> >> >> to integrate something into the API than the WebRTC WG.
> >> >> >> >>
> >> >> >> >> My question is the same: exactly what info do you want available
> >> >> >> >> in the JS?  The CSRCs?
> >> >> >> >
> >> >> >> > Same answer then: That would be CSRCs and/or audio level header
> >> >> >> > extensions as per RFC6465.
> >> >> >> >
> >> >> >> > Emil
> >> >> >> >
> >> >> >> > --
> >> >> >> > https://jitsi.org
> >> >> >> >
> >> >> >> >> On Tue, Jan 28, 2014 at 2:38 PM, Emil Ivov <emcho@jitsi.org>
> >> >> >> >> wrote:
> >> >> >> >>> I am not sure whether this discussion should only continue on
> >> >> >> >>> one of the lists but until we figure that out I am going to
> >> >> >> >>> answer here as well
> >> >> >> >>>
> >> >> >> >>> Sync isn't really the issue here. It's mostly about the fact
> >> >> >> >>> that the mixer is not a WebRTC entity. This means that it most
> >> >> >> >>> likely doesn't even know what SCTP is, it doesn't necessarily
> >> >> >> >>> have access to signalling and above all, the mix is likely to
> >> >> >> >>> also contain audio from non-webrtc endpoints. Using DataChannels
> >> >> >> >>> in such situations would likely turn out to be quite convoluted.
> >> >> >> >>>
> >> >> >> >>> Emil
> >> >> >> >>>
> >> >> >> >>> On Tue, Jan 28, 2014 at 10:18 PM, Peter Thatcher
> >> >> >> >>> <pthatcher@google.com> wrote:
> >> >> >> >>>> Over there, I suggested that you could simply send the audio
> >> >> >> >>>> levels over an unordered data channel.  If you're using one
> >> >> >> >>>> IceTransport/DtlsTransport pair for both your RTP and SCTP, it
> >> >> >> >>>> would probably stay very closely in sync.
> >> >> >> >>>>
> >> >> >> >>>> On Tue, Jan 28, 2014 at 5:44 AM, Emil Ivov <emcho@jitsi.org>
> >> >> >> >>>> wrote:
> >> >> >> >>>>> Hey all,
> >> >> >> >>>>>
> >> >> >> >>>>> I just posted this to the WebRTC list here:
> >> >> >> >>>>>
> >> >> >> >>>>> http://lists.w3.org/Archives/Public/public-webrtc/2014Jan/0256.html
> >> >> >> >>>>>
> >> >> >> >>>>> But I believe it's a question that is also very much worth
> >> >> >> >>>>> resolving for ORTC, so I am also asking it here:
> >> >> >> >>>>>
> >> >> >> >>>>> One requirement that we often bump against is the possibility
> >> >> >> >>>>> to extract active speaker information from an incoming *mixed*
> >> >> >> >>>>> audio stream. Acquiring the CSRC list from RTP would be a good
> >> >> >> >>>>> start. Audio levels as per RFC6465 would be even better.
> >> >> >> >>>>>
> >> >> >> >>>>> Thoughts?
> >> >> >> >>>>>
> >> >> >> >>>>> Emil
> >> >> >> >>>
> >> >> >> >>> --
> >> >> >> >>> https://jitsi.org
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> > --
> >> >> >> > Emil Ivov, Ph.D.                       67000 Strasbourg,
> >> >> >> > Project Lead                           France
> >> >> >> > Jitsi
> >> >> >> > emcho@jitsi.org                        PHONE: +33.1.77.62.43.30
> >> >> >> > https://jitsi.org                      FAX: +33.1.77.62.47.31
> >> >> >>
> >> >> >
> >> >>
> >> >
> >
> >
>

Received on Wednesday, 29 January 2014 01:51:31 UTC