Re: active speaker information in mixed streams

As others have mentioned, the event rate here could be very high (50+
packets per second), and I don't think that resolution is really needed for
active speaker identification. I have seen systems that work well even when
sampling this information at ~5 Hz.

As such, I am still inclined to leave this as a polling interface and let
apps control the resolution through their poll rate.
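Here is a minimal sketch of what app-controlled polling could look like. It assumes a hypothetical `getLevels()` callback that returns each contributing source (CSRC) with an averaged level normalized to 0..1. That callback and the names `CsrcLevel`, `loudestCsrc` and `pollActiveSpeaker` are illustrative only and do not come from any proposed API. The app sets the resolution entirely through `hz`, and the 0.01 silence threshold is an arbitrary example value.

```typescript
// Hypothetical shape of one polled sample: a contributing source (CSRC)
// and its averaged audio level, normalized to 0..1 where 1 is loudest.
interface CsrcLevel {
  csrc: number;
  level: number;
}

// Return the CSRC with the highest level, or null if every source is at or
// below the (illustrative) silence threshold. Ties go to the earlier entry.
function loudestCsrc(levels: CsrcLevel[], silence = 0.01): number | null {
  let best: CsrcLevel | null = null;
  for (const entry of levels) {
    if (entry.level <= silence) continue;
    if (best === null || entry.level > best.level) best = entry;
  }
  return best === null ? null : best.csrc;
}

// Poll `getLevels` `hz` times per second; the app, not the browser, picks
// the resolution. `onSpeaker` fires only when the loudest CSRC changes.
// Returns a function that stops polling.
function pollActiveSpeaker(
  getLevels: () => CsrcLevel[],
  hz: number,
  onSpeaker: (csrc: number | null) => void,
): () => void {
  let current: number | null = null;
  const timer = setInterval(() => {
    const speaker = loudestCsrc(getLevels());
    if (speaker !== current) {
      current = speaker;
      onSpeaker(speaker);
    }
  }, 1000 / hz);
  return () => clearInterval(timer);
}
```

For example, `const stop = pollActiveSpeaker(getLevels, 5, (id) => { /* upscale id */ })` samples at ~5 Hz. Calling `stop()` ends polling.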


On Wed, Jan 29, 2014 at 6:53 AM, Emil Ivov <emcho@jitsi.org> wrote:

> On Wed, Jan 29, 2014 at 3:14 PM, Bernard Aboba
> <Bernard.Aboba@microsoft.com> wrote:
> > Emil said:
> >
> > +1. While polling is obviously much better than nothing at all, having a
> > change event would be quite convenient.
> >
> > With regard to energy levels, there are two main use cases:
> >
> > 1.  acting on changes in the current speaker (e.g. in order to upscale
> > their video and show everyone else as thumbnails)
> > 2.  showing energy levels for all participants
> >
> > [BA] I believe that the polling proposal could address need #2 by
> > delivering a list of CSRCs along with an (averaged) level for each, but
> > I'm not sure about #1.
>
> Yup, agreed.
>
> > #1 is about timely dominant speaker identification, presumably without
> > false speaker switches.
> >
> > To do this well, you may need to do more than fire an event whenever a
> > ranked list of speakers (ordered by averaged level) changes; better
> > approaches tend to actually process the audio.
> >
> > For example, see
> > http://webee.technion.ac.il/Sites/People/IsraelCohen/Publications/CSL_2012_Volfin.pdf
>
> Right. That's why per-packet header extensions carrying the CSRC levels
> would be the best (and, for mixed streams, the only) way to implement
> any of the above. So, if we could have events triggered for every new
> level, then we should be good. Unless I am missing something, this
> should be covered by Peter's suggested API.
>
> Emil
>
> --
> https://jitsi.org
>
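The per-packet approach can be sketched as follows. The sketch assumes the header extension in question is the mixer-to-client audio level extension from RFC 6465. In that extension, each byte holds a reserved zero bit followed by a 7-bit level in -dBov (0 is loudest, 127 is quietest), and the levels follow the order of the packet's CSRC list. The `CsrcLevelEvents` class is a hypothetical event source for illustration and is not the API Peter suggested. The sketch also assumes RTP and extension-element parsing has already happened before this point.

```typescript
// One entry per contributing source, as carried by the (assumed RFC 6465)
// mixer-to-client audio level header extension.
interface CsrcAudioLevel {
  csrc: number;
  dBov: number; // 0 is loudest, -127 is the quietest representable level
}

// `ext` holds only the extension element's data bytes (after its ID/len
// byte); `csrcs` is the CSRC list from the same RTP packet's header.
function parseCsrcAudioLevels(csrcs: number[], ext: Uint8Array): CsrcAudioLevel[] {
  const out: CsrcAudioLevel[] = [];
  // Levels are in CSRC-list order; pair them up and ignore any surplus.
  const n = Math.min(csrcs.length, ext.length);
  for (let i = 0; i < n; i++) {
    const level = ext[i] & 0x7f; // low 7 bits; the top bit is reserved
    out.push({ csrc: csrcs[i], dBov: -level });
  }
  return out;
}

type LevelListener = (levels: CsrcAudioLevel[]) => void;

// Hypothetical per-packet event source: every parsed extension is pushed
// to all listeners, so the app sees each new set of levels as it arrives.
class CsrcLevelEvents {
  private listeners: LevelListener[] = [];

  addListener(fn: LevelListener): void {
    this.listeners.push(fn);
  }

  onPacket(csrcs: number[], ext: Uint8Array): void {
    const levels = parseCsrcAudioLevels(csrcs, ext);
    for (const fn of this.listeners) fn(levels);
  }
}
```

Because each incoming packet triggers one listener call, listeners fire at the full packet rate. Any smoothing or switch suppression (for example, the audio-processing approaches Bernard links to) is left to the application.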

Received on Thursday, 30 January 2014 01:11:11 UTC