Re: active speaker information in mixed streams from Emil Ivov on 2014-01-30 (public-orca@w3.org from January 2014)

From: Emil Ivov <emcho@jitsi.org>
Date: Thu, 30 Jan 2014 15:16:56 +0100
To: Justin Uberti <juberti@google.com>
Cc: Bernard Aboba <Bernard.Aboba@microsoft.com>, Roman Shpount <rshpount@turbobridge.com>, Peter Thatcher <pthatcher@google.com>, "public-orca@w3.org" <public-orca@w3.org>
Message-ID: <CAPvvaaJssdTWQmhshMytj_kjh1NFccQQ9u4QQgdw6oUm=CquCA@mail.gmail.com>

On Thu, Jan 30, 2014 at 2:10 AM, Justin Uberti <juberti@google.com> wrote:
> As others have mentioned, the event rate here could be very high (50+ PPS),
> and I don't think that resolution is really needed for active speaker
> identification. I have seen systems that work well even when sampling this
> information at ~ 5 Hz.
>
> As such I am still inclined to leave this as a polling interface and allow
> apps to control the resolution by their poll rate.

Just to make sure I understand: what is the disadvantage of making
this an event with application-controlled granularity?

The two main advantages I see to keeping an event-based mechanism are:

* streams whose levels rarely change (e.g. muted streams) would not
generate any events, whereas polls would keep running regardless.
* it is unlikely that anyone would ever need just a single poll, so
applications would always need some periodic mechanism. It would
therefore be helpful if the API provided the infrastructure for this
most common use case.
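
To illustrate the two points above, here is a rough sketch of what an
event with application-controlled granularity could look like when layered
on top of a polling call. None of these names come from the proposed API:
CsrcLevel, LevelChangeNotifier, the poll callback and minDelta are all
hypothetical, and the level scale assumes the -dBov convention of RFC 6465
(0 is loudest, 127 is silence).

```typescript
// Hypothetical shape of one poll result: a contributing source and its level.
// Assumes RFC 6465 levels in -dBov: 0 is loudest, 127 is silence.
interface CsrcLevel {
  csrc: number;
  level: number;
}

type LevelListener = (changed: CsrcLevel[]) => void;

// Wraps a polling function and reports only the sources whose level moved
// by at least `minDelta` since the last time they were reported.
class LevelChangeNotifier {
  private last = new Map<number, number>();
  private poll: () => CsrcLevel[]; // stand-in for whatever polling call the API exposes
  private minDelta: number;
  private listener: LevelListener;

  constructor(poll: () => CsrcLevel[], minDelta: number, listener: LevelListener) {
    this.poll = poll;
    this.minDelta = minDelta;
    this.listener = listener;
  }

  // Runs one poll; returns true if the listener was called.
  tick(): boolean {
    const changed = this.poll().filter(({ csrc, level }) => {
      const prev = this.last.get(csrc);
      return prev === undefined || Math.abs(level - prev) >= this.minDelta;
    });
    for (const { csrc, level } of changed) {
      this.last.set(csrc, level);
    }
    if (changed.length > 0) {
      this.listener(changed);
    }
    return changed.length > 0;
  }

  // Application-controlled granularity: the app picks the interval.
  // Returns a function that stops the periodic polling.
  start(intervalMs: number): () => void {
    const id = setInterval(() => this.tick(), intervalMs);
    return () => clearInterval(id);
  }
}
```

With something like this, an application that only cares about ~5 Hz
resolution would call start(200), and a muted stream would trigger the
listener once and then stay quiet. Every application would otherwise end
up writing this same wrapper around the polling call.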

Again, if the choice is between polling and not having access to these
fields at all, then polling it is.

Emil

>
>
> On Wed, Jan 29, 2014 at 6:53 AM, Emil Ivov <emcho@jitsi.org> wrote:
>>
>> On Wed, Jan 29, 2014 at 3:14 PM, Bernard Aboba
>> <Bernard.Aboba@microsoft.com> wrote:
>> > Emil said:
>> >
>> > +1. While polling is obviously much better than nothing at all, having a
>> > change event would be quite convenient.
>> >
>> > With regard to energy levels, there are two main use cases:
>> >
>> > 1.  acting on changes of the current speaker (e.g. in order to upscale
>> > their corresponding video and thumbnail everyone else)
>> > 2.  showing energy levels for all participants
>> >
>> > [BA] I believe that the polling proposal could address need #2 by
>> > delivering a list of CSRCs as well as an (averaged) level, but I'm not sure
>> > about #1.
>>
>> Yup, agreed.
>>
>> > #1 is about timely dominant speaker identification, presumably without
>> > false speaker switches.
>> >
>> > To do this well, you may need to do more than firing an event based on
>> > changes in a ranked list of speakers based on averaged levels; better
>> > approaches tend to actually process the audio.
>> >
>> > For example, see
>> > http://webee.technion.ac.il/Sites/People/IsraelCohen/Publications/CSL_2012_Volfin.pdf
>>
>> Right. That's why per-packet header extensions carrying the CSRC levels
>> would be the best way (and, in the case of mixed streams, the only way)
>> to implement any of the above. So, if we could have events triggered for
>> every new level, then we should be good. Unless I am missing
>> something, this should be covered by Peter's suggested API.
>>
>> Emil
>>
>> --
>> https://jitsi.org
>
>



-- 
Emil Ivov, Ph.D.                       67000 Strasbourg,
Project Lead                           France
Jitsi
emcho@jitsi.org                        PHONE: +33.1.77.62.43.30
https://jitsi.org                       FAX:   +33.1.77.62.47.31

Received on Thursday, 30 January 2014 14:17:45 UTC