- From: Martin Thomson <martin.thomson@gmail.com>
- Date: Thu, 30 Jan 2014 08:21:25 -0800
- To: Emil Ivov <emcho@jitsi.org>
- Cc: Peter Thatcher <pthatcher@google.com>, Bernard Aboba <Bernard.Aboba@microsoft.com>, "public-orca@w3.org" <public-orca@w3.org>, Roman Shpount <rshpount@turbobridge.com>, Justin Uberti <juberti@google.com>
- Message-ID: <CABkgnnUAF-x7G=L9m2_H0uCAQBx0bvDRJtuNuOd7SBixps+1yA@mail.gmail.com>
If it is an event, I think that the API should choose the rate. One event per packet makes little sense. I think that I would run at 5-10 updates per second, but that might depend on circumstances.

On Jan 30, 2014 6:17 AM, "Emil Ivov" <emcho@jitsi.org> wrote:

> On Thu, Jan 30, 2014 at 2:10 AM, Justin Uberti <juberti@google.com> wrote:
> > As others have mentioned, the event rate here could be very high (50+ PPS),
> > and I don't think that resolution is really needed for active speaker
> > identification. I have seen systems that work well even when sampling this
> > information at ~ 5 Hz.
> >
> > As such I am still inclined to leave this as a polling interface and allow
> > apps to control the resolution by their poll rate.
>
> Just to make sure I understand. What is the disadvantage of making
> this an event with an application-controlled granularity?
>
> The two main advantages I see to keeping an event-based mechanism are:
>
> * streams where levels don't change that often (e.g. muted streams)
>   would not cause any events, while polls would continue running.
> * it is unlikely that people would ever need to only do a single poll,
>   so there would always be a need for periodicity. It would therefore be
>   helpful if the API provided the infrastructure for the most common use
>   case.
>
> Again, if the choice is between polling and not having access to these
> fields at all, then polling it is.
>
> Emil
>
> > On Wed, Jan 29, 2014 at 6:53 AM, Emil Ivov <emcho@jitsi.org> wrote:
> >>
> >> On Wed, Jan 29, 2014 at 3:14 PM, Bernard Aboba
> >> <Bernard.Aboba@microsoft.com> wrote:
> >> > Emil said:
> >> >
> >> > +1. While polling is obviously much better than nothing at all, having a
> >> > change event would be quite convenient.
> >> >
> >> > With regard to energy levels, there are two main use cases:
> >> >
> >> > 1. acting on changes of the current speaker (e.g. in order to upscale
> >> >    their corresponding video and thumbnail everyone else)
> >> > 2. showing energy levels for all participants
> >> >
> >> > [BA] I believe that the polling proposal could address need #2 by
> >> > delivering a list of CSRCs as well as an (averaged) level, but I'm not sure
> >> > about #1.
> >>
> >> Yup, agreed.
> >>
> >> > #1 is about timely dominant speaker identification, presumably without
> >> > false speaker switches.
> >> >
> >> > To do this well, you may need to do more than firing an event based on
> >> > changes in a ranked list of speakers based on averaged levels; better
> >> > approaches tend to actually process the audio.
> >> >
> >> > For example, see
> >> > http://webee.technion.ac.il/Sites/People/IsraelCohen/Publications/CSL_2012_Volfin.pdf
> >>
> >> Right. That's why per-packet hdr extensions carrying the CSRC levels
> >> would be the best (and, in the case of mixed streams, the only) way to
> >> implement any of the above. So, if we could have events triggered for
> >> every new level, then we should be good. Unless I am missing
> >> something, this should be covered by Peter's suggested API.
> >>
> >> Emil
> >>
> >> --
> >> https://jitsi.org
>
> --
> Emil Ivov, Ph.D.                       67000 Strasbourg,
> Project Lead                           France
> Jitsi
> emcho@jitsi.org                        PHONE: +33.1.77.62.43.30
> https://jitsi.org                      FAX:   +33.1.77.62.47.31
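The thread weighs three options: polling at an app-chosen rate, a per-packet event, and an event whose rate is capped (by the API or the app) and which fires only on change. The hypothetical TypeScript sketch below compares the capped, change-only event with polling on a per-packet level stream. It assumes about 50 packets per second (Justin's "50+ PPS") and a cap of 5 Hz (the 5-10 updates per second suggested above). The names `LevelSample`, `throttleLevelEvents` and `pollLevels`, and the thresholds, are illustrative assumptions. They are not part of any ORCA or WebRTC API.

```typescript
// Hypothetical sketch: none of these names come from the ORCA or WebRTC APIs.

/** One audio-level sample, e.g. read from a per-packet RTP header extension. */
interface LevelSample {
  timeMs: number; // packet arrival time in milliseconds
  level: number;  // audio level carried by the packet (units are an assumption)
}

/**
 * Event-style delivery: reduces the per-packet samples to sparse "level changed"
 * events. Fires at most once per `minIntervalMs`, and only when the level moved
 * by at least `minDelta` since the last event, so a steady or muted stream
 * produces a single initial event and then nothing.
 */
function throttleLevelEvents(
  samples: readonly LevelSample[],
  minIntervalMs: number,
  minDelta: number,
): LevelSample[] {
  const events: LevelSample[] = [];
  let last: LevelSample | undefined;
  for (const s of samples) {
    const intervalElapsed = last === undefined || s.timeMs - last.timeMs >= minIntervalMs;
    const changed = last === undefined || Math.abs(s.level - last.level) >= minDelta;
    if (intervalElapsed && changed) {
      events.push(s);
      last = s;
    }
  }
  return events;
}

/**
 * Polling-style delivery: what an app reading the latest level every
 * `pollIntervalMs` (up to `endMs`) would see, whether or not anything changed.
 */
function pollLevels(
  samples: readonly LevelSample[],
  pollIntervalMs: number,
  endMs: number,
): number[] {
  const readings: number[] = [];
  let i = 0;
  let latest: number | undefined;
  for (let t = pollIntervalMs; t <= endMs; t += pollIntervalMs) {
    // Advance to the newest sample that arrived at or before this poll.
    while (i < samples.length && samples[i].timeMs <= t) {
      latest = samples[i].level;
      i++;
    }
    if (latest !== undefined) readings.push(latest);
  }
  return readings;
}

// Usage: one second of a steady (e.g. muted) stream at 50 packets per second.
const steady: LevelSample[] = Array.from({ length: 50 }, (_, k) => ({
  timeMs: k * 20,
  level: 127,
}));
console.log(throttleLevelEvents(steady, 200, 3).length); // 1
console.log(pollLevels(steady, 200, 1000).length);       // 5
```

On a steady stream the change-only event fires once, while a 5 Hz poll keeps returning the same value. This is Emil's first point. On a busy stream both approaches are bounded by the same rate cap, which addresses the concern about 50+ events per second.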
Received on Thursday, 30 January 2014 16:21:54 UTC