Re: active speaker information in mixed streams

Looks more complicated.  What's the benefit?  The callback-based
version of my proposal already allows specifying the frequency, and is
simpler.
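
For illustration, here is a minimal sketch of how the callback-based form could behave: call once when no interval is given, and keep calling at the given interval for as long as the callback returns true. This is not a real API. makeReceiver, readSources and schedule are hypothetical names made up for the sketch, and the shape of the source objects is an assumption.

```javascript
// Hypothetical sketch only: makeReceiver, readSources and schedule are
// illustrative names, not part of any real WebRTC API.
function makeReceiver(readSources, schedule = setTimeout) {
  return {
    // callback: receives the current contributor sources.
    // intervalMs: optional; when given, the callback is re-invoked every
    // intervalMs milliseconds for as long as it returns true.
    getContributorSources(callback, intervalMs) {
      const tick = () => {
        const keepGoing = callback(readSources());
        if (intervalMs !== undefined && keepGoing === true) {
          schedule(tick, intervalMs);
        }
      };
      schedule(tick, 0); // deliver the first result asynchronously
    },
  };
}
```

The scheduler is injectable only so the behavior can be exercised without real timers; by default it falls back to setTimeout.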


On Thu, Jan 30, 2014 at 10:39 AM, Roman Shpount
<rshpount@turbobridge.com> wrote:
> How about something like this:
>
> ContributingSourceProcessorNode createContributingSourceProcessor(
>     optional unsigned long interval = 100,
>     optional unsigned long maxContributingSources = 16);
>
> interface ContributingSourceProcessorNode {
>     attribute EventHandler onContributingSourceProcess;
> };
>
> dictionary ContributingSource {
>   double packetTime;
>   unsigned long csrc;
>   long audioLevel;
> };
>
> interface ContributingSourceProcessingEvent : Event {
>     readonly attribute sequence<ContributingSource> contributingSources;
> };
>
> This way you can create a processor node and specify the frequency with
> which it should be called.
>
> _____________
> Roman Shpount
>
>
> On Thu, Jan 30, 2014 at 12:08 PM, Peter Thatcher <pthatcher@google.com>
> wrote:
>>
>> Would it make sense to have an async getter that calls the callback
>> function more than once?  For example, to get the current value once,
>> call like this:
>>
>> rtpReceiver.getContributorSources(function(contributorSources) {
>>   // Use the contributor sources just once.
>> });
>>
>> And to get called back every 100ms, call like this:
>>
>> rtpReceiver.getContributorSources(function(contributorSources) {
>>   // Use the contributor sources every 100ms.
>>   return true;
>> }, 100);
>>
>> And to stop the callback:
>>
>> rtpReceiver.getContributorSources(function(contributorSources) {
>>   if (iAmAllDone) {
>>     // I'm all done.  Don't call me anymore.
>>     return false;
>>   }
>>   return true;
>> }, 100);
>>
>>
>> That's somewhat halfway between an async getter and an event.  Are
>> there any existing HTML5 APIs like that?
>>
>>
>>
>>
>> On Thu, Jan 30, 2014 at 8:21 AM, Martin Thomson
>> <martin.thomson@gmail.com> wrote:
>> > If it is an event, I think that the API should choose the rate. One event
>> > per packet makes little sense. I think that I would run at 5-10 updates per
>> > second, but that might depend on circumstances.
>> >
>> > On Jan 30, 2014 6:17 AM, "Emil Ivov" <emcho@jitsi.org> wrote:
>> >>
>> >> On Thu, Jan 30, 2014 at 2:10 AM, Justin Uberti <juberti@google.com>
>> >> wrote:
>> >> > As others have mentioned, the event rate here could be very high (50+ PPS),
>> >> > and I don't think that resolution is really needed for active speaker
>> >> > identification. I have seen systems that work well even when sampling this
>> >> > information at ~5 Hz.
>> >> >
>> >> > As such I am still inclined to leave this as a polling interface and allow
>> >> > apps to control the resolution by their poll rate.
>> >>
>> >> Just to make sure I understand. What is the disadvantage of making
>> >> this an event with an application controlled granularity?
>> >>
>> >> The two main advantages I see to keeping an event-based mechanism are:
>> >>
>> >> * streams where levels don't change that often (e.g. muted streams)
>> >> would not cause any events, while polls would continue running.
>> >> * it is unlikely that people would ever need to only do a single poll
>> >> so there would always be a need for periodicity. It would therefore be
>> >> helpful if the API provided the infrastructure for the most common use
>> >> case.
>> >>
>> >> Again, if the choice is between polling and not having access to these
>> >> fields at all, then polling it is.
>> >>
>> >> Emil
>> >>
>> >> >
>> >> >
>> >> > On Wed, Jan 29, 2014 at 6:53 AM, Emil Ivov <emcho@jitsi.org> wrote:
>> >> >>
>> >> >> On Wed, Jan 29, 2014 at 3:14 PM, Bernard Aboba
>> >> >> <Bernard.Aboba@microsoft.com> wrote:
>> >> >> > Emil said:
>> >> >> >
>> >> >> > +1. While polling is obviously much better than nothing at all, having a
>> >> >> > change event would be quite convenient.
>> >> >> >
>> >> >> > With regard to energy levels, there are two main use cases:
>> >> >> >
>> >> >> > 1.  acting on changes of the current speaker (e.g. in order to upscale
>> >> >> > their corresponding video and thumbnail everyone else)
>> >> >> > 2.  showing energy levels for all participants
>> >> >> >
>> >> >> > [BA] I believe that the polling proposal could address need #2 by
>> >> >> > delivering a list of CSRCs as well as an (averaged) level, but I'm not sure
>> >> >> > about #1.
>> >> >>
>> >> >> Yup, agreed.
>> >> >>
>> >> >> > #1 is about timely dominant speaker identification, presumably without
>> >> >> > false speaker switches.
>> >> >> >
>> >> >> > To do this well, you may need to do more than firing an event based on
>> >> >> > changes in a ranked list of speakers based on averaged levels; better
>> >> >> > approaches tend to actually process the audio.
>> >> >> >
>> >> >> > For example, see
>> >> >> >
>> >> >> >
>> >> >> > http://webee.technion.ac.il/Sites/People/IsraelCohen/Publications/CSL_2012_Volfin.pdf
>> >> >>
>> >> >> Right. That's why per-packet hdr extensions carrying the CSRC levels
>> >> >> would be the best (and only in the case of mixed streams) way to
>> >> >> implement any of the above. So, if we could have events triggered for
>> >> >> every new level, then we should be good. Unless I am missing
>> >> >> something, this should be covered by Peter's suggested API.
>> >> >>
>> >> >> Emil
>> >> >>
>> >> >> --
>> >> >> https://jitsi.org
>> >> >
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Emil Ivov, Ph.D.                       67000 Strasbourg,
>> >> Project Lead                           France
>> >> Jitsi
>> >> emcho@jitsi.org                        PHONE: +33.1.77.62.43.30
>> >> https://jitsi.org                       FAX:   +33.1.77.62.47.31
>> >>
>
>
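
On the point raised in the quoted thread that polls keep running on streams whose levels do not change (e.g. muted streams), an application could suppress unchanged results on its own side. A rough sketch, assuming each source is an object with csrc and audioLevel fields; onlyOnChange is a hypothetical helper, not part of any proposal in this thread:

```javascript
// Hypothetical helper: wraps a per-poll handler so that it is invoked only
// when the set of (csrc, audioLevel) pairs differs from the previous poll.
function onlyOnChange(handler) {
  let lastKey = null;
  return (sources) => {
    // Build a comparable fingerprint of the current poll result.
    const key = sources.map((s) => `${s.csrc}:${s.audioLevel}`).join(",");
    if (key !== lastKey) {
      lastKey = key;
      handler(sources);
    }
  };
}
```

The wrapped handler can be passed to whatever polling loop the application runs. It only hides unchanged results from the handler; the polls themselves still happen.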

Received on Friday, 31 January 2014 20:20:08 UTC