Re: active speaker information in mixed streams from Peter Thatcher on 2014-01-30 (public-orca@w3.org from January 2014)

From: Peter Thatcher <pthatcher@google.com>
Date: Thu, 30 Jan 2014 09:08:05 -0800
To: Martin Thomson <martin.thomson@gmail.com>
Cc: Emil Ivov <emcho@jitsi.org>, Bernard Aboba <Bernard.Aboba@microsoft.com>, "public-orca@w3.org" <public-orca@w3.org>, Roman Shpount <rshpount@turbobridge.com>, Justin Uberti <juberti@google.com>
Message-ID: <CAJrXDUHACkqfaCd14GTonDn114oMa5YuMfuc_+7PuYqPx4KSKA@mail.gmail.com>

Would it make sense to have an async getter that calls the callback
function more than once?  For example, to get the current value once,
call like this:

rtpReceiver.getContributorSources(function(contributorSources) {
  // Use the contributor sources just once.
});

And to get called back every 100ms, call like this:

rtpReceiver.getContributorSources(function(contributorSources) {
  // Use the contributor sources every 100ms.
  return true;  // Returning true asks to be called back again.
}, 100);

And to stop the callbacks, return false:

rtpReceiver.getContributorSources(function(contributorSources) {
  if (iAmAllDone) {
    // I'm all done.  Don't call me anymore.
    return false;
  }
  return true;
}, 100);


That's somewhat halfway between an async getter and an event.  Are
there any existing HTML5 APIs like that?
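
To make the semantics concrete, here is a rough sketch of how the
getter above could behave. It is only an illustration, not a proposal
for the implementation: makeContributorSourceGetter, readSources and
schedule are hypothetical names, and readSources stands in for whatever
the receiver uses internally to snapshot the current contributor
sources.

```javascript
// Hypothetical helper, not part of any real API: wraps a synchronous
// snapshot function in the callback-driven getter sketched above.
// `schedule` defaults to setTimeout and is injectable only to ease testing.
function makeContributorSourceGetter(readSources, schedule = setTimeout) {
  return function getContributorSources(callback, intervalMs) {
    const tick = () => {
      const keepGoing = callback(readSources());
      // Repeat only if an interval was given and the callback returned true.
      if (intervalMs !== undefined && keepGoing === true) {
        schedule(tick, intervalMs);
      }
    };
    // Deliver the first value asynchronously, as an async getter would.
    schedule(tick, 0);
  };
}
```

With this, a call without an interval fires the callback once, and a
call with an interval of 100 keeps firing roughly every 100ms until the
callback returns anything other than true.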




On Thu, Jan 30, 2014 at 8:21 AM, Martin Thomson
<martin.thomson@gmail.com> wrote:
> If it is an event, I think that the API should choose the rate. One event
> per packet makes little sense. I think that I would run at 5-10 updates per
> second, but that might depend on circumstances.
>
> On Jan 30, 2014 6:17 AM, "Emil Ivov" <emcho@jitsi.org> wrote:
>>
>> On Thu, Jan 30, 2014 at 2:10 AM, Justin Uberti <juberti@google.com> wrote:
>> > As others have mentioned, the event rate here could be very high (50+
>> > PPS), and I don't think that resolution is really needed for active
>> > speaker identification. I have seen systems that work well even when
>> > sampling this information at ~5 Hz.
>> >
>> > As such I am still inclined to leave this as a polling interface and
>> > allow apps to control the resolution by their poll rate.
>>
>> Just to make sure I understand. What is the disadvantage of making
>> this an event with an application controlled granularity?
>>
>> The two main advantages I see to keeping an event-based mechanism are:
>>
>> * streams where levels don't change that often (e.g. muted streams)
>> would not cause any events, while polls would continue running.
>> * it is unlikely that people would ever need to only do a single poll
>> so there would always be need for periodicity. It would therefore be
>> helpful if the API provided the infrastructure for the most common use
>> case.
>>
>> Again, if the choice is between polling and not having access to these
>> fields at all, then polling it is.
>>
>> Emil
>>
>> >
>> >
>> > On Wed, Jan 29, 2014 at 6:53 AM, Emil Ivov <emcho@jitsi.org> wrote:
>> >>
>> >> On Wed, Jan 29, 2014 at 3:14 PM, Bernard Aboba
>> >> <Bernard.Aboba@microsoft.com> wrote:
>> >> > Emil said:
>> >> >
>> >> > +1. While polling is obviously much better than nothing at all,
>> >> > having a change event would be quite convenient.
>> >> >
>> >> > With regard to energy levels, there are two main use cases:
>> >> >
>> >> > 1.  acting on changes of the current speaker (e.g. in order to
>> >> >     upscale their corresponding video and thumbnail everyone else)
>> >> > 2.  showing energy levels for all participants
>> >> >
>> >> > [BA] I believe that the polling proposal could address need #2 by
>> >> > delivering a list of CSRCs as well as an (averaged) level, but I'm
>> >> > not sure about #1.
>> >>
>> >> Yup, agreed.
>> >>
>> >> > #1 is about timely dominant speaker identification, presumably
>> >> > without false speaker switches.
>> >> >
>> >> > To do this well, you may need to do more than firing an event based
>> >> > on changes in a ranked list of speakers based on averaged levels;
>> >> > better approaches tend to actually process the audio.
>> >> >
>> >> > For example, see
>> >> >
>> >> > http://webee.technion.ac.il/Sites/People/IsraelCohen/Publications/CSL_2012_Volfin.pdf
>> >>
>> >> Right. That's why per-packet header extensions carrying the CSRC
>> >> levels would be the best (and, in the case of mixed streams, the
>> >> only) way to implement any of the above. So, if we could have events
>> >> triggered for every new level, then we should be good. Unless I am
>> >> missing something, this should be covered by Peter's suggested API.
>> >>
>> >> Emil
>> >>
>> >> --
>> >> https://jitsi.org
>> >
>> >
>>
>>
>>
>> --
>> Emil Ivov, Ph.D.                       67000 Strasbourg,
>> Project Lead                           France
>> Jitsi
>> emcho@jitsi.org                        PHONE: +33.1.77.62.43.30
>> https://jitsi.org                       FAX:   +33.1.77.62.47.31
>>

Received on Thursday, 30 January 2014 17:09:14 UTC