Re: active speaker information in mixed streams from Roman Shpount on 2014-01-30 (public-orca@w3.org from January 2014)

From: Roman Shpount <rshpount@turbobridge.com>
Date: Thu, 30 Jan 2014 13:39:10 -0500
To: Peter Thatcher <pthatcher@google.com>
Cc: Martin Thomson <martin.thomson@gmail.com>, Emil Ivov <emcho@jitsi.org>, Bernard Aboba <Bernard.Aboba@microsoft.com>, "public-orca@w3.org" <public-orca@w3.org>, Justin Uberti <juberti@google.com>
Message-ID: <CAD5OKxt7TAcTJroQYq5NgGUcsOU25tECmgfcSdKAAHo-TEPvwg@mail.gmail.com>
How about something like this:

ContributingSourceProcessorNode createContributingSourceProcessor(
    optional unsigned long interval = 100,
    optional unsigned long maxContributingSources = 16);

interface ContributingSourceProcessorNode {
    attribute EventHandler onContributingSourceProcess;
};

dictionary ContributingSource {
  double packetTime;
  unsigned long csrc;
  long audioLevel;
};

interface ContributingSourceProcessingEvent : Event {
    readonly attribute sequence<ContributingSource> contributingSources;
};

This way you can create a processor node and specify the interval at
which its event handler should be called.
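
As a rough sketch of how an application might consume such events, the
handler below picks the loudest contributor out of each event. The
pickLoudest helper and the sample data are made up for illustration, and
the code assumes audioLevel follows RFC 6465 (0 to 127 in -dBov, so a
lower value means louder), which the IDL above does not actually specify.

```javascript
// Hypothetical helper: return the loudest entry of a contributingSources
// list, or null if the list is empty.
// Assumption: audioLevel is in -dBov as in RFC 6465 (0 = loudest, 127 = silence).
function pickLoudest(contributingSources) {
  let loudest = null;
  for (const source of contributingSources) {
    if (loudest === null || source.audioLevel < loudest.audioLevel) {
      loudest = source;
    }
  }
  return loudest;
}

// Hypothetical handler of the kind that would be assigned to
// node.onContributingSourceProcess in the proposal above.
// Returns the CSRC of the loudest contributor, or null if there is none.
function onContributingSourceProcess(event) {
  const loudest = pickLoudest(event.contributingSources);
  return loudest === null ? null : loudest.csrc;
}

// Simulated event, since no browser implements this interface.
const sampleEvent = {
  contributingSources: [
    { packetTime: 1000.0, csrc: 0x1111, audioLevel: 60 },
    { packetTime: 1000.0, csrc: 0x2222, audioLevel: 15 },
    { packetTime: 1000.0, csrc: 0x3333, audioLevel: 127 },
  ],
};
const activeCsrc = onContributingSourceProcess(sampleEvent);
```

With interval = 100 the handler would run roughly ten times per second,
which is in the range Martin and Justin mention below.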

_____________
Roman Shpount


On Thu, Jan 30, 2014 at 12:08 PM, Peter Thatcher <pthatcher@google.com> wrote:

> Would it make sense to have an async getter that calls the callback
> function more than once?  For example, to get the current value once,
> call like this:
>
> rtpReceiver.getContributorSources(function(contributorSources) {
>   // Use the contributor sources just once.
> });
>
> And to get called back every 100ms, call like this:
>
> rtpReceiver.getContributorSources(function(contributorSources) {
>   // Use the contributor sources every 100ms.
>   return true;
> }, 100);
>
> And to stop the callback:
>
> rtpReceiver.getContributorSources(function(contributorSources) {
>   if (iAmAllDone) {
>     // I'm all done.  Don't call me anymore.
>     return false;
>   }
>   return true;
> }, 100);
>
>
> That's somewhat halfway between an async getter and an event.  Are
> there any existing HTML5 APIs like that?
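
For illustration, the calling convention above could be emulated on top
of a plain polling getter roughly as follows. This is only a sketch: the
poll and schedule parameters and the free-function shape are assumptions
made for the example, not part of any proposed or existing API.

```javascript
// Hypothetical emulation of the repeating getter described above.
// `poll` returns the current contributor sources; `schedule` defaults to
// setTimeout and is injectable so the demo can run without real timers.
function getContributorSources(poll, callback, intervalMs, schedule = setTimeout) {
  const tick = () => {
    const keepGoing = callback(poll());
    // Repeat only if an interval was given and the callback returned true.
    if (intervalMs !== undefined && keepGoing === true) {
      schedule(tick, intervalMs);
    }
  };
  // Deliver the first result asynchronously, like an async getter would.
  schedule(tick, 0);
}

// Demo with a fake scheduler that records timers instead of waiting.
const pending = [];
const delays = [];
const fakeSchedule = (fn, delay) => {
  pending.push(fn);
  delays.push(delay);
};

let calls = 0;
getContributorSources(
  () => [{ csrc: 1, audioLevel: 30 }], // stand-in for real receiver state
  (sources) => {
    calls += 1;
    return calls < 3; // ask to stop after the third call
  },
  100,
  fakeSchedule
);

// Run the recorded timers until none are left.
while (pending.length > 0) {
  pending.shift()();
}
```

Omitting the interval gives the one-shot behaviour of the first example,
regardless of what the callback returns.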
>
> On Thu, Jan 30, 2014 at 8:21 AM, Martin Thomson
> <martin.thomson@gmail.com> wrote:
> > If it is an event, I think that the API should choose the rate. One event
> > per packet makes little sense. I think that I would run at 5-10 updates per
> > second, but that might depend on circumstances.
> >
> > On Jan 30, 2014 6:17 AM, "Emil Ivov" <emcho@jitsi.org> wrote:
> >>
> >> On Thu, Jan 30, 2014 at 2:10 AM, Justin Uberti <juberti@google.com> wrote:
> >> > As others have mentioned, the event rate here could be very high (50+ PPS),
> >> > and I don't think that resolution is really needed for active speaker
> >> > identification. I have seen systems that work well even when sampling this
> >> > information at ~ 5 Hz.
> >> >
> >> > As such I am still inclined to leave this as a polling interface and allow
> >> > apps to control the resolution by their poll rate.
> >>
> >> Just to make sure I understand. What is the disadvantage of making
> >> this an event with an application controlled granularity?
> >>
> >> The two main advantages I see to keeping an event-based mechanism are:
> >>
> >> * streams where levels don't change that often (e.g. muted streams)
> >> would not cause any events, while polls would continue running.
> >> * it is unlikely that people would ever need to only do a single poll
> >> so there would always be need for periodicity. It would therefore be
> >> helpful if the API provided the infrastructure for the most common use
> >> case.
> >>
> >> Again, if the choice is between polling and not having access to these
> >> fields at all, then polling it is.
> >>
> >> Emil
> >>
> >> >
> >> >
> >> > On Wed, Jan 29, 2014 at 6:53 AM, Emil Ivov <emcho@jitsi.org> wrote:
> >> >>
> >> >> On Wed, Jan 29, 2014 at 3:14 PM, Bernard Aboba
> >> >> <Bernard.Aboba@microsoft.com> wrote:
> >> >> > Emil said:
> >> >> >
> >> >> > +1. While polling is obviously much better than nothing at all, having a
> >> >> > change event would be quite convenient.
> >> >> >
> >> >> > With regard to energy levels, there are two main use cases:
> >> >> >
> >> >> > 1.  acting on changes of the current speaker (e.g. in order to upscale
> >> >> > their corresponding video and thumbnail everyone else)
> >> >> > 2.  showing energy levels for all participants
> >> >> >
> >> >> > [BA] I believe that the polling proposal could address need #2 by
> >> >> > delivering a list of CSRCs as well as an (averaged) level, but I'm not
> >> >> > sure about #1.
> >> >>
> >> >> Yup, agreed.
> >> >>
> >> >> > #1 is about timely dominant speaker identification, presumably without
> >> >> > false speaker switches.
> >> >> >
> >> >> > To do this well, you may need to do more than firing an event based on
> >> >> > changes in a ranked list of speakers based on averaged levels; better
> >> >> > approaches tend to actually process the audio.
> >> >> >
> >> >> > For example, see
> >> >> > http://webee.technion.ac.il/Sites/People/IsraelCohen/Publications/CSL_2012_Volfin.pdf
> >> >>
> >> >> Right. That's why per-packet hdr extensions carrying the CSRC levels
> >> >> would be the best (and only in the case of mixed streams) way to
> >> >> implement any of the above. So, if we could have events triggered for
> >> >> every new level, then we should be good. Unless I am missing
> >> >> something, this should be covered by Peter's suggested API.
> >> >>
> >> >> Emil
> >> >>
> >> >> --
> >> >> https://jitsi.org
> >> >
> >> >
> >>
> >>
> >>
> >> --
> >> Emil Ivov, Ph.D.                       67000 Strasbourg,
> >> Project Lead                           France
> >> Jitsi
> >> emcho@jitsi.org                        PHONE: +33.1.77.62.43.30
> >> https://jitsi.org                       FAX:   +33.1.77.62.47.31
> >>
>
Received on Thursday, 30 January 2014 18:39:41 UTC