Re: active speaker information in mixed streams from Roman Shpount on 2014-01-31 (public-orca@w3.org from January 2014)

From: Roman Shpount <rshpount@turbobridge.com>
Date: Fri, 31 Jan 2014 15:52:33 -0500
To: Peter Thatcher <pthatcher@google.com>
Cc: Martin Thomson <martin.thomson@gmail.com>, Emil Ivov <emcho@jitsi.org>, Bernard Aboba <Bernard.Aboba@microsoft.com>, "public-orca@w3.org" <public-orca@w3.org>, Justin Uberti <juberti@google.com>
Message-ID: <CAD5OKxsy3+5E09YWPtNQDC8Ejss-G4YfC-_e6caD+HFDhpszaw@mail.gmail.com>
I just copied the design from WebAudio. The benefit is the ability to specify
multiple callbacks that are called at different frequencies, and the ability
to get rid of a callback at will. You should be able to code the same thing
with your API as well.
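
For illustration, here is a minimal sketch of how that might look when layered
on the callback-based proposal. Nothing here is a real browser API:
MockRtpReceiver, its tick() fake clock, and the watchContributorSources helper
are hypothetical names, and the sketch assumes a periodic callback keeps being
called only while it returns true.

```javascript
// Hypothetical stand-in for the receiver in the callback-based proposal.
// Time is advanced manually with tick() so the sketch is deterministic.
class MockRtpReceiver {
  constructor() {
    this.now = 0;
    this.subscriptions = [];
    this.sources = [{ csrc: 1, audioLevel: 30 }]; // made-up sample data
  }

  // No interval: call back once. With an interval: call back periodically
  // for as long as the callback returns true (assumed semantics).
  getContributorSources(callback, intervalMs) {
    if (intervalMs === undefined) {
      callback(this.sources);
      return;
    }
    this.subscriptions.push({ callback, intervalMs, nextDue: this.now + intervalMs });
  }

  // Advance the fake clock by `ms`, firing due callbacks in time order.
  tick(ms) {
    const end = this.now + ms;
    for (;;) {
      let next = null;
      for (const s of this.subscriptions) {
        if (s.nextDue <= end && (next === null || s.nextDue < next.nextDue)) {
          next = s;
        }
      }
      if (next === null) break;
      this.now = next.nextDue;
      if (next.callback(this.sources) === true) {
        next.nextDue += next.intervalMs;
      } else {
        // Callback asked to stop: drop the subscription.
        this.subscriptions = this.subscriptions.filter((s) => s !== next);
      }
    }
    this.now = end;
  }
}

// Registers an independent periodic handler and returns a function that
// cancels it, which gives "multiple callbacks" and "remove at will".
function watchContributorSources(receiver, intervalMs, handler) {
  let active = true;
  receiver.getContributorSources((sources) => {
    if (!active) return false; // cancelled: tell the receiver to stop
    handler(sources);
    return true;
  }, intervalMs);
  return () => {
    active = false;
  };
}

const receiver = new MockRtpReceiver();
const calls = { fast: 0, slow: 0 };
const stopFast = watchContributorSources(receiver, 100, () => { calls.fast += 1; });
const stopSlow = watchContributorSources(receiver, 500, () => { calls.slow += 1; });

receiver.tick(1000); // both handlers run at their own rate
stopFast();          // drop only the 100 ms handler
receiver.tick(1000); // the 500 ms handler keeps running
console.log(calls);
```

The cancellation only takes effect at the next scheduled call, since the
callback's return value is the only way to stop it in this API shape.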

_____________
Roman Shpount


On Fri, Jan 31, 2014 at 3:19 PM, Peter Thatcher <pthatcher@google.com> wrote:

> Looks more complicated.  What's the benefit?  The callback-based
> version of my proposal already allows specifying the frequency, and is
> simpler.
>
>
> On Thu, Jan 30, 2014 at 10:39 AM, Roman Shpount
> <rshpount@turbobridge.com> wrote:
> > How about something like this:
> >
> > ContributingSourceProcessorNode createContributingSourceProcessor(
> >     optional unsigned long interval = 100,
> >     optional unsigned long maxContributingSources = 16);
> >
> > interface ContributingSourceProcessorNode {
> >     attribute EventHandler onContributingSourceProcess;
> > };
> >
> > dictionary ContributingSource {
> >   double packetTime;
> >   unsigned long csrc;
> >   long audioLevel;
> > };
> >
> > interface ContributingSourceProcessingEvent : Event {
> >     readonly attribute sequence<ContributingSource> contributingSources;
> > };
> >
> > This way you can create a processor node and specify the frequency with
> > which it should be called.
> >
> > _____________
> > Roman Shpount
> >
> >
> > On Thu, Jan 30, 2014 at 12:08 PM, Peter Thatcher <pthatcher@google.com>
> > wrote:
> >>
> >> Would it make sense to have an async getter that calls the callback
> >> function more than once?  For example, to get the current value once,
> >> call like this:
> >>
> >> rtpReceiver.getContributorSources(function(contributorSources) {
> >>   // Use the contributor sources just once.
> >> });
> >>
> >> And to get called back every 100ms, call like this:
> >>
> >> rtpReceiver.getContributorSources(function(contributorSources) {
> >>   // Use the contributor sources every 100ms.
> >>   return true;
> >> }, 100);
> >>
> >> And to stop the callback:
> >>
> >> rtpReceiver.getContributorSources(function(contributorSources) {
> >>   if (iAmAllDone) {
> >>     // I'm all done.  Don't call me anymore.
> >>     return false;
> >>   }
> >>   return true;
> >> }, 100);
> >>
> >>
> >> That's somewhat halfway between an async getter and an event.  Are
> >> there any existing HTML5 APIs like that?
> >>
> >>
> >>
> >>
> >> On Thu, Jan 30, 2014 at 8:21 AM, Martin Thomson
> >> <martin.thomson@gmail.com> wrote:
> >> > If it is an event, I think that the API should choose the rate. One
> >> > event per packet makes little sense. I think that I would run at
> >> > 5-10 updates per second, but that might depend on circumstances.
> >> >
> >> > On Jan 30, 2014 6:17 AM, "Emil Ivov" <emcho@jitsi.org> wrote:
> >> >>
> >> >> On Thu, Jan 30, 2014 at 2:10 AM, Justin Uberti <juberti@google.com>
> >> >> wrote:
> >> >> > As others have mentioned, the event rate here could be very high
> >> >> > (50+ PPS), and I don't think that resolution is really needed for
> >> >> > active speaker identification. I have seen systems that work well
> >> >> > even when sampling this information at ~ 5 Hz.
> >> >> >
> >> >> > As such I am still inclined to leave this as a polling interface
> >> >> > and allow apps to control the resolution by their poll rate.
> >> >>
> >> >> Just to make sure I understand. What is the disadvantage of making
> >> >> this an event with an application controlled granularity?
> >> >>
> >> >> The two main advantages I see to keeping an event-based mechanism
> >> >> are:
> >> >>
> >> >> * streams where levels don't change that often (e.g. muted streams)
> >> >> would not cause any events, while polls would continue running.
> >> >> * it is unlikely that people would ever need to only do a single poll
> >> >> so there would always be a need for periodicity. It would therefore
> >> >> be helpful if the API provided the infrastructure for the most common
> >> >> use case.
> >> >>
> >> >> Again, if the choice is between polling and not having access to
> >> >> these fields at all, then polling it is.
> >> >>
> >> >> Emil
> >> >>
> >> >> >
> >> >> >
> >> >> > On Wed, Jan 29, 2014 at 6:53 AM, Emil Ivov <emcho@jitsi.org>
> >> >> > wrote:
> >> >> >>
> >> >> >> On Wed, Jan 29, 2014 at 3:14 PM, Bernard Aboba
> >> >> >> <Bernard.Aboba@microsoft.com> wrote:
> >> >> >> > Emil said:
> >> >> >> >
> >> >> >> > +1. While polling is obviously much better than nothing at all,
> >> >> >> > having a change event would be quite convenient.
> >> >> >> >
> >> >> >> > With regard to energy levels, there are two main use cases:
> >> >> >> >
> >> >> >> > 1.  acting on changes of the current speaker (e.g. in order to
> >> >> >> > upscale their corresponding video and thumbnail everyone else)
> >> >> >> > 2.  showing energy levels for all participants
> >> >> >> >
> >> >> >> > [BA] I believe that the polling proposal could address need #2
> >> >> >> > by delivering a list of CSRCs as well as an (averaged) level,
> >> >> >> > but I'm not sure about #1.
> >> >> >>
> >> >> >> Yup, agreed.
> >> >> >>
> >> >> >> > #1 is about timely dominant speaker identification, presumably
> >> >> >> > without false speaker switches.
> >> >> >> >
> >> >> >> > To do this well, you may need to do more than firing an event
> >> >> >> > based on changes in a ranked list of speakers based on averaged
> >> >> >> > levels; better approaches tend to actually process the audio.
> >> >> >> >
> >> >> >> > For example, see
> >> >> >> > http://webee.technion.ac.il/Sites/People/IsraelCohen/Publications/CSL_2012_Volfin.pdf
> >> >> >>
> >> >> >> Right. That's why per-packet header extensions carrying the CSRC
> >> >> >> levels would be the best way (and, in the case of mixed streams,
> >> >> >> the only way) to implement any of the above. So, if we could have
> >> >> >> events triggered for every new level, then we should be good.
> >> >> >> Unless I am missing something, this should be covered by Peter's
> >> >> >> suggested API.
> >> >> >>
> >> >> >> Emil
> >> >> >>
> >> >> >> --
> >> >> >> https://jitsi.org
> >> >> >
> >> >> >
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Emil Ivov, Ph.D.                       67000 Strasbourg,
> >> >> Project Lead                           France
> >> >> Jitsi
> >> >> emcho@jitsi.org                        PHONE: +33.1.77.62.43.30
> >> >> https://jitsi.org                       FAX:   +33.1.77.62.47.31
> >> >>
> >
> >
>
Received on Friday, 31 January 2014 20:53:04 UTC