Re: active speaker information in mixed streams from Martin Thomson on 2014-01-30 (public-orca@w3.org from January 2014)

From: Martin Thomson <martin.thomson@gmail.com>
Date: Thu, 30 Jan 2014 08:21:25 -0800
To: Emil Ivov <emcho@jitsi.org>
Cc: Peter Thatcher <pthatcher@google.com>, Bernard Aboba <Bernard.Aboba@microsoft.com>, "public-orca@w3.org" <public-orca@w3.org>, Roman Shpount <rshpount@turbobridge.com>, Justin Uberti <juberti@google.com>
Message-ID: <CABkgnnUAF-x7G=L9m2_H0uCAQBx0bvDRJtuNuOd7SBixps+1yA@mail.gmail.com>

If it is an event, I think that the API should choose the rate. One event
per packet makes little sense. I think that I would run at 5-10 updates per
second, but that might depend on circumstances.
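
To make that concrete, here is a minimal TypeScript sketch of an API-chosen
rate: per-packet levels are coalesced and at most one update is delivered per
interval, and only if some level changed. ThrottledLevels, onPacket and the
listener shape are hypothetical names for illustration, not part of any
proposal in this thread. Timestamps are passed in explicitly so the behavior
is easy to check.

```typescript
// Hypothetical sketch; none of these names come from an existing or proposed API.
type LevelListener = (levels: Map<number, number>) => void;

class ThrottledLevels {
  private latest = new Map<number, number>(); // CSRC -> most recent level
  private lastEmitMs = -Infinity;             // time of the last delivered update
  private dirty = false;                      // true if a level changed since then
  private minIntervalMs: number;
  private listener: LevelListener;

  // minIntervalMs is the rate chosen by the API, e.g. 200 ms for ~5 updates/s.
  constructor(minIntervalMs: number, listener: LevelListener) {
    this.minIntervalMs = minIntervalMs;
    this.listener = listener;
  }

  // Called once per received packet with the level carried for one CSRC.
  onPacket(csrc: number, level: number, nowMs: number): void {
    if (this.latest.get(csrc) !== level) {
      this.latest.set(csrc, level);
      this.dirty = true;
    }
    // Deliver only when something changed and the interval has elapsed.
    if (this.dirty && nowMs - this.lastEmitMs >= this.minIntervalMs) {
      this.lastEmitMs = nowMs;
      this.dirty = false;
      this.listener(new Map(this.latest)); // pass a copy, not internal state
    }
  }
}
```

With a 200 ms interval that works out to about 5 updates per second for a
50 PPS stream, and a stream whose level never changes produces only the
initial update. Because delivery is driven by packet arrival in this sketch,
a final change is reported when the next packet arrives after the interval.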
On Jan 30, 2014 6:17 AM, "Emil Ivov" <emcho@jitsi.org> wrote:

> On Thu, Jan 30, 2014 at 2:10 AM, Justin Uberti <juberti@google.com> wrote:
> > As others have mentioned, the event rate here could be very high (50+ PPS),
> > and I don't think that resolution is really needed for active speaker
> > identification. I have seen systems that work well even when sampling this
> > information at ~ 5 Hz.
> >
> > As such I am still inclined to leave this as a polling interface and allow
> > apps to control the resolution by their poll rate.
>
> Just to make sure I understand. What is the disadvantage of making
> this an event with an application controlled granularity?
>
> The two main advantages I see to keeping an event-based mechanism are:
>
> * streams where levels don't change that often (e.g. muted streams)
> would not cause any events, while polls would continue running.
> * it is unlikely that people would ever need to only do a single poll
> so there would always be need for periodicity. It would therefore be
> helpful if the API provided the infrastructure for the most common use
> case.
>
> Again, if the choice is between polling and not having access to these
> fields at all, then polling it is.
>
> Emil
>
> >
> >
> > On Wed, Jan 29, 2014 at 6:53 AM, Emil Ivov <emcho@jitsi.org> wrote:
> >>
> >> On Wed, Jan 29, 2014 at 3:14 PM, Bernard Aboba
> >> <Bernard.Aboba@microsoft.com> wrote:
> >> > Emil said:
> >> >
> >> > +1. While polling is obviously much better than nothing at all, having a
> >> > change event would be quite convenient.
> >> >
> >> > With regard to energy levels, there are two main use cases:
> >> >
> >> > 1.  acting on changes of the current speaker (e.g. in order to upscale
> >> > their corresponding video and thumbnail everyone else)
> >> > 2.  showing energy levels for all participants
> >> >
> >> > [BA] I believe that the polling proposal could address need #2 by
> >> > delivering a list of CSRCs as well as an (averaged) level, but I'm not sure
> >> > about #1.
> >>
> >> Yup, agreed.
> >>
> >> > #1 is about timely dominant speaker identification, presumably without
> >> > false speaker switches.
> >> >
> >> > To do this well, you may need to do more than firing an event based on
> >> > changes in a ranked list of speakers based on averaged levels; better
> >> > approaches tend to actually process the audio.
> >> >
> >> > For example, see
> >> >
> >> > http://webee.technion.ac.il/Sites/People/IsraelCohen/Publications/CSL_2012_Volfin.pdf
> >>
> >> Right. That's why per-packet hdr extensions carrying the CSRC levels
> >> would be the best (and only in the case of mixed streams) way to
> >> implement any of the above. So, if we could have events triggered for
> >> every new level, then we should be good. Unless I am missing
> >> something, this should be covered by Peter's suggested API.
> >>
> >> Emil
> >>
> >> --
> >> https://jitsi.org
> >
> >
>
>
>
> --
> Emil Ivov, Ph.D.                       67000 Strasbourg,
> Project Lead                           France
> Jitsi
> emcho@jitsi.org                        PHONE: +33.1.77.62.43.30
> https://jitsi.org                       FAX:   +33.1.77.62.47.31
>
>

Received on Thursday, 30 January 2014 16:21:54 UTC