- From: Justin Uberti <juberti@google.com>
- Date: Fri, 31 Jan 2014 14:31:35 -0800
- To: Roman Shpount <rshpount@turbobridge.com>
- Cc: Peter Thatcher <pthatcher@google.com>, Martin Thomson <martin.thomson@gmail.com>, Emil Ivov <emcho@jitsi.org>, Bernard Aboba <Bernard.Aboba@microsoft.com>, "public-orca@w3.org" <public-orca@w3.org>
- Message-ID: <CAOJ7v-2+vZBnaiFHR14vVXKyofwsyB9v7FHOZ29_7o4qi-_BOw@mail.gmail.com>
The notion of 'Nodes' is specific to WebAudio, and the idea of adding a CSRC processor object to be vended from RTCRtpReceiver feels heavy given that this is a bit of an edge case. I am OK with either polling via receiver.getContributingSources, or an event such as receiver.oncontributingsourcesupdate, where the frequency is configurable but defaults to zero.

On Fri, Jan 31, 2014 at 12:52 PM, Roman Shpount <rshpount@turbobridge.com> wrote:

> I just copied the design from WebAudio. The benefit is the ability to
> specify multiple callbacks which are called at different frequencies, and
> the ability to get rid of a callback at will. You should be able to code
> the same thing with your API as well.
>
> _____________
> Roman Shpount
>
>
> On Fri, Jan 31, 2014 at 3:19 PM, Peter Thatcher <pthatcher@google.com> wrote:
>
>> Looks more complicated. What's the benefit? The callback-based
>> version of my proposal already allows specifying the frequency, and is
>> simpler.
>>
>>
>> On Thu, Jan 30, 2014 at 10:39 AM, Roman Shpount
>> <rshpount@turbobridge.com> wrote:
>> > How about something like this:
>> >
>> > ContributingSourceProcessorNode createContributingSourceProcessor(
>> >     optional unsigned long interval = 100,
>> >     optional unsigned long maxContributingSources = 16);
>> >
>> > interface ContributingSourceProcessorNode {
>> >     attribute EventHandler onContributingSourceProcess;
>> > };
>> >
>> > dictionary ContributingSource {
>> >     readonly attribute double packetTime;
>> >     unsigned int csrc;
>> >     int audioLevel;
>> > };
>> >
>> > interface ContributingSourceProcessingEvent : Event {
>> >     readonly attribute sequence<ContributingSource> contributingSources;
>> > };
>> >
>> > This way you can create a processor node and specify the frequency with
>> > which it should be called.
>> >
>> > _____________
>> > Roman Shpount
>> >
>> >
>> > On Thu, Jan 30, 2014 at 12:08 PM, Peter Thatcher <pthatcher@google.com>
>> > wrote:
>> >>
>> >> Would it make sense to have an async getter that calls the callback
>> >> function more than once? For example, to get the current value once,
>> >> call like this:
>> >>
>> >> rtpReceiver.getContributorSources(function(contributorSources) {
>> >>   // Use the contributor sources just once.
>> >> });
>> >>
>> >> And to get called back every 100 ms, call like this:
>> >>
>> >> rtpReceiver.getContributorSources(function(contributorSources) {
>> >>   // Use the contributor sources every 100 ms.
>> >>   return true;
>> >> }, 100);
>> >>
>> >> And to stop the callbacks:
>> >>
>> >> rtpReceiver.getContributorSources(function(contributorSources) {
>> >>   if (iAmAllDone) {
>> >>     // I'm all done. Don't call me anymore.
>> >>     return false;
>> >>   }
>> >>   return true;
>> >> }, 100);
>> >>
>> >> That's somewhere halfway between an async getter and an event. Are
>> >> there any existing HTML5 APIs like that?
>> >>
>> >>
>> >> On Thu, Jan 30, 2014 at 8:21 AM, Martin Thomson
>> >> <martin.thomson@gmail.com> wrote:
>> >> > If it is an event, I think that the API should choose the rate. One
>> >> > event per packet makes little sense. I think that I would run at 5-10
>> >> > updates per second, but that might depend on circumstances.
>> >> >
>> >> > On Jan 30, 2014 6:17 AM, "Emil Ivov" <emcho@jitsi.org> wrote:
>> >> >>
>> >> >> On Thu, Jan 30, 2014 at 2:10 AM, Justin Uberti <juberti@google.com>
>> >> >> wrote:
>> >> >> > As others have mentioned, the event rate here could be very high
>> >> >> > (50+ PPS), and I don't think that resolution is really needed for
>> >> >> > active speaker identification. I have seen systems that work well
>> >> >> > even when sampling this information at ~5 Hz.
>> >> >> >
>> >> >> > As such, I am still inclined to leave this as a polling interface
>> >> >> > and allow apps to control the resolution via their poll rate.
>> >> >>
>> >> >> Just to make sure I understand: what is the disadvantage of making
>> >> >> this an event with an application-controlled granularity?
>> >> >>
>> >> >> The two main advantages I see to keeping an event-based mechanism
>> >> >> are:
>> >> >>
>> >> >> * streams where levels don't change that often (e.g. muted streams)
>> >> >>   would not cause any events, while polls would continue running;
>> >> >> * it is unlikely that people would ever need to do only a single
>> >> >>   poll, so there would always be a need for periodicity. It would
>> >> >>   therefore be helpful if the API provided the infrastructure for
>> >> >>   the most common use case.
>> >> >>
>> >> >> Again, if the choice is between polling and not having access to
>> >> >> these fields at all, then polling it is.
>> >> >>
>> >> >> Emil
>> >> >>
>> >> >> >
>> >> >> > On Wed, Jan 29, 2014 at 6:53 AM, Emil Ivov <emcho@jitsi.org> wrote:
>> >> >> >>
>> >> >> >> On Wed, Jan 29, 2014 at 3:14 PM, Bernard Aboba
>> >> >> >> <Bernard.Aboba@microsoft.com> wrote:
>> >> >> >> > Emil said:
>> >> >> >> >
>> >> >> >> > +1. While polling is obviously much better than nothing at all,
>> >> >> >> > having a change event would be quite convenient.
>> >> >> >> >
>> >> >> >> > With regard to energy levels, there are two main use cases:
>> >> >> >> >
>> >> >> >> > 1. acting on changes of the current speaker (e.g. in order to
>> >> >> >> >    upscale their corresponding video and thumbnail everyone else)
>> >> >> >> > 2. showing energy levels for all participants
>> >> >> >> >
>> >> >> >> > [BA] I believe that the polling proposal could address need #2
>> >> >> >> > by delivering a list of CSRCs as well as an (averaged) level,
>> >> >> >> > but I'm not sure about #1.
>> >> >> >>
>> >> >> >> Yup, agreed.
>> >> >> >>
>> >> >> >> > #1 is about timely dominant speaker identification, presumably
>> >> >> >> > without false speaker switches.
>> >> >> >> >
>> >> >> >> > To do this well, you may need to do more than fire an event
>> >> >> >> > based on changes in a ranked list of speakers built from
>> >> >> >> > averaged levels; better approaches tend to actually process
>> >> >> >> > the audio.
>> >> >> >> >
>> >> >> >> > For example, see
>> >> >> >> > http://webee.technion.ac.il/Sites/People/IsraelCohen/Publications/CSL_2012_Volfin.pdf
>> >> >> >>
>> >> >> >> Right. That's why per-packet header extensions carrying the CSRC
>> >> >> >> levels would be the best (and, in the case of mixed streams, the
>> >> >> >> only) way to implement any of the above. So, if we could have
>> >> >> >> events triggered for every new level, then we should be good.
>> >> >> >> Unless I am missing something, this should be covered by Peter's
>> >> >> >> suggested API.
>> >> >> >>
>> >> >> >> Emil
>> >> >> >>
>> >> >> >> --
>> >> >> >> https://jitsi.org
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Emil Ivov, Ph.D.                       67000 Strasbourg,
>> >> >> Project Lead                           France
>> >> >> Jitsi
>> >> >> emcho@jitsi.org                        PHONE: +33.1.77.62.43.30
>> >> >> https://jitsi.org                      FAX:   +33.1.77.62.47.31
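Peter's "return true to keep getting called back" getter earlier in the thread could be approximated on top of a plain polling interface. The sketch below is purely illustrative and not part of any shipping API: `makeRepeatingGetter`, the injectable `schedule` parameter, and a synchronous `getSources` function are all assumptions of this sketch.

```javascript
// Illustrative shim (not a real WebRTC API): wrap a synchronous polling
// getter in Peter's repeat-while-true callback style.
// `schedule(fn, ms)` must start a repeating timer and return a cancel
// function; injecting it keeps the loop testable without real timers.
function makeRepeatingGetter(getSources, schedule) {
  return function getContributorSources(callback, intervalMs) {
    if (intervalMs === undefined) {
      callback(getSources()); // one-shot: fire once and never again
      return;
    }
    const cancel = schedule(() => {
      // Anything other than `true` stops the periodic callbacks.
      if (callback(getSources()) !== true) cancel();
    }, intervalMs);
  };
}

// In a real page, `schedule` would simply be setInterval/clearInterval:
const intervalSchedule = (fn, ms) => {
  const id = setInterval(fn, ms);
  return () => clearInterval(id);
};
```

Because the callback's return value is the only control channel, a one-shot call, a periodic poll, and cancellation all use the same entry point, which is the "halfway between a getter and an event" shape Peter describes.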
Received on Friday, 31 January 2014 22:32:23 UTC
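On the dominant-speaker question Bernard raises (timely identification without false speaker switches), one common trick short of fully processing the audio is to smooth each CSRC's level across polls and require the leader to beat the current speaker by a margin before switching. This is a hypothetical sketch only: the class name, the smoothing constant, and the hysteresis margin are invented here, and it assumes `{ csrc, audioLevel }` objects where a higher value means louder (RFC 6465 actually encodes mixer-to-client levels as -dBov, so a real implementation would need to invert that).

```javascript
// Hypothetical dominant-speaker tracker: exponentially smooth each CSRC's
// audio level across polls, and only switch the dominant speaker when the
// leader's smoothed level clearly exceeds the current speaker's.
class DominantSpeakerTracker {
  constructor(alpha = 0.3, switchMargin = 5) {
    this.alpha = alpha;               // smoothing weight for new samples
    this.switchMargin = switchMargin; // hysteresis before switching speakers
    this.smoothed = new Map();        // csrc -> smoothed level
    this.current = null;              // current dominant csrc
  }

  // sources: array of { csrc, audioLevel } from one poll; returns the
  // csrc considered dominant after incorporating this poll.
  update(sources) {
    for (const { csrc, audioLevel } of sources) {
      const prev = this.smoothed.get(csrc) ?? audioLevel;
      this.smoothed.set(csrc, prev + this.alpha * (audioLevel - prev));
    }
    let best = null;
    let bestLevel = -Infinity;
    for (const [csrc, level] of this.smoothed) {
      if (level > bestLevel) { best = csrc; bestLevel = level; }
    }
    const currentLevel = this.smoothed.get(this.current) ?? -Infinity;
    if (this.current === null || bestLevel > currentLevel + this.switchMargin) {
      this.current = best;
    }
    return this.current;
  }
}
```

Feeding this from a ~5 Hz poll of getContributingSources, as Justin suggests, would give use case #1 (speaker switching) without reacting to per-packet noise, while the raw per-poll list still serves use case #2 (level meters).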