Re: Reviewing the Web Audio API (from webrtc)

On 4/17/2012 9:51 PM, Chris Rogers wrote:
> On Tue, Apr 17, 2012 at 5:23 PM, Randell Jesup <> wrote:
>     So it sounds like to modify audio in a MediaStream you'll need to:
>     * Extract each track from a MediaStream
>     * Turn each track into a source (might be combined with previous step)
>     * Attach each source to a graph
>     * Extract tracks from the destination of the graphs
>     * Extract the video stream(s) from the MediaStream source
>     * Combine all the tracks back into a new MediaStream
>     This is a lot of decomposition and recomposition, and a bunch of
>     code to add in almost every instance where we're doing anything
>     more complex than volume to a MediaStream.
> It sounds like a few lines of JavaScript even for this case of 
> multiple audio tracks per stream.  And I would expect it to be 
> desirable to split out the separate tracks anyway for individual 
> processing.

I doubt you'd apply different filters to the different tracks very often.
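
To put the amount of plumbing in concrete terms, here's roughly the shape 
of it.  This is a sketch only: I'm assuming the proposed 
createMediaStreamSource()/createMediaStreamDestination() nodes, 
getAudioTracks()/getVideoTracks() accessors, and being able to construct a 
MediaStream from a list of tracks, so none of these names may survive into 
the final spec.

  // Sketch: route each audio track of a stream through its own audio graph
  var ctx = new webkitAudioContext();

  function processAudio(stream, buildGraph) {
    var outTracks = [];
    stream.getAudioTracks().forEach(function (track) {
      // wrap each audio track in its own stream so it can feed the graph
      var source = ctx.createMediaStreamSource(new MediaStream([track]));
      var dest = ctx.createMediaStreamDestination();
      buildGraph(source, dest);   // caller wires its filter nodes in between
      outTracks.push(dest.stream.getAudioTracks()[0]);
    });
    // carry the video track(s) across untouched
    stream.getVideoTracks().forEach(function (t) { outTracks.push(t); });
    return new MediaStream(outTracks);
  }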

> But is it usually the case that a MediaStream will contain multiple 
> audio tracks?  It was my understanding that in many cases there would 
> be a single audio track per stream.  Surely this will be the usual 
> case with local input from getUserMedia().  And in a tele-conference 
> scenario wouldn't there be multiple MediaStreams coming from different 
> peers, each usually having a single audio track?  Perhaps I 
> misunderstand the most common cases here.

No, a single audio track per stream is the common case - but you still 
have to write the code for the general case, or it will break whenever a 
stream does have multiple tracks, which would make multiple audio tracks 
an effectively unusable feature.

>     On a separate note, while not directly applicable to Audio, I'll
>     toss my personal opinion in that we want a unified framework to
>     process media in (audio or video).  We've already seen lots of
>     people modifying the video from WebRTC and from getUserMedia()
>     (from silly antlers to instagram-like effects, etc), and we know
>     they'll want to do more (face tracking, visual ID, QR code
>     recognizers, etc), and running everything through a <canvas> is
>     not a great solution (laggy, low performance, stalls main-thread,
>     etc).
> I'm sure we can make performance improvements to our graphics/video 
> presentation APIs and implementation, but this need not be shoehorned 
> together into our audio processing architecture which has its own 
> unique set of stringent real-time constraints for games and 
> interactive applications.  Well designed and well-factored APIs can be 
> used together in powerful ways without creating monolithic 
> architecture which can overly generalize concepts unique to specific 
> media types.

Sure, but I have to say a unified framework seems like a very powerful and 
logically consistent approach.  And honestly we need an API for processing 
video Real Soon Now, and I see no other pathway to getting one.
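
To be concrete about the <canvas> route: today it means roughly the 
following, per frame, on the main thread, which is exactly where the lag 
and stalls come from.  (Names here are placeholders; the video element is 
assumed to already be showing the MediaStream.)

  var ctx2d = canvas.getContext('2d');

  function paint() {
    // copy the current video frame into the canvas, then touch every pixel
    ctx2d.drawImage(videoElement, 0, 0, canvas.width, canvas.height);
    var frame = ctx2d.getImageData(0, 0, canvas.width, canvas.height);
    for (var i = 0; i < frame.data.length; i += 4) {
      // trivial grayscale "effect"; real effects are far heavier
      var y = (frame.data[i] + frame.data[i + 1] + frame.data[i + 2]) / 3;
      frame.data[i] = frame.data[i + 1] = frame.data[i + 2] = y;
    }
    ctx2d.putImageData(frame, 0, 0);
    requestAnimationFrame(paint);
  }
  requestAnimationFrame(paint);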

> Although we're still in the very early days of demos for WebRTC, 
> here's a really interesting one illustrating how these APIs can be 
> combined in a very interesting way:
> <>

This is very reminiscent of Amiga Live!, a program demoed at the launch 
of the Amiga in 1985 (with Andy Warhol and Debbie Harry, IIRC) that 
leveraged a genlock/digitizer to let you interact with elements on the 
screen (bells, xylophone, drums, etc).

>     I should note that an audio (or video) processing worker would
>     typically throw no garbage (and so avoid GC), and even if there is
>     garbage, there would be almost no live roots and GC/CC would be
>     very fast.
> I'm sure this would vary greatly depending on the particular JS code 
> running in the worker and the particular JS engine implementation.

In general, yes, but if the code throws no garbage, there should be no GC.
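
As an illustration (using the JavaScriptAudioNode/onaudioprocess shape 
from the current Web Audio draft purely as an example, with "node" assumed 
to have been created elsewhere), a filter written like this allocates 
nothing per callback, so there is nothing for the collector to do:

  var last = 0;   // filter state lives outside the callback; no per-call allocation

  node.onaudioprocess = function (e) {
    var input = e.inputBuffer.getChannelData(0);    // views onto existing buffers
    var output = e.outputBuffer.getChannelData(0);
    for (var i = 0; i < input.length; i++) {
      // simple one-pole lowpass, computed in place; no objects created
      last = last + 0.1 * (input[i] - last);
      output[i] = last;
    }
  };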

>     Audio processing in JS on the main thread is virtually a
>     non-starter due to lag/jerk/etc.
> You're right that the problems on the main thread are worse, but 
> nevertheless some people have expressed the desire to be able to 
> process on the main thread.  It's much simpler to deal with in terms 
> of sharing JS context/variables, and running JS code in a worker 
> brings in its own set of complications.  I think both could be useful, 
> depending on the application.
>     Chris also wrote in that message:
>         >  Chris, in the Audio Web API, you have some kind of predefined effects and
>         >  also a way to define custom processings in Javascript (this could also be
>         >  done at low level with C implementations, and may be a way to load this 'C
>         >  audio plugin' in browser ?).
>         It would be great to be able to load custom C/C++ plugins (like VST or
>         AudioUnits), where a single AudioNode corresponds to a loaded code module.
>           But there are very serious security implications with this idea, so
>         unfortunately it's not so simple (using either my or Robert's approach).
>     In either it might be possible to load an emscripten-compiled
>     C/C++ filter; the performance likely would be no better than a
>     well-hand-coded native JS filter (circa 1/3 raw C/C++ speed, YMMV)
>     - but there are plenty of existing C filters available.  Also,
>     emscripten doesn't produce garbage when running, which is good.
> People can certainly try that approach, and we should do nothing to 
> stop them, but it can hardly be called user-friendly.  I think you 
> might be underestimating the complexity of defining a "plugin" format 
> similar to VST or AudioUnits and wrapping it up in emscripten-compiled 
> C/C++.  Debugging could also prove to be a nightmare.  It certainly 
> should not be the starting point for how we expect people to process 
> and synthesize sophisticated audio effects on the web.

I wasn't saying I'd advocate anyone doing this, but it is a way to do it, 
and to do it without the same sort of security concerns.
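
For what it's worth, the JS side of that approach would look roughly like 
this.  I'm assuming a C filter compiled with emscripten that exports 
void process(float* buf, int n), plus the usual emscripten glue (Module, 
cwrap, _malloc, HEAPF32); this is purely illustrative, not a proposed 
plugin format.

  // call the compiled C entry point through emscripten's cwrap glue
  var process = Module.cwrap('process', null, ['number', 'number']);
  var bufPtr = Module._malloc(4096 * 4);   // one-time allocation, assumes <= 4096 frames per callback

  node.onaudioprocess = function (e) {
    var input = e.inputBuffer.getChannelData(0);
    var output = e.outputBuffer.getChannelData(0);
    // copy in, run the C filter in place on the emscripten heap, copy back out
    Module.HEAPF32.set(input, bufPtr >> 2);
    process(bufPtr, input.length);
    output.set(Module.HEAPF32.subarray(bufPtr >> 2, (bufPtr >> 2) + input.length));
  };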

Randell Jesup

Received on Wednesday, 18 April 2012 03:51:39 UTC