- From: Randell Jesup <randell-ietf@jesup.org>
- Date: Tue, 17 Apr 2012 20:23:13 -0400
- To: public-audio@w3.org
- Message-ID: <4F8E0971.6080809@jesup.org>
> I'm not sure I understand the question. The MediaElementAudioSourceNode
> is used to gain access to an <audio> or <video> element for streaming
> file content and not to gain access to a MediaStream. For WebRTC
> purposes I believe something like createMediaStreamSource()
> and createMediaStreamDestination() (or their track-based versions) will
> be necessary.
>
>    * perhaps createMediaStreamSource / Destination should work on track
>      level instead (as you seem to indicate as well); a MediaStream is
>      really just a collection of tracks, and those can be audio or video
>      tracks. If you work on track level you can do processing that
>      results in an audio track and combine that with a video track into
>      a MediaStream
>
> Yes, I think that based on previous discussions we've had that we'll
> need more track-based versions of createMediaStreamSource / Destination.
> Although perhaps we could have both. For a simple use,
> if createMediaStreamSource() were used, then it would grab the first
> audio track from the stream and use that by default. How does that
> sound? Because often a MediaStream would contain only a single
> audio track?
>
> That sounds reasonable. I think in many cases there will only be a
> single audio track.

So it sounds like to modify audio in a MediaStream you'll need to:

* Extract each track from the MediaStream
* Turn each track into a source (might be combined with the previous step)
* Attach each source to a graph
* Extract tracks from the destination of the graphs
* Extract the video track(s) from the source MediaStream
* Combine all the tracks back into a new MediaStream

This is a lot of decomposition and recomposition, and a bunch of code to add in almost every instance where we're doing anything more complex than volume to a MediaStream (there's a rough sketch of what I mean further down).

On a separate note, while not directly applicable to audio, I'll toss in my personal opinion that we want a unified framework for processing media (audio or video). We've already seen lots of people modifying video from WebRTC and from getUserMedia() (from silly antlers to Instagram-like effects, etc.), and we know they'll want to do more (face tracking, visual ID, QR-code recognizers, etc.), and running everything through a <canvas> is not a great solution (laggy, low performance, stalls the main thread, etc.).

My thought is that:

a) we should have an easier way to process data sourced from or going to
   a MediaStream;
b) we need a framework we can cleanly apply to processing video; and
c) main-thread JS is of very limited utility in practice because of
   GC/CC/UI/page loads/etc., but the ability to process audio in JS gives
   us a huge escape valve for functionality that isn't built in.

I noted in the archives that Chris indicated adding support for JS Workers was in the works (Feb 1):

> Jussi Kalliokoski has asked about adding web workers to the
> JavaScriptAudioNode on this list a little while back. We also discussed
> this at the W3C face-to-face meeting very recently and agreed that this
> should be added to the JavaScriptAudioNode spec. It will amount to a
> very small API change, so I'll try to update the specification document
> soon.
>
> I want to make clear that simply moving JavaScript to a worker thread
> doesn't solve every problem. Garbage collection stalls are still very
> much an issue, and these are quite irksome to deal with in a real-time
> system, where we would like to achieve low latency without glitches or
> stutters.

Has there been any progress on this?
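Coming back to the decomposition list above, here is roughly the code an app would need just to run the audio of a MediaStream through a one-node gain graph. This is only a sketch: the track-level createMediaStreamTrackSource()/createMediaStreamTrackDestination() calls (and the .track the destination exposes), the getAudioTracks()/getVideoTracks() accessors, and the MediaStream-built-from-tracks constructor are hypothetical names, not anything in either spec today.

  // Sketch only: hypothetical track-level API; steps numbered to match
  // the list above.
  function processStreamAudio(context, inputStream) {
    var outputTracks = [];

    // 1 & 2: extract each audio track and turn it into a source
    inputStream.getAudioTracks().forEach(function (track) {
      var source = context.createMediaStreamTrackSource(track);

      // 3: attach the source to a (trivial) graph
      var gain = context.createGainNode();
      gain.gain.value = 0.5;
      source.connect(gain);

      // 4: extract the processed track from the graph's destination
      var dest = context.createMediaStreamTrackDestination();
      gain.connect(dest);
      outputTracks.push(dest.track);
    });

    // 5: pass the video track(s) through untouched
    inputStream.getVideoTracks().forEach(function (track) {
      outputTracks.push(track);
    });

    // 6: combine everything back into a new MediaStream
    return new MediaStream(outputTracks);
  }

Even with a trivial one-node graph, that's a couple of dozen lines of plumbing per stream, which is exactly the kind of boilerplate I'd like the API to absorb.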
I should note that an audio (or video) processing worker would typically generate no garbage (and so avoid GC entirely), and even if there is some garbage, there would be almost no live roots, so GC/CC would be very fast. Audio processing in JS on the main thread is virtually a non-starter due to lag, jerkiness, etc.

Chris also wrote in that message (quoting a question put to him):

>> Chris, in the Audio Web API, you have some kind of predefined effects
>> and also a way to define custom processing in Javascript (this could
>> also be done at a low level with C implementations, and maybe a way to
>> load this 'C audio plugin' in the browser?).
>
> It would be great to be able to load custom C/C++ plugins (like VST or
> AudioUnits), where a single AudioNode corresponds to a loaded code
> module. But there are very serious security implications with this
> idea, so unfortunately it's not so simple (using either my or Robert's
> approach).

With either approach it might be possible to load an emscripten-compiled C/C++ filter; the performance would likely be no better than a well-hand-coded native JS filter (circa 1/3 of raw C/C++ speed, YMMV), but there are plenty of existing C filters available. Also, emscripten doesn't produce garbage when running, which is good.

In my mind, many of the differences between the specs are resolvable. Rob has said his design doesn't preclude predefined native processing filters, and it sounds like Chris is open to JS Workers. I believe we need something that integrates better with MediaStreams and gives us a framework for video processing (which would argue for something closer to MediaStream Processing for the source/destination API), and I think we need to be able to easily leverage some predefined processing nodes (from Chris' spec). With a design like this, typical uses wouldn't need any sample-by-sample JS processing, but when it is needed it can run smoothly.

-- 
Randell Jesup
randell-ietf@jesup.org
Received on Wednesday, 18 April 2012 00:24:10 UTC