- From: Randell Jesup <randell-ietf@jesup.org>
- Date: Tue, 17 Apr 2012 20:23:13 -0400
- To: public-audio@w3.org
- Message-ID: <4F8E0971.6080809@jesup.org>
> I'm not sure I understand the question. The MediaElementAudioSourceNode
> is used to gain access to an <audio> or <video> element for streaming
> file content and not to gain access to a MediaStream. For WebRTC
> purposes I believe something like createMediaStreamSource() and
> createMediaStreamDestination() (or their track-based versions) will be
> necessary.
>
> * perhaps createMediaStreamSource / Destination should work on track
>   level instead (as you seem to indicate as well); a MediaStream is
>   really just a collection of tracks, and those can be audio or video
>   tracks. If you work on track level you can do processing that results
>   in an audio track and combine that with a video track into a
>   MediaStream
>
> Yes, I think that based on previous discussions we've had that we'll
> need more track-based versions of createMediaStreamSource / Destination.
> Although perhaps we could have both. For a simple use, if
> createMediaStreamSource() were used, then it would grab the first audio
> track from the stream and use that by default. How does that sound?
> Because often a MediaStream would contain only a single audio track?
>
> That sounds reasonable. I think in many cases there will only be a
> single audio track.
>
So it sounds like to modify audio in a MediaStream you'll need to:
* Extract each track from a MediaStream
* Turn each track into a source (might be combined with previous step)
* Attach each source to a graph
* Extract tracks from the destination of the graphs
* Extract the video track(s) from the source MediaStream
* Combine all the tracks back into a new MediaStream
This is a lot of decomposition and recomposition, and a bunch of code to
add in almost every case where we want to do anything more complex than
adjust the volume of a MediaStream.
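To make that concrete, a rough sketch of the dance above could look
something like this (createMediaStreamSource()/createMediaStreamDestination()
as proposed; getAudioTracks()/getVideoTracks() and the MediaStream-from-tracks
constructor are stand-ins for whatever track accessors we end up with, and
buildGraph() is a placeholder for the actual processing graph):

  function processStreamAudio(context, inputStream, buildGraph) {
    var outputTracks = [];

    inputStream.getAudioTracks().forEach(function (track) {
      // Wrap the single track in a stream of its own; a track-level
      // createMediaStreamTrackSource() would make this step unnecessary.
      var trackStream = new MediaStream([track]);   // hypothetical constructor
      var source = context.createMediaStreamSource(trackStream);
      var destination = context.createMediaStreamDestination();

      // buildGraph() wires source -> (whatever nodes) -> destination.
      buildGraph(source, destination);

      destination.stream.getAudioTracks().forEach(function (t) {
        outputTracks.push(t);
      });
    });

    // Pass the video track(s) through untouched.
    inputStream.getVideoTracks().forEach(function (t) {
      outputTracks.push(t);
    });

    return new MediaStream(outputTracks);           // hypothetical constructor
  }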
On a separate note, while not directly applicable to Audio, I'll toss in
my personal opinion that we want a unified framework in which to process
media (audio or video). We've already seen lots of people modifying the
video from WebRTC and from getUserMedia() (from silly antlers to
Instagram-like effects, etc.), and we know they'll want to do more (face
tracking, visual ID, QR code recognizers, etc.), and running everything
through a <canvas> is not a great solution (laggy, low performance,
stalls the main thread, etc.).
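For reference, the <canvas> round-trip in question looks roughly like the
sketch below: every frame gets pulled onto the main thread, touched pixel
by pixel, and pushed back out (standard 2D-canvas calls; grayscale stands
in for a real effect):

  var video = document.querySelector('video');
  var canvas = document.querySelector('canvas');
  var ctx2d = canvas.getContext('2d');

  function paintFrame() {
    ctx2d.drawImage(video, 0, 0, canvas.width, canvas.height);
    var frame = ctx2d.getImageData(0, 0, canvas.width, canvas.height);
    var data = frame.data;
    for (var i = 0; i < data.length; i += 4) {
      var y = (data[i] + data[i + 1] + data[i + 2]) / 3;  // crude grayscale
      data[i] = data[i + 1] = data[i + 2] = y;
    }
    ctx2d.putImageData(frame, 0, 0);
    requestAnimationFrame(paintFrame);  // all of this competes with the UI
  }
  requestAnimationFrame(paintFrame);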
My thought is that
a) we should have an easier way to process data sourced from or going to
a MediaStream
b) we need a framework we can cleanly apply to processing video
c) Main-thread JS is of very limited utility in practice because of
GC/CC/UI/page loads/etc., but the ability to process audio in JS gives us
a huge escape valve for functionality that isn't built in.
I noted in the archives that Chris indicated (on Feb 1) that adding
support for JS Workers was in the works:
> Jussi Kalliokoski has asked about adding web workers to the
> JavaScriptAudioNode on this list a little while back. We also discussed
> this at the W3C face-to-face meeting very recently and agreed that this
> should be added to the JavaScriptAudioNode spec. It will amount to a very
> small API change, so I'll try to update the specification document soon. I
> want to make clear that simply moving JavaScript to a worker thread doesn't
> solve every problem. Garbage collection stalls are still very much an
> issue, and these are quite irksome to deal with in a real-time system,
> where we would like to achieve low latency without glitches or stutters.
Has there been any progress on this? I should note that an audio (or
video) processing worker would typically throw no garbage (and so avoid
GC), and even if there is garbage, there would be almost no live roots
and GC/CC would be very fast. Audio processing in JS on the main thread
is virtually a non-starter due to lag/jerk/etc.
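As a sketch of what I mean, a JavaScriptAudioNode callback can be written
so that the per-buffer loop allocates nothing (createJavaScriptNode() as in
Chris' current spec; the same shape would apply in a worker):

  // Assumes an existing AudioContext named "context".
  var node = context.createJavaScriptNode(1024, 1, 1);

  node.onaudioprocess = function (event) {
    var input = event.inputBuffer.getChannelData(0);
    var output = event.outputBuffer.getChannelData(0);
    // Simple gain: no objects or arrays created in the loop, so there is
    // (almost) nothing for GC/CC to collect.
    for (var i = 0; i < input.length; i++) {
      output[i] = input[i] * 0.5;
    }
  };

  node.connect(context.destination);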
Chris also wrote in that message:
>> Chris, in the Audio Web API, you have some kind of predefined effects and
>> also a way to define custom processings in Javascript (this could also be
>> done at low level with C implementations, and may be a way to load this 'C
>> audio plugin' in browser ?).
>
> It would be great to be able to load custom C/C++ plugins (like VST or
> AudioUnits), where a single AudioNode corresponds to a loaded code module.
> But there are very serious security implications with this idea, so
> unfortunately it's not so simple (using either my or Robert's approach).
In either approach it might be possible to load an emscripten-compiled C/C++
filter; the performance likely would be no better than a well-written,
hand-coded JS filter (circa 1/3 of raw C/C++ speed, YMMV) - but there are
plenty of existing C filters available. Also, emscripten doesn't produce
garbage when running, which is good.
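For example, an emscripten-compiled filter could probably be dropped into
the same kind of callback roughly like this (process_block() is a
hypothetical exported C function; Module._malloc and Module.HEAPF32 are
emscripten's standard heap accessors):

  // Assumes "context" is an AudioContext and the emscripten Module exports
  // void process_block(float *buf, int len);  (hypothetical)
  var BLOCK = 1024;
  var bufPtr = Module._malloc(BLOCK * 4);   // room for one block of float32s

  var node = context.createJavaScriptNode(BLOCK, 1, 1);
  node.onaudioprocess = function (event) {
    var input = event.inputBuffer.getChannelData(0);
    var output = event.outputBuffer.getChannelData(0);

    // Copy into the emscripten heap, run the C code in place, copy back out.
    Module.HEAPF32.set(input, bufPtr >> 2);
    Module._process_block(bufPtr, input.length);
    output.set(Module.HEAPF32.subarray(bufPtr >> 2,
                                       (bufPtr >> 2) + input.length));
  };
  node.connect(context.destination);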
In my mind, many of the differences between the specs are resolvable.
Rob has said his design doesn't preclude predefining native processing
filters, and it sounds like Chris is open to JS Workers. I believe we
need something that integrates better with MediaStreams and gives us a
framework for video processing (which argues for something closer to
MediaStream Processing for the source/destination API), and I think we
need to be able to easily leverage some pre-defined processing nodes
(from Chris' spec). With a design like this, typical uses wouldn't need
any sample-by-sample JS processing, but whenever that is needed it can
run smoothly.
--
Randell Jesup
randell-ietf@jesup.org