Re: MediaElementAudioSource / MediaStreamAudioSource for offlineAudioContext from Alan deLespinasse on 2014-07-22 (public-audio@w3.org from July to September 2014)

From: Alan deLespinasse <adelespinasse@gmail.com>
Date: Mon, 21 Jul 2014 20:47:44 -0400
To: public-audio@w3.org
Message-ID: <CAMuhF-JYme1rtRR-7Cfvd7DbBvQcb7O7NXrVw025mhxdWF7mKg@mail.gmail.com>

I just found this thread. I'd been planning to start a thread asking about
how MediaElementAudioSourceNode should work with OfflineAudioContext, but
I'll continue this thread instead.

I actually think it does make sense to use a
MediaElementAudioSourceNode in an OfflineAudioContext. I would very much
like to be able to do this, particularly with a <video> element that isn't
currently part of the visible DOM. In this case, I would want the
video to play at whatever rate is needed to keep the audio buffers full,
i.e. hopefully faster than real time. If the video can't be decoded fast
enough, the OfflineAudioContext should wait for it.

Use cases:

1. A simple case: a web application that lets the user select a video file
from local storage and uploads just the audio track to a server. It
would create a Blob URL for the file, create a video element using that
URL, and plug that element into the OfflineAudioContext via a
MediaElementAudioSourceNode. The audio could be retrieved as one giant
buffer when the OfflineAudioContext finishes rendering, incrementally via
a ScriptProcessorNode as decoding happens, or perhaps streamed to the
server over WebRTC through a MediaStreamAudioDestinationNode. (Note: for
this use case it should not be necessary to decode the video track of the
file at all, but I have no idea how hard that optimization would be for a
browser to implement.)

2. Similar to the above, but instead of the video track being completely
dropped, its resolution and bit rate would be decreased. WebGL or a 2D
canvas context can be used for high-quality scaling of each frame. This
would be very useful for something like a web-based video editor, where you
would like a low-quality version of your source material to be available
very quickly for editing; the original high-quality footage can be uploaded
later before final high-quality rendering.

3. More general cases, such as a web-based video editor that does its
rendering on the client side (which is actually something I've been working
on). In this case, the video source material would come from the server
(possibly uploaded as described above). During editing, playback is
low-quality and real-time, but when it's time to render the final,
high-quality product, the rendering generally has to happen much slower
than real time. Of course, it might be possible to render the audio and
video entirely separately in some cases, but there are cases where the rendered
video would depend on the rendered audio and vice versa, such as an
oscilloscope effect (audio affects video) or a text-to-speech effect (video
affects audio).

I know I'm probably raising some complicated issues here, having to do with
synchronization and worker threads and things I haven't thought of. I don't
know a lot about browser implementations. I have read some recent
conversations (mainly on this list) about making all
ScriptProcessorNode callbacks happen in a worker thread (both for
AudioContext and OfflineAudioContext, I believe), which seems like a good
idea, but could make some things difficult since the media element feeding
a MediaElementAudioSourceNode can't exist in a worker.

Is any of the above practical? Is there any hope of these scenarios being
possible to implement, let alone in an efficient and elegant way? I feel
like it should be possible.

Also relevant:

https://github.com/WebAudio/web-audio-api/issues/308

https://code.google.com/p/chromium/issues/detail?id=387558

Received on Tuesday, 22 July 2014 09:35:37 UTC