Re: Proposal for Audio Track Worklet API: github.com/alvestrand/audio-worklet

> On Oct 10, 2018, at 9:00 PM, Harald Alvestrand <harald@alvestrand.no> wrote:
> 
> Youenn, thanks for the comments!
> 
> On 10/11/2018 01:12 AM, youenn fablet wrote:
>> Hi Harald,
>> 
>> This is an interesting proposal, some feedback/questions below.
>> 
>> The proposal has some overlap with existing technologies and it is worth understanding how they relate.
>> 
>> For instance, ScriptProcessorNode provides ways to access to the samples, though on the main thread.
> 
> ScriptProcessor is marked as deprecated in the current WebAudio spec -
> it seems to have caused performance issues. I took that as evidence
> enough not to pursue that shape of API further.

It is deprecated in favor of AudioWorkletNode, which is why we need to understand why AudioWorkletNode is not a solution here.
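
For reference, here is a rough sketch of what that existing path looks like today for an audio MediaStreamTrack (assuming track is such a track; the module path and processor name are placeholders, not anything from the proposal):

    // Main thread: route a MediaStreamTrack through an AudioWorkletNode.
    const ctx = new AudioContext();
    await ctx.audioWorklet.addModule('processor.js');          // placeholder module
    const source = ctx.createMediaStreamSource(new MediaStream([track]));
    const node = new AudioWorkletNode(ctx, 'track-processor'); // placeholder name
    const dest = ctx.createMediaStreamDestination();
    source.connect(node).connect(dest);
    // dest.stream.getAudioTracks()[0] is the processed track.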

> 
>> As stated in the draft, WebAudio Worklet could be emulated through AudioWorkletNode.
>> The question is then why not use AudioContext/AudioWorkletNode directly.
>> If not possible, would a WebRTCAudioContext be able to solve these issues?
> 
> The people who have tried to use AudioWorklet for signals processing
> have reported issues with performance.
> 
> While everything can be optimized, I think that some of the reasons for
> bad performance are architectural decisions - particularly the choice of
> using a synchronized clock (which requires resampling before processing
> if you have out-of-sync audio tracks) and the choice of a single (float32
> linear) representation of samples (which requires a conversion step).

I wonder how that cost compares to the cost of exposing such data to JavaScript in the first place.
Put differently, if exposing audio data to JavaScript costs 10 and the conversion/resampling costs 1, why bother optimizing the latter?
Agreed that WASM might be able to help there at some point in the future.
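
For concreteness, the worklet side of that path looks roughly like this (a sketch only); every input block arrives as linear float32 at the context sample rate, which is where the conversion/resampling discussed above happens:

    // processor.js (sketch): the worklet side of the graph above.
    // inputs[0] holds one Float32Array of 128 samples per channel,
    // already converted/resampled to the context's format and clock.
    class TrackProcessor extends AudioWorkletProcessor {
      process(inputs, outputs, parameters) {
        const input = inputs[0];
        const output = outputs[0];
        for (let channel = 0; channel < input.length; ++channel) {
          output[channel].set(input[channel]); // pass-through copy
        }
        return true; // keep the processor alive
      }
    }
    registerProcessor('track-processor', TrackProcessor);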

> 
> I'd like to explore further what we can achieve with an API that is
> closer in spirit to the C++ APIs of Google's WebRTC implementation,
> while still acting like a sensible part of the Web platform.

Sounds good.
Getting precise use cases that are in scope of this API would help a lot.

> 
>> 
>> In terms of API, the model seems to mandate one input and one optional output.
>> I guess some cases (sub/super-sampling, mixing, fanning out) probably cannot be handled.
> Subsampling and supersampling should be doable, since these are
> one-track operations.
> 
> Mixing and fanout can't be handled this way, but mixing requires
> synchronization, which requires resampling - see previous point.
> This is an important tradeoff point - we should make the choice
> carefully here.
> If the code in the worklet is willing to do resampling and
> synchronization itself, shared array buffers offer a relatively
> performant way of shuffling samples between tracks, so mixing may be
> possible, albeit with a somewhat more convoluted programming model than
> the WebAudio one.
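
For what it's worth, a minimal sketch of that shared-array-buffer approach could look like the following; the ring-buffer layout, size and names are made up for illustration:

    // One second of mono float32 shared between two worklet scopes.
    const frames = 48000;
    const sab = new SharedArrayBuffer(8 + frames * 4);
    const indices = new Int32Array(sab, 0, 2);    // [writeIndex, readIndex]
    const samples = new Float32Array(sab, 8, frames);

    // Producer side (e.g. the worklet processing track A):
    function push(block) {
      const w = Atomics.load(indices, 0);
      for (let i = 0; i < block.length; ++i) {
        samples[(w + i) % frames] = block[i];
      }
      Atomics.store(indices, 0, (w + block.length) % frames);
    }

    // The consumer (e.g. the worklet processing track B) would read from
    // indices[1], advance it, and do the resampling/synchronization
    // itself before mixing.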
> 
>> I guess the idea is to use WebAudio for those cases instead.
>> The question is then which cases AudioMediaTrackProcessor should be used for and which cases WebAudio should be used for instead.
> 
> I think it depends on what overhead is tolerable for the application.
> 
>> 
>> Thanks,
>> 	Y
>> 
>>> On Oct 10, 2018, at 1:50 AM, Harald Alvestrand <harald@alvestrand.no> wrote:
>>> 
>>> As part of my homework from the June WG meeting, and in preparation for
>>> TPAC, I have started drawing up a proposal for a worklet that allows us
>>> to process audio.
>>> 
>>> Link to the presentation form: https://alvestrand.github.io/audio-worklet/
>>> 
>>> I haven't made this a generic processor for audio and video, because I
>>> think efficient processing of video (especially large-frame video) will
>>> require significantly more attention to buffering and utilization of
>>> platform-embedded processors (GPUs!) than is required for usable audio
>>> processing.
>>> 
>>> Note: This proposal (or even this general idea) is a PROPOSAL TO the WG
>>> - it does not represent any form of decision.
>>> 
>>> Comments welcome!
>>> 
>>> Harald
>>> 
>>> -- 
>>> Surveillance is pervasive. Go Dark.
>>> 
>>> 
>>> 
>> 
> 
> -- 
> Surveillance is pervasive. Go Dark.

Received on Thursday, 11 October 2018 20:27:46 UTC