Re: Aiding early implementations of the web audio API from Marcus Geelnard on 2012-05-23 (public-audio@w3.org from April to June 2012)

From: Marcus Geelnard <mage@opera.com>
Date: Wed, 23 May 2012 10:00:18 +0200
To: "Jussi Kalliokoski" <jussi.kalliokoski@gmail.com>, "Chris Wilson" <cwilso@google.com>
Cc: public-audio@w3.org, "Alistair MacDonald" <al@signedon.com>
Message-ID: <op.weq4isjcm77heq@mage-desktop>
On 2012-05-22 19:55:54, Chris Wilson <cwilso@google.com> wrote:

> On Tue, May 22, 2012 at 7:18 AM, Jussi Kalliokoski <jussi.kalliokoski@gmail.com> wrote:
>
>> On Tue, May 22, 2012 at 12:13 PM, Marcus Geelnard <mage@opera.com> wrote:
>>
>>> Yes, that's true. The filter and convolver nodes are certainly useful for
>>> game sound effects (underwater, cave, etc.). Other nodes can also be used
>>> for creating interesting effects, but that's not the point.
>>>
>>> My view on the issue is that the more complex the spec is, the more it
>>> will cost (spec work, implementation time, test suites, bug fixing,
>>> source/binary sizes, etc.), and the more likely it is that we will have
>>> different behavior in different implementations, ranging from noticeable
>>> performance differences to noticeable differences in sound and behavior,
>>> and possibly implementation-dependent corner-case bugs, etc.
>>>
>>> Since all nodes can be implemented in JavaScript (most of them are even
>>> trivial), the only reason for using native nodes instead of JavaScript
>>> nodes is to improve performance.
>>>
>>
>> I have similar thoughts on this. I'm starting to think we're going to have
>> serious spec bloat if we go to great lengths defining all the audio
>> building blocks required to achieve every use case, often resulting only
>> in awkward and hacky (no offense, these aren't easy things) solutions just
>> to avoid having to use a script node, which would negate a lot of the
>> benefit of having a native graph. I feel that it would be a good idea to
>> strip down the spec a bit, making the script nodes more first-class
>> citizens of the graph, while cutting back on effects that are simple to
>> implement with custom scripts. I think this would make the
>> standardization effort a lot simpler as well.
>>
>
> I have to disagree with the definition of "trivial," then.  The only node
> types I think could really be considered trivial are Gain, Delay and
> WaveShaper - every other type is significantly non-trivial to me.
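
For a sense of scale, the core of a fixed delay is a ring buffer. A minimal sketch in TypeScript, assuming an integer delay in samples and no delayTime automation or fractional-delay interpolation (which is where the non-trivial part Chris mentions comes in); `DelayLine` is a hypothetical name:

```typescript
// Hypothetical fixed delay: integer delay in samples, no delayTime automation.
class DelayLine {
  private buffer: Float32Array; // ring buffer holding the last `delaySamples` inputs
  private writeIndex = 0;
  private delaySamples: number;

  constructor(delaySamples: number) {
    this.delaySamples = delaySamples;
    this.buffer = new Float32Array(Math.max(1, delaySamples));
  }

  process(input: Float32Array, output: Float32Array): void {
    if (this.delaySamples === 0) {
      output.set(input); // zero delay: pass straight through
      return;
    }
    for (let i = 0; i < input.length; i++) {
      // Read the sample written `delaySamples` frames ago, then overwrite it.
      output[i] = this.buffer[this.writeIndex];
      this.buffer[this.writeIndex] = input[i];
      this.writeIndex = (this.writeIndex + 1) % this.buffer.length;
    }
  }
}
```

State persists between calls, so the delay works across processing blocks of any size.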

I'd say that at least BiquadFilterNode, RealtimeAnalyserNode (given our  
suggested simplifications), AudioChannelSplitter and AudioChannelMerger  
are trivial too. In fact, if the spec actually specified what the nodes  
should do, the corresponding JavaScript implementations would be quite  
close to copy+paste versions of the spec.
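
To illustrate how close to copy+paste a biquad core can be, here is a minimal lowpass sketch in TypeScript. It assumes the lowpass formulas from Robert Bristow-Johnson's Audio EQ Cookbook with a plain Q parameter; `BiquadLowpass` is a hypothetical name, and it is not claimed to match the exact filter definition of any version of the spec:

```typescript
// Hypothetical biquad lowpass (RBJ Audio EQ Cookbook), Direct Form I.
class BiquadLowpass {
  private b0: number;
  private b1: number;
  private b2: number;
  private a1: number;
  private a2: number;
  // Filter state: previous two inputs and previous two outputs.
  private x1 = 0;
  private x2 = 0;
  private y1 = 0;
  private y2 = 0;

  constructor(sampleRate: number, cutoffHz: number, q: number) {
    const w0 = (2 * Math.PI * cutoffHz) / sampleRate;
    const cosW0 = Math.cos(w0);
    const alpha = Math.sin(w0) / (2 * q);
    const a0 = 1 + alpha;
    // Coefficients are normalized by a0 so the recurrence needs no division.
    this.b0 = (1 - cosW0) / 2 / a0;
    this.b1 = (1 - cosW0) / a0;
    this.b2 = (1 - cosW0) / 2 / a0;
    this.a1 = (-2 * cosW0) / a0;
    this.a2 = (1 - alpha) / a0;
  }

  process(input: Float32Array, output: Float32Array): void {
    for (let i = 0; i < input.length; i++) {
      const x0 = input[i];
      const y0 = this.b0 * x0 + this.b1 * this.x1 + this.b2 * this.x2
        - this.a1 * this.y1 - this.a2 * this.y2;
      this.x2 = this.x1;
      this.x1 = x0;
      this.y2 = this.y1;
      this.y1 = y0;
      output[i] = y0;
    }
  }
}
```

With these coefficients the gain is exactly 1 at DC and 0 at the Nyquist frequency, which makes for an easy sanity check.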

> And even then, when you layer on the complexity involved with handling
> AudioParams (for the gain on Gain and the delayTime on Delay), and the
> interpolation between curve points on WaveShaper, I'm not convinced they're
> actually trivial.
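
The curve-interpolation part is small on its own. A minimal sketch in TypeScript, assuming the curve points are spread evenly over the input range [-1, 1], linear interpolation between neighboring points, and clamping outside that range; `shapeSample` and `waveShape` are hypothetical names, and the exact mapping in the 2012 draft may differ:

```typescript
// Hypothetical WaveShaper core: evenly spaced curve over input [-1, 1],
// linear interpolation between neighboring points, clamping outside the range.
function shapeSample(curve: Float32Array, x: number): number {
  const n = curve.length;
  if (n === 0) return x; // no curve: pass through unchanged
  if (n === 1) return curve[0];
  const v = ((n - 1) / 2) * (x + 1); // map [-1, 1] onto [0, n - 1]
  if (v <= 0) return curve[0];
  if (v >= n - 1) return curve[n - 1];
  const k = Math.floor(v);
  const f = v - k; // fractional position between points k and k + 1
  return (1 - f) * curve[k] + f * curve[k + 1];
}

function waveShape(curve: Float32Array, input: Float32Array, output: Float32Array): void {
  for (let i = 0; i < input.length; i++) output[i] = shapeSample(curve, input[i]);
}
```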

If handling AudioParams is actually a complex thing, I think we should  
seriously consider simplifying the corresponding requirements or dropping  
it altogether.
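
For comparison, here is roughly what a per-sample ("a-rate") gain driven by a single linear ramp involves. A minimal sketch in TypeScript, loosely modelled on `linearRampToValueAtTime` and assuming exactly one ramp event; `LinearRamp`, `rampValueAt` and `applyGain` are hypothetical names, and a real AudioParam keeps a timeline of several kinds of events with rules for how they interact:

```typescript
// Hypothetical single linear-ramp automation (one event only).
interface LinearRamp {
  startTime: number; // seconds
  startValue: number;
  endTime: number; // seconds, assumed > startTime
  endValue: number;
}

function rampValueAt(r: LinearRamp, t: number): number {
  if (t <= r.startTime) return r.startValue;
  if (t >= r.endTime) return r.endValue;
  const frac = (t - r.startTime) / (r.endTime - r.startTime);
  return r.startValue + frac * (r.endValue - r.startValue);
}

// The parameter is evaluated for every sample frame, not once per block.
function applyGain(input: Float32Array, output: Float32Array, ramp: LinearRamp,
                   blockStartTime: number, sampleRate: number): void {
  for (let i = 0; i < input.length; i++) {
    const t = blockStartTime + i / sampleRate;
    output[i] = input[i] * rampValueAt(ramp, t);
  }
}
```

Most of the complexity lies outside this sketch: ordering and replacing events on the timeline, other curve shapes, and value changes made while automation is running.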

> The easiest interface would just be to have an output device stream.
> However, I think having a basic audio toolbox in the form of node types
> will cause an explosion of audio applications -

...which is why there are JS libs. The Web Audio API is already too  
complex to use for most Web developers, so there are already libs/wrappers  
available for making it easier to build basic audio applications.

I'd much prefer a JS lib that implements all the common nodes (typically  
the ones that are already in the spec, plus more). Not only would it be  
100% cross-browser interoperable, it would also be extensible at any time,  
without requiring spec updates and adoption by clients.

> building the vocoder example was illustrative to me, because I ended up
> using about half of the node types, and found them to be fantastically
> easy to build on.

That would have been just as easy if the nodes were implemented in a JS  
lib, wouldn't it?

> Frankly, if they hadn't been there, I wouldn't have built the vocoder,
> because it would have been too complex for me to take on.  After working
> through a number of other scenarios in my mind, I'm left with the same
> feeling - having this set of node types fulfills most of the needs that I
> can envision, and for the few I've thought of that aren't covered, I'm
> happy to use JS nodes.  The only place where I'm not entirely convinced is
> that I think I would personally trade the DynamicsCompressorNode for an
> envelope follower node.  Maybe that's just because I'd rather hack noise
> gates, auto-wah effects, etc., without dropping into a JS node.
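
For reference, the envelope follower itself is only a few lines in a script node. A minimal sketch in TypeScript, assuming full-wave rectification followed by a one-pole smoother with separate attack and release times; `EnvelopeFollower` is a hypothetical name:

```typescript
// Hypothetical envelope follower: full-wave rectification followed by a
// one-pole smoother with separate attack and release time constants.
class EnvelopeFollower {
  private env = 0;
  private attackCoeff: number;
  private releaseCoeff: number;

  constructor(sampleRate: number, attackSeconds: number, releaseSeconds: number) {
    // exp(-1 / (tau * fs)) is the fraction of the old envelope kept per sample.
    this.attackCoeff = Math.exp(-1 / (attackSeconds * sampleRate));
    this.releaseCoeff = Math.exp(-1 / (releaseSeconds * sampleRate));
  }

  process(input: Float32Array, envelopeOut: Float32Array): void {
    for (let i = 0; i < input.length; i++) {
      const level = Math.abs(input[i]);
      // Rising level uses the attack constant, falling level the release one.
      const c = level > this.env ? this.attackCoeff : this.releaseCoeff;
      this.env = c * this.env + (1 - c) * level;
      envelopeOut[i] = this.env;
    }
  }
}
```

A noise gate could then multiply the input by 0 or 1 depending on whether the envelope exceeds a threshold, and an auto-wah could map the envelope onto a filter's cutoff frequency.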
>
>> I've already proposed a few things that would make the script nodes more
>> first-class in my opinion. [1] [2]
>
> I'm in favor of anything necessary to make JavaScript nodes a first-class
> citizen.
>
>
>>> Have any performance comparisons been made between the native nodes and
>>> their corresponding JavaScript implementations? I'm quite sure that
>>> native implementations will be faster (perhaps significantly in several
>>> cases), and I can also make some guesses as to which nodes would be
>>> actual performance bottlenecks, but to what extent?
>>>
>>
> I don't think we've implemented everything twice, once in JavaScript and
> once in native code, and optimized their performance, no.  The best
> comparison would, I suppose, be any work that Robert did for effects in  
> the MSP proposal.
>
>
>> That being said, I believe having a fast native convolution
>> implementation is critical to games and other applications; even with
>> native implementations, convolution is a very expensive operation. But,
>> and bear with me as I've said this before, it would be a good idea to
>> expose a function or a class (to keep state for performance
>> optimizations) to do convolution, rather than a node. This would be far
>> more generally useful, and I believe the browser environment could
>> benefit from it in other applications as well, such as image processing.
>> Otherwise this will end up being redefined elsewhere. For convolution,
>> this is even fairly simple to do, because you could make it real-time
>> just by using overlap-add, as the current native implementation does, so
>> that at its simplest we could have a function that takes the output
>> array, the input array and an array containing the kernels in the
>> frequency domain. Of course, for more general use, we'll need to define
>> how dimensions/channels are handled, etc.
>>
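
The overlap-add idea can be sketched as a small stateful class that carries the convolution tail from one block to the next. A minimal sketch in TypeScript, assuming a single channel and a time-domain kernel; `OverlapAddConvolver` is a hypothetical name. Unlike the proposal, each block is convolved directly in the time domain rather than by multiplying spectra via an FFT, which keeps the example short but costs O(block length × kernel length) per block:

```typescript
// Hypothetical block convolver using overlap-add. Each block is convolved
// directly in the time domain; a real-time version would do the per-block
// product in the frequency domain with an FFT instead.
class OverlapAddConvolver {
  private kernel: Float32Array;
  private overlap: Float64Array; // tail (kernel.length - 1 samples) carried to the next block

  constructor(kernel: Float32Array) {
    if (kernel.length === 0) throw new Error("kernel must not be empty");
    this.kernel = kernel;
    this.overlap = new Float64Array(kernel.length - 1);
  }

  process(input: Float32Array, output: Float32Array): void {
    const b = input.length;
    const m = this.kernel.length;
    const y = new Float64Array(b + m - 1);
    // Full linear convolution of this block with the kernel.
    for (let n = 0; n < b; n++) {
      for (let k = 0; k < m; k++) y[n + k] += input[n] * this.kernel[k];
    }
    // Add the tail left over from previous blocks.
    for (let j = 0; j < m - 1; j++) y[j] += this.overlap[j];
    for (let n = 0; n < b; n++) output[n] = y[n];
    // Whatever extends past this block becomes the new tail.
    this.overlap = y.slice(b, b + m - 1);
  }
}
```

Processing a signal block by block this way gives the same result as convolving the whole signal at once, which is what makes the approach usable in real time.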
>
> Hmm.  I understand what you're suggesting, but I'm a little concerned
> that only handing developers tools that say "perform a convolution on an
> arbitrary n-dimensional array of data" and hoping they figure out how to
> apply it to make reverb, as well as image blurring effects, is not the
> right approach.  I don't think everything should be roll-it-yourself from
> the bottom level.

With a JS lib designed and implemented by audio & signal processing  
experts, this would not be a problem. In fact, I personally think that the  
current convolver node is way too abstract for most Web developers anyway.  
How do you make reverb from an array? It's quite a difficult area for  
someone without enough understanding of signal processing & acoustics. I  
guess most developers would likely use pre-created impulse responses and  
copy/paste tutorial code without understanding much of how it works. An  
algorithmic (feedback-based) reverb node with a few simple parameters  
would be much easier to use IMO (even if it wouldn't produce as  
good/accurate results).
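
As an illustration of the kind of feedback-based reverb meant here, a minimal Schroeder-style sketch in TypeScript: parallel feedback comb filters feeding series allpass filters, exposing just two parameters, `decay` and `wet`. The class names and the delay lengths (in samples, assuming 44.1 kHz) are illustrative assumptions, not tuned values from any spec or product:

```typescript
// Hypothetical feedback comb: y[n] = x[n - D] + feedback * y[n - D].
class FeedbackComb {
  private buf: Float32Array;
  private idx = 0;
  private feedback: number;
  constructor(delay: number, feedback: number) {
    this.buf = new Float32Array(delay);
    this.feedback = feedback;
  }
  tick(x: number): number {
    const y = this.buf[this.idx]; // output delayed by D samples
    this.buf[this.idx] = x + y * this.feedback; // feed the delayed signal back
    this.idx = (this.idx + 1) % this.buf.length;
    return y;
  }
}

// Hypothetical Schroeder allpass: H(z) = (-g + z^-D) / (1 - g z^-D).
class SchroederAllpass {
  private buf: Float32Array;
  private idx = 0;
  private g: number;
  constructor(delay: number, g: number) {
    this.buf = new Float32Array(delay);
    this.g = g;
  }
  tick(x: number): number {
    const vDelayed = this.buf[this.idx]; // v[n - D]
    const v = x + this.g * vDelayed; // v[n]
    this.buf[this.idx] = v;
    this.idx = (this.idx + 1) % this.buf.length;
    return -this.g * v + vDelayed;
  }
}

class SimpleReverb {
  private combs: FeedbackComb[];
  private allpasses: SchroederAllpass[];
  private wet: number;
  // decay in [0, 1): comb feedback amount; wet in [0, 1]: dry/wet mix.
  constructor(decay: number, wet: number) {
    this.combs = [1116, 1188, 1277, 1356].map((d) => new FeedbackComb(d, decay));
    this.allpasses = [556, 441].map((d) => new SchroederAllpass(d, 0.5));
    this.wet = wet;
  }
  process(input: Float32Array, output: Float32Array): void {
    for (let i = 0; i < input.length; i++) {
      let s = 0;
      for (const c of this.combs) s += c.tick(input[i]); // parallel combs
      s /= this.combs.length;
      for (const a of this.allpasses) s = a.tick(s); // series allpasses
      output[i] = (1 - this.wet) * input[i] + this.wet * s;
    }
  }
}
```

Higher `decay` values lengthen the tail; the combs produce the decaying echoes and the allpasses thicken them without changing the overall spectrum much.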

/Marcus
Received on Wednesday, 23 May 2012 08:01:15 UTC