Re: Simplifying specing/testing/implementation work from Jussi Kalliokoski on 2012-07-21 (public-audio@w3.org from July to September 2012)

From: Jussi Kalliokoski <jussi.kalliokoski@gmail.com>
Date: Sat, 21 Jul 2012 10:28:39 +0300
To: Chris Wilson <cwilso@google.com>
Cc: Raymond Toy <rtoy@google.com>, Marcus Geelnard <mage@opera.com>, public-audio@w3.org
Message-ID: <CAJhzemUWfRf1sG5ChWPYn9PJjPFcGEds5K4TdravpakSa7LTwQ@mail.gmail.com>

I'd just like to point out that nobody is forcing you to do complex
things; this is what frameworks are for. If we provide the discussed
tools, people can simply extend the graph to have all the nodes it's
specified to have now, and it makes no difference from a developer's point
of view, aside from including a library in the project. That library might
suit their needs much better than the original API. After all, it's quite
well established even here that people have very different tastes in
frameworks and want things to work differently.

Think of all the possibilities that open up once we provide a
high-performance native DSP library that isn't tied to audio. We get all
those benefits for video (decoding realtime HD video in JS today is next to
impossible, but we might just give it a hand) and for images as well (with
fast convolution, a blur effect in a canvas no longer drops you to 1fps).
Things like crowd-sourced vaccination calculations get a whole new meaning:
you could run them in your browser, and fast. We'd open the door to
innovation in custom DSP languages, which would be really awkward, if
possible at all, to design on top of a high-level node-based processing
API (there's a reason low-level APIs have prevailed throughout software
history, by the way). Not to mention the things that haven't been thought
of yet. Having a low-level DSP API is just the way to go.
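
To make the typed-array idea a bit more concrete, here is a rough sketch of
the kind of primitive such a library could offer. The name convolve and its
signature are purely hypothetical and not part of any proposal; this is the
naive direct-form version, whereas a native implementation would presumably
use an FFT-based fast convolution and SIMD.

```typescript
// Hypothetical helper, not part of any proposed API: direct-form (full)
// convolution of a signal with a kernel, both held in typed arrays.
function convolve(signal: Float32Array, kernel: Float32Array): Float32Array {
  if (signal.length === 0 || kernel.length === 0) {
    return new Float32Array(0);
  }
  // A full convolution yields signal.length + kernel.length - 1 samples.
  const out = new Float32Array(signal.length + kernel.length - 1);
  for (let i = 0; i < signal.length; i++) {
    for (let j = 0; j < kernel.length; j++) {
      // Each input sample spreads a scaled copy of the kernel into the output.
      out[i + j] += signal[i] * kernel[j];
    }
  }
  return out;
}

// A 3-tap smoothing kernel applied to a single impulse.
const blurred = convolve(
  new Float32Array([0, 0, 1, 0, 0]),
  new Float32Array([0.25, 0.5, 0.25])
);
```

The same routine works on a row of pixel values as well as on audio
samples, which is the point: nothing about it is specific to audio. The
nested loop costs O(N*M) operations, and that is exactly the part a native
library could speed up.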

But if we provide both a DSP library for typed arrays and the specialized
nodes, that's very much a duplication of work and hardly justifiable.

The "core" part of the graph, however, is worth preserving. It
provides a reasonable representation for what we need and is extensible
both by developers and by us. Extensible by developers means that they
can use the graph with their custom nodes. Extensible by us means that
we can add more types of graph inputs (microphones; we already have media
elements), extend the API to provide more information about the
user's output device, or hopefully even support multiple output devices,
all without having to break things.

AudioParams are also a useful tool, as they simplify communication with
the nodes of the graph. Having to postMessage control changes to a worker
is not only awkward, it can be slow as well.

I think any specialized processing node just begs the question "why not
this or that feature, too?" Just look at what happened to FFMPEG, heh. Or
CSS Filter Effects [1] for that matter. With low-level primitives, we can
just say "there's this general tool, use it". With high-level APIs, we can
only say "yeah, we can always add more special cases..." or just "no,
sorry, you can't do that". It's a rat's nest, I tell yoo! ^^

Cheers,
Jussi

On Thu, Jul 19, 2012 at 8:55 PM, Chris Wilson <cwilso@google.com> wrote:

> I'd like to request that we not plan any grand changes here until Chris is
> back from vacation (end of the month).  I'd also like to explicitly
> separate my opinion detailed below from his, since we are coming at the API
> from distinctly different angles (I'm mostly a consumer of the API, he's an
> API designer) and backgrounds (he's an audio engineering expert, and I'm a
> hack who likes playing around with things that go bing!), and despite both
> working for Google, aren't always in agreement.  :)
>
> My opinion- in short, I oppose the idea of having a "core spec" as
> captured above.  I think it will simply become a way for implementers to
> skip large parts of the API, while causing confusion and compatibility
> problems for developers using the API.
>
> I think considering JSNode* as the core around which most audio apps will
> be built is incorrect.  I've now built a half-dozen relatively complex
> audio applications - the Vocoder <http://webaudiovocoder.appspot.com/>,
> the Web Audio Playground <http://webaudioplayground.appspot.com/>, my in-progress
> DJ deck <http://cwilsotest.appspot.com/wubwubwub/index.html>, a couple of
> synthesizers, and a few others I'm not ready to show off yet.  If I had to
> use JS node to create my own delays, filters by setting up my own FFT
> matrices, etc., quite frankly I would be off doing something else.  I think
> recognizing these features as basic audio tools is critical; the point of
> the API, as I've gotten to know it, is to enable powerful audio
> applications WITHOUT requiring a degree in digital signal processing.  In
> the Web Audio coding I've done, I've used JSNode exactly once - and that
> was just to test it out.  I have found zero need for it in the apps I've
> built, because it's been more performant as well as far, far easier to use
> tools provided for me.
>
> If the "core spec" is buffers, JSNodes, and AudioNode, I see this as an
> ultimately futile and delaying tactic for getting powerful audio apps built
> by those without - very much like we had a "CSS1 Core" spec for a while.
>  If the goal is simply to expose the audio output (and presumably input)
> mechanism, then I'm not sure why an AudioData API-like write() API is not a
> much simpler solution - if there's no other node types than JSNode, I'm not
> sure what value the Node routing system provides.
>
> Ultimately, I think a lot of game developers in particular will want to
> use the built-in native processing.  If the AudioNode types like Filter and
> Convolver aren't required in an implementation, then either we are creating
> a much more complex compatibility matrix - like we did with CSS1 Core, but
> worse - or they won't be able to rely on those features, in which case I'm
> not sure why we have a routing system.
>
> That said - I do agree (as I think Chris does also) that JSNode isn't
> where it needs to be.  It DOES need support for AudioParam, support for
> varying number of inputs/outputs/channels, and especially worker-based
> processing.  But just because it COULD be used to implement DelayNode
> doesn't mean DelayNode shouldn't be required.
>
> I'm also not opposed to a new API for doing signal processing on Typed
> Arrays in JavaScript.  But again, I'd much rather have the simple interface
> of BiquadFilterNode to use than having to implement my own filter via that
> interface - I see that as a much more complex tool, when I NEED to build my
> own tools.
>
> All this aside, I do believe the spec has to clearly specify how to
> implement interoperable code, and I recognize that it is not there today.
>
> -Chris
>
> *I use "JSNode" as shorthand for "programmable node that the developer has
> to implement themselves" - that is, independent of whether it's JavaScript
> or some other programming language.
>
> On Thu, Jul 19, 2012 at 9:44 AM, Raymond Toy <rtoy@google.com> wrote:
>
>>
>>
>> On Thu, Jul 19, 2012 at 7:11 AM, Jussi Kalliokoski <
>> jussi.kalliokoski@gmail.com> wrote:
>>
>>>
>>> Obviously SIMD code is faster than addition in JS now, for example. And
>>> yes, IIR filter is a type of a convolution, but I don't think it's possible
>>> to write an efficient IIR filter algorithm using a convolution engine —
>>> after all, a convolution engine should be designed to deal with FIRs. Not
>>> to mention that common IIR filters have 4 (LP, HP, BP, N) kernels, which
>>> would be really inefficient for a FastConvolution algorithm, even if it
>>> supported FIR. And as far as IIR filter performance goes, I think SIMD
>>> instructions offer very little usefulness in IIR algorithms, since they're
>>> so linear.
>>>
>>>
>>  https://bugs.webkit.org/show_bug.cgi?id=75528 says that adding SIMD
>> gives a 45% improvement.
>>
>> Ray
>>
>
>
Received on Saturday, 21 July 2012 07:29:10 UTC