W3C home > Mailing lists > Public > public-audio@w3.org > July to September 2012

Re: Simplifying specing/testing/implementation work

From: r baxter <baxrob@gmail.com>
Date: Sat, 21 Jul 2012 03:19:08 -0700
Message-ID: <CACo9a0fdnTsnO5XdUGzWDTjsrRDc=W8gY7gRvJcgFAma=-U+6Q@mail.gmail.com>
To: Jussi Kalliokoski <jussi.kalliokoski@gmail.com>
Cc: Chris Wilson <cwilso@google.com>, Raymond Toy <rtoy@google.com>, Marcus Geelnard <mage@opera.com>, public-audio@w3.org
As an API consumer, my grandest wish is to see per-sample audio
read/write actually implemented in "most" modern browsers, hopefully
in the next five years or so.

I'm sure this sentiment is not news.

An API as minimal as flash.media.sound seems neither likely nor
desirable as a model for the Web Audio API.  But could such minimal
functionality serve as a baseline, without fragmenting the spec into
'core', 'advanced', etc., as Chris Wilson warns?  I'd wish there were
a way to partition the problem space in workflow terms, without
committing all of that to a spec.  But maybe I'm dreaming.  :-/

I am glad I am not in the working group, and so can write of such
things with some levity.

Thanks,
Roby

On Sat, Jul 21, 2012 at 12:50 AM, Jussi Kalliokoski
<jussi.kalliokoski@gmail.com> wrote:
> Oops, heh, forgot to link:
>
> [1] https://dvcs.w3.org/hg/FXTF/raw-file/tip/filters/index.html
>
>
> On Sat, Jul 21, 2012 at 10:28 AM, Jussi Kalliokoski
> <jussi.kalliokoski@gmail.com> wrote:
>>
>> I'd just like to point out that nobody is forcing you to do complex
>> things - this is what frameworks are for. If we provide the discussed
>> tools, people can simply extend the graph to have all the nodes it's
>> specified to have now, and it makes no difference from the developer's
>> point of view, aside from including a library in the project. A library
>> that might just suit their needs much better than the original API -
>> after all, it's quite well established, even here, that people have very
>> different tastes in frameworks and want things to work differently.
>>
>> Think of all the possibilities that open up once we provide a
>> high-performance native DSP library that isn't tied to audio. We get all
>> those benefits for video (decoding realtime HD video in JS today is next to
>> impossible, but we might just give it a hand) and images as well (fast
>> convolution for images, and you no longer get 1fps if you have a blur effect
>> in a canvas). Things like crowd-sourced vaccination calculations get a whole
>> new meaning: you could run them in your browser, and fast. We'd open up a
>> door for innovation in custom DSP languages, which would be really awkward -
>> if even possible - to design on top of a high-level node-based processing API
>> (there's a reason in software history that low-level APIs prevail, by the
>> way). Not to mention the things that haven't been thought of yet. Having
>> a low-level DSP API is just the way to go.
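A rough sketch of the kind of typed-array DSP primitive being discussed here - the function name and signature are hypothetical illustrations, not anything from the spec:

```javascript
// Hypothetical typed-array DSP helper: element-wise multiply-add over
// Float32Arrays. Nothing here is audio-specific, which is the point -
// the same primitive serves audio, video frames, or image data.
function dspMulAdd(out, a, b, gain) {
  const n = Math.min(out.length, a.length, b.length);
  for (let i = 0; i < n; i++) {
    out[i] = a[i] + b[i] * gain; // mix b into a, scaled by gain
  }
  return out;
}

const a = new Float32Array([1, 2, 3]);
const b = new Float32Array([10, 20, 30]);
const mixed = dspMulAdd(new Float32Array(3), a, b, 0.5);
// mixed is [6, 12, 18]
```

A native implementation of such a loop is where SIMD and other low-level optimizations would pay off.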
>>
>> But if we provide both a DSP library for typed arrays and the specialized
>> nodes, it's very much a duplication of work and hardly justifiable.
>>
>> The "core" part of the graph, however, is reasonable to preserve. It
>> provides a reasonable representation for what we need and is extensible
>> both by developers and by us. By extensible to developers I mean that they
>> can use the graph with their custom nodes. By extensible to us I mean that
>> we can add more types of graph inputs (microphones; we already have media
>> elements), and we can extend the API to provide more information about the
>> user's output device, or hopefully even to use multiple output devices,
>> without having to break things.
>>
>> AudioParams are also a useful tool, as they simplify communication with
>> the nodes of the graph; having to postMessage control changes to a worker
>> is not only awkward but can be slow as well.
>>
>> I think any specialized processing node just begs the question "why not
>> this or that feature, too?" Just look at what happened to FFMPEG, heh. Or
>> CSS Filter Effects [1] for that matter. With low-level primitives, we can
>> just say there's this general tool, use it, with high-level APIs we can say
>> "yeah, we can always add more special cases..." or just "no, sorry, you
>> can't do that". It's a rat's nest, I tell you! ^^
>>
>> Cheers,
>> Jussi
>>
>>
>> On Thu, Jul 19, 2012 at 8:55 PM, Chris Wilson <cwilso@google.com> wrote:
>>>
>>> I'd like to request that we not plan any grand changes here until Chris
>>> is back from vacation (end of the month).  I'd also like to explicitly
>>> separate my opinion detailed below from his, since we are coming at the API
>>> from distinctly different angles (I'm mostly a consumer of the API, he's an
>>> API designer) and backgrounds (he's an audio engineering expert, and I'm a
>>> hack who likes playing around with things that go bing!), and despite both
>>> working for Google, aren't always in agreement.  :)
>>>
>>> My opinion, in short: I oppose the idea of having a "core spec" as
>>> captured above.  I think it will simply become a way for implementers to
>>> skip large parts of the API, while causing confusion and compatibility
>>> problems for developers using the API.
>>>
>>> I think considering JSNode* as the core around which most audio apps will
>>> be built is incorrect.  I've now built a half-dozen relatively complex audio
>>> applications - the Vocoder, the Web Audio Playground, my in-progress DJ
>>> deck, a couple of synthesizers, and a few others I'm not ready to show off
>>> yet.  If I had to use JSNode to create my own delays, or filters by
>>> setting up my own FFT matrices, etc., quite frankly I would be off doing
>>> something
>>> else.  I think recognizing these features as basic audio tools is critical;
>>> the point of the API, as I've gotten to know it, is to enable powerful audio
>>> applications WITHOUT requiring a degree in digital signal processing.  In
>>> the Web Audio coding I've done, I've used JSNode exactly once - and that was
>>> just to test it out.  I have found zero need for it in the apps I've
>>> built, because it's been more performant, as well as far, far easier, to
>>> use the tools provided for me.
>>>
>>> If the "core spec" is buffers, JSNodes, and AudioNode, I see this as an
>>> ultimately futile delaying tactic for getting powerful audio apps built
>>> by those without DSP expertise - very much like we had a "CSS1 Core" spec
>>> for a while.  If the goal is simply to expose the audio output (and
>>> presumably input) mechanism, then I'm not sure why an AudioData API-like
>>> write() API is not a much simpler solution - if there are no node types
>>> other than JSNode, I'm not sure what value the node routing system
>>> provides.
>>>
>>> Ultimately, I think a lot of game developers in particular will want to
>>> use the built-in native processing.  If the AudioNode types like Filter and
>>> Convolver aren't required in an implementation, then either we are creating
>>> a much more complex compatibility matrix - like we did with CSS1 Core, but
>>> worse - or they won't be able to rely on those features, in which case I'm
>>> not sure why we have a routing system.
>>>
>>> That said - I do agree (as I think Chris does also) that JSNode isn't
>>> where it needs to be.  It DOES need support for AudioParam, support for
>>> varying number of inputs/outputs/channels, and especially worker-based
>>> processing.  But just because it COULD be used to implement DelayNode
>>> doesn't mean DelayNode shouldn't be required.
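For illustration, this is roughly the kind of per-sample delay line a developer would have to hand-roll inside a JSNode if DelayNode were optional - a minimal sketch, not the spec's DelayNode:

```javascript
// Minimal circular-buffer delay line: each sample written in comes
// back out delaySamples later. This is the "basic audio tool" a
// required DelayNode saves every developer from reimplementing.
class DelayLine {
  constructor(delaySamples) {
    this.buf = new Float32Array(delaySamples); // zero-initialized history
    this.pos = 0;                              // read/write position
  }
  process(sample) {
    const delayed = this.buf[this.pos]; // read the oldest sample
    this.buf[this.pos] = sample;        // overwrite it with the newest
    this.pos = (this.pos + 1) % this.buf.length;
    return delayed;
  }
}

const delay = new DelayLine(3);
const outSamples = [1, 2, 3, 4, 5].map(s => delay.process(s));
// outSamples is [0, 0, 0, 1, 2]: the input reappears 3 samples late
```

And this sketch ignores everything a production node must also handle: interpolation for fractional delays, parameter automation, and multiple channels.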
>>>
>>> I'm also not opposed to a new API for doing signal processing on Typed
>>> Arrays in JavaScript.  But again, I'd much rather have the simple interface
>>> of BiquadFilterNode to use than having to implement my own filter via that
>>> interface - I see that as a much more complex tool, for when I NEED to
>>> build my own tools.
>>>
>>> All this aside, I do believe the spec has to clearly specify how to
>>> implement interoperable code, and I recognize that it is not there today.
>>>
>>> -Chris
>>>
>>> *I use "JSNode" as shorthand for "programmable node that the developer
>>> has to implement themselves" - that is, independent of whether it's
>>> JavaScript or some other programming language.
>>>
>>> On Thu, Jul 19, 2012 at 9:44 AM, Raymond Toy <rtoy@google.com> wrote:
>>>>
>>>>
>>>>
>>>> On Thu, Jul 19, 2012 at 7:11 AM, Jussi Kalliokoski
>>>> <jussi.kalliokoski@gmail.com> wrote:
>>>>>
>>>>>
>>>>> Obviously SIMD code is faster than addition in JS now, for example. And
>>>>> yes, an IIR filter is a type of convolution, but I don't think it's
>>>>> possible to write an efficient IIR filter algorithm using a convolution
>>>>> engine - after all, a convolution engine should be designed to deal with
>>>>> FIRs. Not to mention that common IIR filters have 4 (LP, HP, BP, N)
>>>>> kernels, which would be really inefficient for a FastConvolution
>>>>> algorithm, even if it supported FIR. And as far as IIR filter
>>>>> performance goes, I think SIMD
>>>>> instructions offer very little usefulness in IIR algorithms, since they're
>>>>> so linear.
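For context on the IIR-vs-FIR point: a biquad IIR is a short recursive difference equation, not a long FIR kernel. A sketch in plain JS, with arbitrary illustrative coefficients rather than a designed filter:

```javascript
// Direct Form I biquad:
//   y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]
// The feedback terms (a1, a2) are what a FIR convolution engine cannot
// express with any finite kernel.
function biquad(x, b0, b1, b2, a1, a2) {
  const y = new Float32Array(x.length);
  let x1 = 0, x2 = 0, y1 = 0, y2 = 0; // one- and two-sample histories
  for (let n = 0; n < x.length; n++) {
    const yn = b0 * x[n] + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2;
    y[n] = yn;
    x2 = x1; x1 = x[n]; // shift input history
    y2 = y1; y1 = yn;   // shift output history (the recursion)
  }
  return y;
}

// Impulse response: with feedback the output decays indefinitely,
// unlike a FIR, whose response ends with its kernel.
const impulse = new Float32Array(8);
impulse[0] = 1;
const h = biquad(impulse, 1, 0, 0, -0.5, 0);
// h is [1, 0.5, 0.25, 0.125, ...]: geometric decay from the feedback term
```

The per-sample dependence of y[n] on y[n-1] is also the "so linear" serial dependency mentioned above that limits what SIMD can do for an IIR.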
>>>>>
>>>>
>>>>  https://bugs.webkit.org/show_bug.cgi?id=75528 says that adding SIMD
>>>> gives a 45% improvement.
>>>>
>>>> Ray
>>>
>>>
>>
>
Received on Saturday, 21 July 2012 10:19:36 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Saturday, 21 July 2012 10:19:37 GMT