Standardizing audio "level 1" features first?

Hi all,

I'm Colin Clark, a composer and open source web developer. I'm also the author of Flocking, a nascent JavaScript synthesis and audio processing framework for artists and composers. I'm no DSP expert, but I've had a lot of fun recently working with the current generation of web audio APIs. I've been following this working group's discussions for the past several months and wanted to share a few thoughts.

In my mind, the mark of a good web standard is that it's sufficiently robust and low-level to enable developers working in JavaScript to create innovative, comprehensive, and performant libraries and frameworks that suit the diverse needs of their users. Simply put, there's never going to be a single, one-size-fits-all audio processing framework that meets everyone's needs and use cases. Lower-level APIs help promote choice and innovation, even if they're not the easiest to use or the most comprehensive.

We've seen a lot of this in practice on the web: as far as view technologies go, the DOM is pretty low-level. The fact that it doesn't specify a component model for user interface widgets (which it probably would have done poorly) has enabled an incredible amount of innovation by libraries and frameworks like jQuery, Dojo, and others. Each library takes a unique approach to the architecture and developer ergonomics of user interface widgets, and web developers are free to choose the one that best suits their needs and style. Similarly, Canvas and WebGL provide excellent low-level APIs upon which library developers can build suitable abstractions, and developer choice has flourished as a result.

The same is true for audio synthesis: there are a lot of interesting ways to build a synthesis environment. In the desktop world, SuperCollider, CSound, and ChucK are markedly different; each optimizes for a different use case and philosophy. Already on the web, we've got Jussi's Audiolib, Joe Turner's Audiolet, Corban Brook's dsp.js, and more. Each of these libraries reflects its author's distinct approach to solving the problem of synthesis on the web.

Similarly, the difficulty of standardizing signal processing algorithms shouldn't be underestimated. I think it's clear to all of us that there are many algorithms for even simple tasks like generating basic waveforms. Each has its own trade-offs, and instrument designers will need to pick the best one for their needs and processing requirements. For example, simple amplitude modulation with a triangle wave for a tremolo effect won't require the added processing complexity of a band-limited oscillator. But for modelling an analog synthesizer, complex band-limited waveforms are absolutely critical. In the end, developers are going to need both, and we can't possibly standardize each and every algorithm and then expect browser vendors to implement them all. And, as roc points out, multiple browser implementations are the key to a healthy, competitive web.
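To make the first case concrete, here's a rough sketch (the function names and parameters are mine, not from either proposal) of a naive, non-band-limited triangle LFO scaling a block of samples, which is all that kind of effect really requires:

    // Sketch only: a naive (non-band-limited) triangle LFO, mapping a
    // 0..1 phase onto a -1..1 triangle shape.
    function triangleLFO(phase) {
        return 1.0 - 4.0 * Math.abs(Math.round(phase) - phase);
    }

    // Scales a block of samples with the LFO for a tremolo-style effect.
    // 'samples' is a Float32Array, 'rate' is the LFO frequency in Hz,
    // and 'depth' is 0..1.
    function applyTremolo(samples, sampleRate, rate, depth) {
        var phase = 0.0;
        for (var i = 0; i < samples.length; i++) {
            var lfo = (triangleLFO(phase) + 1.0) * 0.5; // rescale to 0..1
            samples[i] *= 1.0 - (depth * lfo);
            phase += rate / sampleRate;
            if (phase >= 1.0) { phase -= 1.0; }
        }
        return samples;
    }

A library targeting the analog-modelling case could swap a band-limited oscillator (BLIT, wavetables, or whatever its author prefers) in behind the same kind of function, without the spec having to bless any particular algorithm.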

So if we can't standardize everything, how do we ensure that developers can fill in the gaps? I think the ideal audio specification would put web developers working in JavaScript on the same footing as browser developers in terms of their ability to extend the system in a reasonably performant way. Such a standard would address the most critical problems of writing data to audio devices, handling latency, and processing samples efficiently within Web Workers. And it would leave a lot of room for libraries and frameworks to innovate.
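To be concrete about the kind of surface area I have in mind, here's a purely hypothetical sketch of a minimal callback-driven interface for writing samples to a device. None of these names come from the Web Audio API or the MediaStreams proposal; they're only illustrative.

    // Hypothetical sketch only: 'navigator.audio' doesn't exist in any
    // proposal; it just illustrates the minimal surface a "Level 1"
    // spec might expose.
    var device = navigator.audio.open({
        sampleRate: 44100,
        channels: 2,
        bufferSize: 1024   // the caller picks the latency/CPU trade-off
    });

    // The browser calls this whenever it needs another block of samples.
    device.onprocess = function (outputBuffer) {
        // Libraries and frameworks fill the interleaved Float32Array
        // however they like: unit generators, wavetables, sample playback.
        for (var i = 0; i < outputBuffer.length; i++) {
            outputBuffer[i] = 0.0; // silence, in this sketch
        }
    };

    // Measured output latency, in seconds, so schedulers can compensate.
    console.log(device.latency);

Everything interesting in that sketch happens inside the callback, which is exactly the space where libraries compete and innovate.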

I'm sensitive to the fact that JavaScript today faces a number of critical performance issues that make writing real-time systems difficult. Nonetheless, given how long standards take to propagate, it's probably best to lean towards a solution that will scale into the future, where more and more complex application code will be written in JavaScript. Most runtime engines have seen order-of-magnitude performance improvements in the last five years. Web Workers provide an isolated environment where processing can be done independently of the main thread, and hopefully browser vendors will invest in improved, realtime-friendly garbage collection strategies. It may not be awesome yet, but do you think the language has the potential to host realtime processing in a scalable way?
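As one small illustration of how a worker can keep allocation (and therefore garbage collection) out of the audio path, here's a sketch that fills and recycles a single Float32Array. The file name and message shape are mine, and it assumes the browser supports transferable ArrayBuffers.

    // synth-worker.js (hypothetical file name). Sketch only: the main
    // thread posts an empty ArrayBuffer, the worker fills it with a sine
    // tone and transfers it back, so the same memory shuttles back and
    // forth instead of being reallocated and collected every block.
    var phase = 0.0;
    var freq = 440;
    var sampleRate = 44100;

    onmessage = function (e) {
        var samples = new Float32Array(e.data.buffer);

        for (var i = 0; i < samples.length; i++) {
            samples[i] = Math.sin(phase * 2.0 * Math.PI);
            phase += freq / sampleRate;
            if (phase >= 1.0) { phase -= 1.0; }
        }

        // Transfer ownership back to the main thread: no copy, no garbage.
        postMessage({ buffer: samples.buffer }, [samples.buffer]);
    };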

I wonder if it makes sense to consider a simpler "Level 1" specification that doesn't yet tackle the modelling of a single audio processing pipeline or the standardization of unit generators written in C. Instead, JavaScript developers would need to create the abstractions that best suit their needs. While I haven't yet had an opportunity to work with roc's MediaStreams proposal, it seems closer to this idea of "useful minimalism" than the Web Audio API, which aims to be comprehensive. This simpler approach may not be the ideal solution, nor a comprehensive one, but it might be a good first path, giving multiple browser vendors an opportunity to get solid implementations in place and enabling developers and musicians to build more cool stuff.

At the very least, perhaps we could consider writing an additional use case or two that captures the needs of web developers to create useful libraries, processing algorithms, and abstractions on top of the lower-level API, just to ensure we're keeping this need in mind while we're comparing and refining the two proposals. I'd be happy to help out with this if it's useful to anyone.

I hope my comments are helpful,

Colin


---
Colin Clark
Technical Lead, Fluid Project
http://fluidproject.org
