RE: Standardizing audio "level 1" features first?

+1 with "and I also think not taking advantages of native capabilities in such a processing-intensive field would be as naive as saying "2D canvas will suffice, we don't need WebGL"."

IMHO, another reason we should pay extra attention to performance is the use of web audio on mobile platforms.
We should try our best to enable the same capabilities on mobile as on the desktop.

Best Regards

James


From: Jussi Kalliokoski [mailto:jussi.kalliokoski@gmail.com]
Sent: Friday, February 10, 2012 4:51 PM
To: Colin Clark
Cc: public-audio@w3.org
Subject: Re: Standardizing audio "level 1" features first?

Hey,

I'm sorry, that came out the wrong way. I mean in no way to discourage the great progress we've been making here, and I also think not taking advantage of native capabilities in such a processing-intensive field would be as naive as saying "2D canvas will suffice, we don't need WebGL". My point was that we should be mindful of the consequences of building a big audio framework, not reserved about the building itself. We should keep pushing this forward as fast as possible, but not too fast; I'm just saying that I believe it's best if we release in two stages. It's harder to get it wrong that way, especially since we already have a clear plan for the second phase, so we know what the first stage must be able to extend to.

Jussi
On Fri, Feb 10, 2012 at 10:39 AM, Jussi Kalliokoski <jussi.kalliokoski@gmail.com<mailto:jussi.kalliokoski@gmail.com>> wrote:
Hey Colin!

I think you just voiced the consensus of almost every live conversation I've had about this.

I agree that at this stage we should get the important part, the low-level stuff, out there quickly: the basic connectivity, the alpha and omega, read/write access to all audio/video streams present in the browser now and in the future. Trying to standardize more will potentially delay the date we get that access across all of the web, and it will also increase the chances that we get it wrong. It's easy to change how a JS library works, because it won't break existing code; code that would break can keep using an older version. But once it's built into the browser, you're stuck with it. It becomes hard to change, if not impossible, as we're seeing in the vendor prefix debate right now.

Of course, I'm biased, but like I said, it's also the consensus of almost every live conversation I've had.

Cheers,
Jussi Kalliokoski

On Thu, Feb 9, 2012 at 4:29 AM, Colin Clark <colinbdclark@gmail.com<mailto:colinbdclark@gmail.com>> wrote:
Hi all,

I'm Colin Clark, a composer and open source web developer. I'm also the author of Flocking, a nascent JavaScript synthesis and audio processing framework for artists and composers. I'm no DSP expert, but I've had a lot of fun recently working with the current generation of web audio APIs. I've been following this working group's discussions for the past several months and wanted to share a few thoughts.

In my mind, the mark of a good web standard is that it's sufficiently robust and low-level to enable developers working in JavaScript to create innovative, comprehensive, and performant libraries and frameworks to suit the diverse needs of their users. Simply put, there's never going to be a single, one-size-fits-all audio processing framework that will meet everyone's needs and use cases. Lower-level APIs help promote choice and innovation, even if they're not the easiest to use or most comprehensive.

We've seen a lot of this in practice on the web: as far as View technologies go, the DOM is pretty low-level. The fact that it doesn't specify a component model for user interface widgets (which it probably would have done poorly) has enabled an incredible amount of innovation by libraries and frameworks like jQuery, Dojo, and others. Each library takes a unique approach to the architecture and developer ergonomics of user interface widgets, and web developers are free to choose the one that best suits their needs and style. Similarly, Canvas and WebGL provide excellent low-level APIs upon which library developers can build suitable abstractions. And developer choice has flourished as a result.

The same is true for audio synthesis: there are a lot of interesting ways to build a synthesis environment. In the desktop world, SuperCollider, CSound, and ChucK are markedly different; each optimizes for a different use case and philosophy. Already on the web, we've got Jussi's Audiolib, Joe Turner's Audiolet, Corban Brook's dsp.js, and more. These libraries are already showing their authors' unique approaches to solving the problem of synthesis on the web.

Similarly, the difficulty of standardizing signal processing algorithms shouldn't be underestimated. I think it's clear to all of us that there are many algorithms for even simple tasks like generating basic waveforms. Each has its own trade-offs, and instrument designers will need to pick the best one for their needs and processing requirements. For example, doing simple amplitude modulation with a triangle wave for a vibrato effect won't require the added processing complexity of a band-limited oscillator. But for modelling an analog synthesizer, complex band-limited waveforms are absolutely critical. In the end, developers are going to need both, and we can't possibly standardize each and every algorithm and then expect browser vendors to implement them. And, as roc points out, multiple browser implementations are the key to a healthy, competitive web.
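
To make that concrete, here's a rough sketch in plain JavaScript (the names are just illustrative) of the kind of naive, non-band-limited triangle oscillator that's perfectly adequate as a low-frequency modulator, even though an analog-modelling synth would want a band-limited algorithm in its place:

    // Naive (non-band-limited) triangle oscillator: fine as an LFO,
    // but it aliases audibly when run at audio rates.
    function createTriangleOscillator(frequency, sampleRate) {
        var phase = 0,
            increment = frequency / sampleRate;

        return function nextSample() {
            // Rises from -1 to 1 over the first half of the cycle, falls back over the second.
            var value = phase < 0.5 ? 4 * phase - 1 : 3 - 4 * phase;
            phase += increment;
            if (phase >= 1) { phase -= 1; }
            return value;
        };
    }

    // Illustrative use: a 6 Hz triangle modulating a block of carrier samples.
    var lfo = createTriangleOscillator(6, 44100),
        block = new Float32Array(512); // pretend this already holds carrier samples

    for (var i = 0; i < block.length; i++) {
        block[i] *= 0.5 * (lfo() + 1); // scale the LFO into the 0..1 range
    }

A library would ship several oscillators like this, each with different cost/quality trade-offs, which is exactly why picking one to bake into the browser seems premature.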

So if we can't standardize everything, how do we ensure that developers can fill in the gaps? I think the ideal audio specification would put web developers working in JavaScript on the same footing as the browser developers in terms of their ability to extend the system in a reasonably performant way. Such a standard would address the most critical problems of writing data to audio devices, handling latency, and processing samples efficiently within Web Workers. And it would leave a lot of room for libraries and frameworks to innovate.
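
For a sense of what "processing samples within Web Workers" could look like from a library author's perspective, here's a small sketch; the "fill" message shape is purely hypothetical and isn't drawn from either proposal:

    // worker.js -- fills blocks of samples off the main thread.
    // The { type: "fill", length: n } message shape is purely hypothetical.
    var phase = 0,
        frequency = 440,
        sampleRate = 44100;

    self.onmessage = function (event) {
        if (event.data.type !== "fill") { return; }

        var block = new Float32Array(event.data.length);
        for (var i = 0; i < block.length; i++) {
            block[i] = Math.sin(2 * Math.PI * phase);
            phase += frequency / sampleRate;
            if (phase >= 1) { phase -= 1; }
        }

        // Hand the underlying ArrayBuffer back without copying,
        // where transferable objects are supported.
        self.postMessage(block, [block.buffer]);
    };

The main thread (or, ideally, the audio callback itself) would post a "fill" request per block and copy the returned samples out to the device.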

I'm sensitive to the fact that JavaScript today faces a number of critical performance issues that make writing real-time systems difficult. Nonetheless, given how long standards take to propagate, it's probably best to lean towards a solution that will scale into the future, where more and more complex application code will be written in JavaScript. Most runtime engines have seen order-of-magnitude performance improvements in the last five years. Web Workers provide an isolated environment where processing can be done independently of the main thread, and hopefully browser vendors will invest in improved, realtime-friendly garbage collection strategies. It may not be awesome yet, but do you think the language has the potential to host realtime processing in a scalable way?
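
One practical habit that already helps with the garbage collection issue, whatever the engines end up doing, is to preallocate buffers once and reuse them so the per-block processing function never allocates. A minimal sketch:

    // Allocate everything up front; the per-block function below creates no new
    // objects, so it gives the garbage collector nothing to do during playback.
    var BLOCK_SIZE = 512,
        scratch = new Float32Array(BLOCK_SIZE),
        output = new Float32Array(BLOCK_SIZE);

    function processBlock(input) {
        for (var i = 0; i < BLOCK_SIZE; i++) {
            scratch[i] = input[i] * 0.5;       // some per-sample work
            output[i] = scratch[i] + input[i]; // mix into the reused output buffer
        }
        return output; // the same Float32Array every call
    }

Libraries can adopt this discipline today, but a low-level spec that doesn't force per-block allocation would make it much easier.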

I wonder if it makes sense to consider a simpler "Level 1" specification that doesn't yet tackle the modelling of a single audio processing pipeline or standardization of unit generators written in C. Instead, JavaScript developers will need to create the abstractions that best suit their needs. While I haven't yet had an opportunity to work with roc's MediaStreams proposal, it seems closer to this idea of "useful minimalism" than the Web Audio API, which aims to be comprehensive. This simpler approach may not be the ideal solution, nor a comprehensive one, but it might be a good first path, giving multiple browser vendors an opportunity to get solid implementations in place and enabling developers and musicians to build more cool stuff.

At the very least, perhaps we could consider writing an additional use case or two that captures the needs of web developers to create useful libraries, processing algorithms, and abstractions on top of the lower-level API, just to ensure we keep this need in mind while comparing and refining the two proposals. I'd be happy to help out with this if it's useful to anyone.

I hope my comments are helpful,

Colin


---
Colin Clark
Technical Lead, Fluid Project
http://fluidproject.org

Received on Friday, 10 February 2012 09:16:15 UTC