- From: Joseph Berkovitz <joe@noteflight.com>
- Date: Mon, 4 Oct 2010 17:47:43 -0400
- To: public-xg-audio@w3.org
- Message-Id: <E86E640C-9274-4256-813E-19F75E304A10@noteflight.com>
Hi folks,

Thanks for the great call today. I am very enthusiastic about the work this group has taken on, and about its present direction. As a new member I want to do my best to respond to the Web Audio API Proposal with some initial observations. I know that a lot of thinking has gone into the current draft, so please forgive my ignorance of the many points that have already been discussed -- I'm admitting now to having only briefly skimmed the mailing list archives. I did try out the sample code, though, and read some of it. Very impressive!

I'm going to break my response up into two main parts. Part 1 (this one) will be a high-level comparison of the Web Audio API Proposal with StandingWave 3, the internal synthesis engine used by Noteflight and available on GitHub as the standingwave3 project. Part 2 (to follow in the next day or two) will be a commentary on various aspects of the Web Audio API, responding to the draft on a feature-by-feature basis.

-----------------------------------
COMPARISON OF THE WEB AUDIO API ("Web API") WITH STANDINGWAVE 3 ("SW3")

My goal in this writeup is to highlight and compare different approaches to problems taken by the two libraries, with the aim of stimulating some thought and discussion. In general, I'm not going to focus on the things that are very similar, since there are so many points of similarity. I'm also not going to go through the many great things that the Web API does which SW3 omits -- obviously we should keep those unless they feel extraneous.

Let me say at the outset that I am not looking for the literal adoption of SW3 concepts within the Web API. If the group feels that there is some advantage in using some ideas from SW3 within the Web API, that could be valuable. And if the comparison moves the group to feel that the API is just fine as it is, I consider that just as valuable an outcome.

I will call out essential features for Noteflight's use cases with a preceding triplet of asterisks (***).

RESOURCES

The SW3 API documentation can be browsed here:
http://blog.noteflight.com/projects/standingwave3/doc/

Noteflight is here (the best-known StandingWave app, though not the only one):
http://www.noteflight.com

FUNDAMENTALS

Requirements: SW3 was designed to support the synthesis of Western musical styles from semantic music notation, using an instrument sample library and applying effects and filters as needed for ornamentation and interpretation. But it was also intended to serve as an all-purpose package for Flash sound developers, and it does take a general approach to many issues. I believe it would be possible to write a number of the Web API demo apps using SW3, where capabilities overlap.

Underlying Approach: SW3 is written in ActionScript 3 with low-level DSP processing in C. The role of nodes is similar in both packages, allowing low-level constructs in C to be encapsulated out of view from the application programmer, who works at a scripting-language level. Our finding with SW3 has been that nodes are a very useful way of surfacing audio synthesis, and that application builders are able to work with them effectively, but that there is a learning curve for people who aren't used to programming with declarative objects in this way.

EFFECTS AND FILTERS

Loop Points***: SW3 allows a single loop point to be specified for an audio buffer. This means that the loop "goes back" to a particular nonzero sample index after the end is reached. This feature is really essential for wavetable synthesis, since one is commonly seeking to produce a simulated note of indefinite length by looping a rather featureless portion of an actual note being played, a portion that must follow the initial attack.
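Purely as an illustration (this is not actual SW3 or Web API code; sampleAt, buffer, loopStart and n are made-up names), the loop-point behavior amounts to something like:

    // First pass plays the whole buffer, including the attack; every pass
    // after that cycles over the region [loopStart, buffer.length).
    function sampleAt(buffer, loopStart, n) {
        var length = buffer.length;
        if (n < length) {
            return buffer[n];                 // initial pass: attack + loop body
        }
        var loopLength = length - loopStart;  // size of the looped region
        return buffer[loopStart + ((n - length) % loopLength)];
    }

The important property is that the attack is heard exactly once, while the steady-state region repeats for as long as the note needs to sustain.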
Resampling / Pitch Shifting: SW3 uses an explicit filter node (ResamplingFilter) which resamples its input at an arbitrary sampling rate. This allows any audio source to be sped up or slowed down (making its overall duration shorter or longer). Contrast this with the Web API, in which AudioBufferSourceNode "bakes in" resampling via the playbackRate attribute. It appears that in the Web API no composite source or subgraph can be resampled. Now, the Web API approach would actually be sufficient for Noteflight's needs (since we only apply resampling directly to audio buffers), but it's worth asking whether breaking this function out as a filter is useful.

Looping-as-Effect: SW3 also breaks out looping as an explicit filter node, allowing any composite source to be looped.

SEQUENCING***

SW3 uses a very different approach to time-sequencing of audio playback from the Web API's noteOn(when) approach. I feel that each approach has distinct strengths and weaknesses. This is probably the biggest architectural difference between the projects.

In SW3, all sources are always considered to begin generating a signal at time zero. Consequently, there is no triggering construct such as noteOn() at all. Instead, SW3 provides a special scheduling object called a Performance, which aggregates and schedules a list of sources to start at specific onset times. The output of the Performance is thus a mixdown of all the sources scheduled within it, each delayed to start at the correct time. It's a bit like this:

    var node1 = ...              // create some audio source starting nominally at t=0
    var node2 = ...              // create some other source, also at t=0
    var perf = new Performance();
    perf.addNode(node1, 1.0);    // schedule node1 to occur at offset 1.0 sec within Performance
    perf.addNode(node2, 2.0);    // schedule node2 to occur at offset 2.0 sec within Performance
    context.destination = perf;  // play the performance consisting of the scheduled sources

Note that a Performance is *just an audio source*. Thus, you can play a Performance just like a standalone source, pipe a Performance into some other graph of filters/effects, or even schedule it into a higher-level Performance made out of shorter sub-Performances. Performances can also be looped, meaning that they repeatedly schedule their contents, which is better for a longish run of material than rendering into a buffer and looping the buffer. You might say a Performance is sort of a smart, time-aware mixer that maintains an internal index. It efficiently ignores inputs that haven't started yet, or which have already stopped. You can schedule stuff into a Performance while it's playing, too.
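To illustrate the "just an audio source" point, here is a sketch in the same style as the example above (makeNote() is a made-up factory that builds some note subgraph starting at t=0; only Performance, addNode() and context come from the example above):

    var bar1 = new Performance();
    bar1.addNode(makeNote(), 0.0);   // notes scheduled relative to the start of the bar
    bar1.addNode(makeNote(), 0.5);

    var bar2 = new Performance();
    bar2.addNode(makeNote(), 0.0);
    bar2.addNode(makeNote(), 0.75);

    var piece = new Performance();   // a Performance is itself a source...
    piece.addNode(bar1, 0.0);        // ...so it can be scheduled into a
    piece.addNode(bar2, 2.0);        // higher-level Performance
    context.destination = piece;

Nothing inside bar1 or bar2 needs to know where it ends up in the larger piece; the offsets compose from the outside.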
So... isn't this functionally equivalent to having many different sources in the Web API that are routed into a mixer or destination, and calling noteOn() on the head end of each source with a different value? Well no, not exactly. The difference has to do with how many objects you have to schedule, and how complicated it gets to figure out what time values to pass to these invocations, and what knowledge you need in order to schedule a composite subgraph to occur at some point. This is where I think the problem lies in the current API draft.

Consider that a single musical note in a typical wavetable synth is a subgraph that might include stuff like this:

- a basic audio sample, possibly more than one for a composite sound
- an amplifier
- an envelope modulation that controls the amp gain
- a low-pass filter
- an envelope modulation that controls the filter center frequency
- other assorted modulations for musical expression (e.g. vibrato, whammy bar, whatever)

So a single note is a little subgraph with moving parts, which require their own independent scheduling in time (though they are coordinated with each other in a predictable way). The sample has to be timed, and the modulations in particular are going to have customized onsets and durations for each note, based on the duration, volume, articulation and ornamentation of the note.

Now... in the Web API approach, one has to calculate the onset time of each time-dependent element of the subgraph and individually schedule it, adding an overall start time into each one so that each call to noteOn()/scheduleAutomation() references an absolute time rather than a time offset relative to the start of the note. In other words... key point coming up here... *scheduling an audio subgraph at a specific point in time requires knowledge of its internals* in order to schedule its bits and pieces to occur in coordination with each other.

Compare this with the Performance approach, in which you have some factory code that simply makes the right subgraph, without any scheduling information at all. It gets scheduled as a whole, outside of the code that made it, by other code that is only concerned with when some event should happen.

The net result is that in the Web API approach, if you want to encapsulate knowledge of a subgraph's internals, you have to pass an onset time into the code that makes that subgraph. This doesn't seem good to me because it conflates the construction and the scheduling of a complex sound. I am still thinking about what to recommend instead (other than just adding a Performance-like construct to the Web API), but would first like to hear others' reaction to this point.
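Here is a rough sketch of what that difference looks like in code. makeNoteAt() and makeNote() are made-up note factories with their bodies elided; only noteOn(), Performance and addNode() refer to the APIs discussed above:

    var pieceStart = 0.0;  // absolute start time of the piece (made up for the example)

    // Web API style: the absolute onset time has to be threaded into the
    // factory, because every time-dependent piece of the note (the sample,
    // the gain envelope, the filter envelope) must be scheduled with "when"
    // folded into its own internal offset.
    function makeNoteAt(context, when) {
        // ... build the sample source, envelopes and filter, then for example:
        //     sampleSource.noteOn(when);
        //     schedule the gain envelope at when + attackOffset, and so on
    }
    makeNoteAt(context, pieceStart + 1.0);

    // SW3 style: the factory knows nothing about time; it always builds the
    // note as though it starts at t=0, and scheduling happens outside it.
    function makeNote(context) {
        // ... build exactly the same subgraph, everything relative to t=0 ...
    }
    var perf = new Performance();
    perf.addNode(makeNote(context), 1.0);  // the caller alone decides "when"

The point is not that the first version is impossible to write, but that the note's construction and its placement in time cannot be kept separate.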
MODULATORS***

SW3 has a concept of Modulators, whereas the Web API uses an AudioCurve (being fleshed out by Chris in real time as I type). SW3 Modulators have some normalized value at each sample frame. They are used to modulate pitch (i.e. resampling rate) and gain at various places in the synthesis pipeline.

Due to the performance overhead and complexity of allowing *any* parameter to be continuously modulated, SW3 only allows Modulators to be plugged into certain key parameter types of certain nodes, typically gain, pitch-shift or frequency parameters. We needed to make our tight loops as tight as possible without checking to see if some variable needs to change its value on each iteration.

SW3 currently supports just two kinds of modulators: piecewise-linear (where you supply a list of time/value tuples) and envelope generators (ADHSR). LFOs would be great, but SW3 doesn't have LFOs per se; instead we use a piecewise-linear modulator as a triangle/sawtooth/square-wave source. An ADHSR (Attack/Decay/Hold/Sustain/Release) modulator is particularly important since it supplies a musical shape to a note.

Since many modulating functions with a psychoacoustic or musical result are linear in the *log* of the parameter, not in the parameter itself, the interpretation of the modulator often requires an implicit exponentiation layered on top of a piecewise-linear modulator. For instance, one might want an LFO that makes a tone wobble between a semitone below and a semitone above. This LFO would have a range from -k to +k (no pitch change = 0), but the corresponding pitch shift would use a multiplicative factor ranging between exp(-k) and exp(k) (no pitch change = exp(0) = 1).

------

That's it -- I'll try to get Part 2 together shortly. In the meantime, I hope this is helpful.

...  .  .  .

Joe

Joe Berkovitz
President
Noteflight LLC
160 Sidney St
Cambridge, MA 02139
phone: +1 978 314 6271