Re: AudioNode API Review - Part 1 (StandingWave3 Comparison) from Chris Rogers on 2010-10-05 (public-xg-audio@w3.org from October 2010)

From: Chris Rogers <crogers@google.com>
Date: Tue, 5 Oct 2010 14:26:25 -0700
To: Joseph Berkovitz <joe@noteflight.com>
Cc: public-xg-audio@w3.org
Message-ID: <AANLkTimUx7S5g+BMh6bDNm_t+OeantV0-HRROF7gFnYP@mail.gmail.com>
On Mon, Oct 4, 2010 at 2:47 PM, Joseph Berkovitz <joe@noteflight.com> wrote:

> Hi folks,
>
> Thanks for the great call today. I am very enthusiastic about the work this
> group has taken on, and about its present direction.
>
> As a new member I want to do my best to respond to the Web Audio API
> Proposal with some initial observations.  I know that a lot of thinking has
> gone into the current draft, so please forgive my ignorance of the many
> points that have already discussed -- I'm admitting now to having only
> briefly skimmed the mailing list archives.  I did try out the sample code,
> though, and read some of it.  Very impressive!
>
> I'm going to break my response up into two main parts.  Part 1 (this one)
> will be a high-level comparison of the Web Audio API Proposal with
> StandingWave 3, the internal synthesis engine used by Noteflight and
> available on GitHub as the standingwave3 project.  Part 2 (to follow in the
> next day or two) will be a commentary on various aspects of the Web Audio
> API, responding to the draft on a feature-by-feature basis.
>

Hi Joe, it's great to have you involved in the group.  StandingWave 3 looks
very interesting and I'm looking forward to hearing your ideas!




> RESOURCES
>
> The SW3 API documentation can be browsed here:
>      http://blog.noteflight.com/projects/standingwave3/doc/
>
> Noteflight is here (the best-known Standingwave app, though not the only
> one)
>      http://www.noteflight.com
>

Joe, can you point me to some source code examples using your StandingWave
API.  The documentation looks excellent, but it's always good to look at a
variety of working code examples to get a better feel for how the different
parts work together.

In the past, I worked on a sample-based playback engine at Beatnik (formerly
Headspace) and also wrote the DLS synthesizer (DLSMusicDevice) at Apple, so
I'm familiar with many of the concepts you've been working with.  It's
interesting to combine these node-based ideas with sample-playback
approaches.


FUNDAMENTALS
>
> Requirements: SW3 was designed to support the synthesis of Western musical
> styles from semantic music notation, using an instrument sample library,
> applying effects and filters as needed for ornamentation and interpretation.
> But it was also intended to serve as an all-purpose package for Flash sound
> developers, and does take a general approach to many issues. I believe it
> would be possible to write a number of the Web API demo apps using SW3,
> where capabilities overlap.
>
> Underlying Approach: SW3 is written in ActionScript 3 with low level DSP
> processing in C.  The role of nodes is similar in both packages, allowing
> low-level constructs in C to be encapsulated out of view from the
> application programmer who works at a scripting-language level.  Our finding
> with SW3 has been that nodes are a very useful way of surfacing audio
> synthesis and that application builders are able to work with them
> effectively, but that there is a learning curve for people who aren't used
> to programming with declarative objects in this way.
>

Yeah, it's really amazing how similar the two approaches are at a
fundamental level.  As far as the learning curve, I think writing lots of
demo code and  tutorials can help here.



>
>
> EFFECTS AND FILTERS
>
> Loop Points***: SW3 allows a single loop point to be specified for an audio
> buffer.  This means that the loop "goes back" to a particular nonzero sample
> index after the end is reached.  This feature is really essential for
> wavetable synthesis, since one is commonly seeking to produce a simulated
> note of indefinite length by looping a rather featureless portion of an
> actual note being played, a portion that must follow the initial attack.
>

I agree that this is an important feature.  In my approach, I would add loop
points into the AudioBufferSourceNode.  Then it would also probably be
necessary to implement an amplitude envelope as well.  Although it could be
a classic ADSR envelope, it would be nice to consider being able to define
arbitrary shaping curves, perhaps leveraging the AudioCurve we've just
started talking about.



>
> Resampling / Pitch Shifting: SW3 uses an explicit filter node
> (ResamplingFilter) which resamples its input at an arbitrary sampling rate.
> This allows any audio source to be speeded up/slowed down (making its
> overall duration shorter/longer).  Contrast this with the Web API, in which
> AudioBufferSourceNode "bakes in" resampling, via the playbackRate attribute.
>  It appears that in the Web API no composite source or subgraph can be
> resampled.  Now, the Web API approach would actually be sufficient for
> Noteflight's needs (since we only apply resampling directly to audio
> buffers) but it's worth asking whether breaking this function out as a
> filter is useful.
>

I've taken the approach of running all of the AudioNodes at the same rate.
 If you have a particular node which performs resampling, then you can get
into all kinds of trouble with different parts of the graph running at
different rates.  It's possible to connect nodes to one another in such a
way that it becomes impossible to render because of the rate differences.



> Looping-as-Effect: SW3 also breaks out looping as an explicit filter node,
> allowing any composite source to be looped.
>

That sounds interesting.  I'd like to look at this use of your API and some
working example code using this technique.



>
> SEQUENCING***
>
> SW3 uses a very different approach to time-sequencing of audio playback to
> the Web API's noteOn(when) approach.  I feel that each approach has distinct
> strengths and weaknesses.  This is probably the biggest architectural
> difference between the projects.
>
> ...
>


> Consider that a single musical note in a typical wavetable synth is a
> subgraph that might include stuff like this:
> - a basic audio sample, possibly more than one for a composite sound
> - an amplifier
> - an envelope modulation that controls the amp gain
> - a low-pass filter
> - an envelope modulation that controls the filter center frequency
> - other assorted modulations for musical expression (e.g. vibrato, whammy
> bar, whatever)
>
> So a single note is a little subgraph with moving parts, which require
> their own independent scheduling in time (though they are coordinated with
> each other in a predictable way).  The sample has to be timed, and the
> modulations in particular are going to have customized onsets and durations
> for each note, based on the duration, volume, articulation and ornamentation
> of the note.
>
> Now... in the Web API approach, one has to calculate the onset time of each
> time-dependent element of the subgraph and individually schedule it, adding
> an overall start time into each one so that each call to
> noteOn()/scheduleAutomation() references an absolute time rather than a time
> offset relative to the start of the note.  In other words... key point
> coming up here... *scheduling an audio subgraph at a specific point in time
> requires knowledge of its internals* in order to schedule its bits and
> pieces to occur in coordination with each other.
>
> Compare this with the Performance approach, in which you have some factory
> code that simply makes the right subgraph, without any scheduling
> information at all.  It gets scheduled as a whole, outside of the code that
> made it, by other code that is only concerned with when some event should
> happen.
>
> The net result is that in the Web API approach, if you want to encapsulate
> knowledge of a subgraph's internals, you have to pass an onset time into the
> code that makes that subgraph.  This doesn't seem good to me because it
> conflates the construction and the scheduling of a complex sound.  I am
> still thinking about what to recommend instead (other than just adding a
> Performance-like construct to the Web API), but would first like to hear
> others' reaction to this point.
>

I can understand why you designed your API as you did, given that the
primary application is a software-based sample playback synthesizer and
sequencer.  Although, I believe the Web Audio API will be able to support
such applications (with the addition of envelopes, etc.), for most
applications I believe that the additional complexity of the Performance
approach is not that useful.  And for the cases where it is, it's not that
hard to implement your API approach with a fairly small JS wrapper library
without really affecting performance.



>
>

> MODULATORS***
>
> SW3 has a concept of Modulators, whereas the Web API uses an AudioCurve
> (being fleshed out by Chris in real time as I type).  SW3 Modulators have
> some normalized value at each sample frame.  They are used to modulate pitch
> (i.e. resampling rate) and gain at various places in the synthesis pipeline.
>  Due to the performance overhead and complexity of allowing *any* parameter
> to be continuously modulated, SW3 only allows Modulators to be plugged into
> certain key parameter types of certain nodes, typically gain, pitch-shift or
> frequency parameters.  We needed to make our tight loops as tight as
> possible without checking to see if some variable needs to change its value
> on each iteration.
>

Believe me, I completely sympathize with you about keeping the loops as
tight as possible without worrying about arbitrary combinations of
parameters changing for every sample in every DSP algorithm :)


SW3 currently supports just two kinds of modulators: piecewise-linear (where
you supply a list of time/value tuples) and envelope generators (ADHSR).
 LFOs would be great but SW3 doesn't have LFOs per se, instead we use a
piecewise-linear modulator as a triangle/sawtooth/square-wave source.  An
ADHSR (Attack/Decay/Hold/Sustain/Release) modulator is particularly
important since it supplies a musical shape to a note.


I think we should support at least this level of modulation.  I'll spend a
little time coming up with some more concrete design proposals...

Anyway, thanks for your detailed analysis.  It's great to hear it!

Cheers,
Chris
Received on Tuesday, 5 October 2010 21:26:59 UTC