Re: AudioNode API Review - Part 1 (StandingWave3 Comparison)

On Oct 4, 2010, at 8:51 PM, Ian Ni-Lewis wrote:

> Resampling / Pitch Shifting: SW3 uses an explicit filter node  
> (ResamplingFilter) which resamples its input at an arbitrary  
> sampling rate. This allows any audio source to be speeded up/slowed  
> down (making its overall duration shorter/longer).  Contrast this  
> with the Web API, in which AudioBufferSourceNode "bakes in"  
> resampling, via the playbackRate attribute.  It appears that in the  
> Web API no composite source or subgraph can be resampled.  Now, the  
> Web API approach would actually be sufficient for Noteflight's needs  
> (since we only apply resampling directly to audio buffers) but it's  
> worth asking whether breaking this function out as a filter is useful.
>
> If you allowed filters to resample, wouldn't you also have to allow  
> input and output buffers of variable size? One of the virtues of the  
> current design seems to be that buffer sizes can be kept constant  
> throughout the graph.

This is not an issue, because of our "pull" approach that you correctly
guessed below.  The ResamplingFilter gets a request for N frames and
requests approximately N*k frames from its source, where k is the
resampling factor.  So perhaps this idea is a non-starter given the
internals demanded by the current Web API.  I don't actually think it's
all that important, though.
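For concreteness, here is a minimal JavaScript sketch of that pull
interaction.  It is illustrative only (SW3 is ActionScript, and the
pull() method and the linear interpolation below are stand-ins, not any
real API):

    // Illustrative only: a pull-model resampler.  source.pull(n) is a
    // stand-in for "give me the next n frames from upstream".
    function ResamplingFilter(source, k) {   // k = resampling factor
      this.pull = function(numFrames) {
        // Ask upstream for roughly numFrames * k frames...
        var input = source.pull(Math.ceil(numFrames * k) + 1);
        // ...and interpolate them down/up to exactly numFrames.
        // (State carried across successive pulls is omitted here.)
        var output = new Array(numFrames);
        for (var i = 0; i < numFrames; i++) {
          var pos = i * k;
          var i0 = Math.floor(pos), frac = pos - i0;
          var s1 = input[Math.min(i0 + 1, input.length - 1)];
          output[i] = input[i0] + (s1 - input[i0]) * frac;
        }
        return output;
      };
    }

Because each node simply asks its source for however many frames it
needs, variable buffer sizes never surface at the API level.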

>
> Looping-as-Effect: SW3 also breaks out looping as an explicit filter  
> node, allowing any composite source to be looped.
>
> Again, this seems to require more complex input/output logic between  
> the nodes. I get the feeling that your downstream filters get to  
> pull inputs on demand, rather than having their inputs handed to  
> them by the upstream filters. True?

Completely true!
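In the same spirit, looping-as-a-filter is little more than a wrapped
read position under a pull model.  Again an illustrative JavaScript
sketch, with a made-up pullRange(start, n) rather than anything from
SW3 or the Web API:

    // Illustrative only: looping as a filter.  pullRange(start, n) is
    // a stand-in for a random-access pull of n frames at "start".
    function LoopingFilter(source, loopLength) {
      var position = 0;                       // running output position
      this.pull = function(numFrames) {
        var output = [];
        while (output.length < numFrames) {
          var start = position % loopLength;  // wrap within the loop
          var chunk = Math.min(numFrames - output.length,
                               loopLength - start);
          output = output.concat(source.pullRange(start, chunk));
          position += chunk;
        }
        return output;
      };
    }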

> SEQUENCING***
>
> SW3 uses a very different approach to time-sequencing of audio  
> playback to the Web API's noteOn(when) approach.  I feel that each  
> approach has distinct strengths and weaknesses.  This is probably  
> the biggest architectural difference between the projects.
>
>
> I agree that we shouldn't preclude more complex sequencing. But does  
> that need to be part of the core API? Or is it something that can be  
> built on top of a simpler time-based API?
>
> The net result is that in the Web API approach, if you want to  
> encapsulate knowledge of a subgraph's internals, you have to pass an  
> onset time into the code that makes that subgraph.  This doesn't  
> seem good to me because it conflates the construction and the  
> scheduling of a complex sound.  I am still thinking about what to  
> recommend instead (other than just adding a Performance-like  
> construct to the Web API), but would first like to hear others'  
> reaction to this point.
>
> How difficult would it be to write a Performance-like construct in  
> JS on top of the existing proposal? If it's doable, then I'd vote  
> (not that I have a vote, I'm just a lurker here :-) ) for  
> standardizing the core API and letting third parties add better  
> sequencing after the fact. I've seen other standards get horribly  
> bogged down by trying to be everything to everyone, and I'd hate to  
> see that happen here.
>

I don't want to bog things down at all.  For me this is not about
doing complicated things; it's about choosing an architecture that
supports solid coding practices for the 90% case.  I think that
Performance and the present noteOn() approach are more or less
functionally equivalent, so hopefully it's a win to compare and
discuss their strengths and weaknesses.

I am concerned that a Performance-like construct cannot in fact be
written in JS at the moment, because scheduling an arbitrary composite
source is not possible (or, if it is possible, the code would have to
contain lots of recursion and awkward is-instance-of-X tests).  One
would have to know about every piece inside the subgraph that requires
scheduling on an absolute timeline.  That's the main issue.  Imagine
SVG without groups and transformations: handed an arbitrary graphical
construct, how would you write code to display it at a given location?
You would have to walk its insides and translate every coordinate pair
inside every object.  That's roughly what we have here today, but in
the time dimension.
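To make that concrete, the best one could do today looks something
like the sketch below.  The instanceof test and the noteOn() call
reflect the current draft; the "inputs" property is invented, since
the API offers no way to walk a subgraph, which is exactly the
problem:

    // Illustrative only: scheduling an arbitrary subgraph means
    // hunting recursively for the leaf nodes that understand time.
    function scheduleSubgraph(node, when) {
      if (node instanceof AudioBufferSourceNode) {
        node.noteOn(when);     // the only place a time can be injected
      }
      // "inputs" is hypothetical; the current API exposes no such list.
      if (node.inputs) {
        for (var i = 0; i < node.inputs.length; i++) {
          scheduleSubgraph(node.inputs[i], when);
        }
      }
    }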

I suspect that building simple subgraphs to represent musical events
such as envelope-shaped-and-filtered notes is a bread-and-butter use
case for the audio API, and it feels as though it could be important
to be able to write clean factory functions for such events that are
not time-aware.  Such events are more complex than a single sample
node, but certainly less complex than the hierarchies of Performances
I described.  I like the latter, but we can probably set it aside as a
primary use case.
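Here is the kind of factory I have in mind, sketched in JS against the
current draft (please forgive any method names I have wrong):

    // What one would like to write: a factory that builds the subgraph
    // for an envelope-shaped note but knows nothing about when it
    // plays.
    function makeNote(context, buffer) {
      var src = context.createBufferSource();
      src.buffer = buffer;
      var env = context.createGainNode();  // gain node used as envelope
      src.connect(env);
      // The caller can connect the returned node onward, but it cannot
      // schedule the note: noteOn() lives on "src", which is hidden in
      // here.  In practice "when" must be passed into this factory and
      // forwarded to src.noteOn(when), conflating construction with
      // scheduling.
      return env;
    }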

I am not pushing Performance as the only answer to this issue.
Another possibility is, say, some sort of nested AudioContext that is
time-shifted with respect to its parent.  That is rather like the SVG
idea of a group carrying a matrix transform relative to its parent.
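Purely as a strawman (none of this exists in the current draft, and
createChildContext and timeOffset are invented names):

    // Strawman only: a child context whose timeline is offset from its
    // parent's.  A makeNote() variant that calls src.noteOn(0)
    // internally would then be schedulable from outside, with no
    // "when" parameter at all.
    var group = context.createChildContext();   // invented method
    group.timeOffset = 2.5;   // seconds, relative to the parent timeline
    var note = makeNote(group, buffer);
    note.connect(group.destination);
    // The whole event sounds at parent time 2.5, just as coordinates
    // inside an SVG group are relative to the group's transform.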

... .  .    .       Joe

Joe Berkovitz
President
Noteflight LLC
160 Sidney St
Cambridge, MA 02139
phone: +1 978 314 6271
http://www.noteflight.com
http://joeberkovitz.com

Received on Tuesday, 5 October 2010 14:38:02 UTC