
Re: Web Audio API Proposal

From: Chris Rogers <crogers@google.com>
Date: Fri, 2 Jul 2010 13:36:58 -0700
Message-ID: <AANLkTiko2IW5WPbExJ5LdD3kZhKzHthxCFol0sbvCZ3C@mail.gmail.com>
To: Ricard Marxer Pin <ricardmp@gmail.com>
Cc: Chris Marrin <cmarrin@apple.com>, Jer Noble <jer.noble@apple.com>, public-xg-audio@w3.org
Hi Ricard,

Thanks for your interest in the graph-based (node) approach.  I really
appreciate your comments, and will try to address your questions/ideas the
best I can:

On Thu, Jul 1, 2010 at 11:18 AM, Ricard Marxer Pin <ricardmp@gmail.com> wrote:
> AudioPannerNode + AudioListener:
> Maybe I'm wrong, but I think these nodes perform some processes that
> are quite tied to data (HRTF) or that may be implemented in many
> different ways that could lead to different outputs depending on the
> method. Maybe they could be broken up into smaller blocks that have a
> much more defined behavior and let the user of the API specify what
> data to use or what algorithm to implement.

The approach I took with AudioPannerNode was to define a common interface
used by all the panning models, for the source/panner position, orientation,
velocity, and cone settings, distance model, etc.
These are the attributes which are commonly used in current 3D game engines
for spatializing/panning.

Then I defined constants for different panning approaches:

        const unsigned short PASSTHROUGH = 0;
        const unsigned short EQUALPOWER = 1;
        const unsigned short HRTF = 2;
        const unsigned short SOUNDFIELD = 3;
        const unsigned short MATRIXMIX = 4;

Looking back at this now, I realize that MATRIXMIX (and arguably
PASSTHROUGH) does not really belong here and should be a different type of
AudioNode, since it ignores position, orientation, etc.
But, EQUALPOWER (vector-based panning), HRTF, and SOUNDFIELD all make use of
the common attributes such as position, orientation, etc.
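As a sketch of what "equal-power" means here (the function name and the
[-1, 1] pan mapping below are just illustrative, not part of the proposal):
the two channel gains trace a quarter circle, so their squared sum, i.e. the
total power, stays constant at every pan position:

```javascript
// Sketch of equal-power panning: a pan position in [-1, 1] is mapped to
// an angle in [0, PI/2], and the left/right gains are the cosine/sine of
// that angle, so gainL^2 + gainR^2 == 1 (constant power) everywhere.
function equalPowerGains(pan) {
  const theta = (pan + 1) * 0.25 * Math.PI; // -1 -> 0, +1 -> PI/2
  return { left: Math.cos(theta), right: Math.sin(theta) };
}
```

At pan = 0 both gains are sqrt(2)/2, i.e. a -3dB center, which is the usual
choice in mixing consoles and game engines.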

Instead of defining a 'panningModel' attribute, it would also be possible to
subclass AudioPannerNode with these three types.  Then we would have:

EqualPowerPannerNode (with very mathematically precise behavior)

HRTFPannerNode (behavior dependent on the HRTF data set; see below)

SoundFieldPannerNode (with very mathematically precise behavior)

Here's my take on the HRTF (spatialization) data sets: the browser would be
free to use any generic HRTF data set here.  Browsers vary in many specific
ways, such as the exact fonts and anti-aliasing algorithms used to render
text, the exact algorithms used to resample/re-scale images in both <img>
and <canvas>, and the audio resampling algorithms currently used for
<audio>, so I think it's OK not to specify the exact HRTF data set.

We could have a method to optionally set a custom HRTF data set measured
from a specific person.  This would be an advanced and very rare use case, I
think, and would require us to define a data format.  In principle, I think
having this method is fine, but I would defer it to a more advanced
implementation.  As long as we get the base API correct, we can always add
this method later.

I think it would be a bad idea to always require the javascript developer to
specify a URL for the HRTF data set, because these files are very large and
would incur large download costs.  Web pages today have built-in default
fonts for rendering text and don't require downloading fonts just to render
text.  Similarly, I think there should be a default HRTF (spatialization)
data set which would automatically be used.

> ConvolverNode
> The convolver node has an attribute that is an AudioBuffer.  I think
> it should just have a float array with the impulse response or
> multiple float arrays if we want to convolve differently the different
> channels.  The fact of having the AudioBuffer could make the user
> believe that the impulse response would adapt to different sample
> rates, which doesn't seem to be the case.

Effectively, an AudioBuffer is a list of Float32Arrays (one for each
channel).  And I've recently added direct buffer access with the
getChannelData() method, so javascript can generate/modify the buffers.  I
wouldn't worry about the sample-rate too much.  We may be able to remove
this attribute from AudioBuffer entirely if we can assume that the entire
audio rendering graph is operating at the same sample-rate and all
AudioBuffer objects also implicitly have this sample-rate.  Otherwise, if we
keep sampleRate, then we can define the behavior such that a sample-rate
conversion automatically happens if necessary, or require that the
sample-rate match the ConvolverNode's sample-rate.
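To illustrate the data model (the constructor shape below is an assumption
for illustration; only getChannelData() and the one-Float32Array-per-channel
layout come from the proposal):

```javascript
// Minimal stand-in for the proposed AudioBuffer data model: one
// Float32Array per channel, exposed through getChannelData(channel).
// The constructor signature here is an assumption, not proposed API.
function MockAudioBuffer(numberOfChannels, length, sampleRate) {
  this.sampleRate = sampleRate;
  this.length = length;
  this.channels = [];
  for (let c = 0; c < numberOfChannels; c++) {
    this.channels.push(new Float32Array(length));
  }
}
MockAudioBuffer.prototype.getChannelData = function (channel) {
  return this.channels[channel];
};

// javascript can then generate an impulse response directly, e.g. a
// short exponentially decaying noise burst:
const ir = new MockAudioBuffer(2, 1024, 44100);
for (let c = 0; c < 2; c++) {
  const data = ir.getChannelData(c);
  for (let i = 0; i < data.length; i++) {
    data[i] = (Math.random() * 2 - 1) * Math.exp(-i / 200);
  }
}
```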

> This is a quite important node because it will be used for many
> different tasks.  Its behavior should be clearly defined.

I agree, and you're right in pointing out that we need to add more detail
about the exact behavior.

>  Can the
> user modify the impulse response on the fly (must the filter keep the
> past N samples in memory for this)?

There are some technical challenges to modifying the impulse response on the
fly.  The convolution is applied using FFT block-processing and in
multiple threads.  This is an implementation detail, but it must be
considered in actual practice, since direct convolution is far less
efficient and not feasible.  Because the processing is block-based, it's
possible to introduce glitches into the processed audio stream when
modifying the impulse-response in real-time.  Some of these problems can be
minimized by changing the impulse response slowly and only one block (time
segment) at a time.  The current state-of-the-art for desktop audio
convolution engines does allow modifying the impulse responses in real-time
with fancy user-interfaces.  It would be interesting to be able to create
these interfaces in canvas or WebGL!  I would like to keep this possibility
open, and make sure the API is flexible enough to add this feature.  That
said, I would also like the API to be fairly simple in the common use case,
and wouldn't necessarily expect an initial implementation to have special
engine support for glitch-free impulse response editing.
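For comparison, here is what naive direct-form convolution looks like; its
cost of one multiply-add per input sample per impulse-response tap is exactly
what makes FFT block-processing necessary (an illustrative sketch, not the
engine's implementation):

```javascript
// Direct-form convolution: out[n+k] accumulates input[n] * ir[k], i.e.
// out = input * ir.  Cost is length(input) * length(ir) multiply-adds,
// which is why real engines use FFT block-processing for long responses.
function convolveDirect(input, ir) {
  const out = new Float32Array(input.length + ir.length - 1);
  for (let n = 0; n < input.length; n++) {
    for (let k = 0; k < ir.length; k++) {
      out[n + k] += input[n] * ir[k];
    }
  }
  return out;
}
```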

> Does the impulse response have a
> limit in length?  Should the user set the maximum length of the
> impulse response at the beginning?

This is a great question.  The longer the impulse response (and the more
channels), the more CPU-intensive it becomes.  A very long impulse
response that works fine on a desktop machine might have trouble on a
mobile device.  This is a scalability issue similar to what we already face
with the graphics APIs.  With WebGL, it's easily possible to draw far too
much for either the javascript itself or the GPU to handle at anything near
a reasonable frame-rate.
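Some rough arithmetic behind that concern (illustrative numbers only,
assuming a 44.1KHz sample-rate):

```javascript
// For direct convolution, each output sample needs one multiply-add
// (MAC) per impulse-response tap, so the per-channel cost grows with
// the square of the sample-rate times the response length in seconds.
function directMACsPerSecond(sampleRate, irSeconds) {
  const taps = sampleRate * irSeconds; // impulse-response length in samples
  return sampleRate * taps;            // multiply-adds per second, per channel
}
```

A 3-second impulse response at 44.1KHz works out to roughly 5.8 billion
MACs per second per channel, which is why FFT block-processing (and some
notion of scalability limits) is unavoidable.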

> RealtimeAnalyserNode
> From my POV this node should be replaced by a FftNode.  The FFT is not
> only used for audio visualization but for many audio
> analysis/processing/synthesis methods (transient detection,
> coding/compression, transcription, pitch estimation, classification,
> effects, etc.).  Therefore I think the user should be able to have
> access to a proper FFT, without smoothing, band processing, or
> magnitude scaling (in dBs or in intensity).  It should also be possible
> to access the magnitude and phase or the complex values themselves,
> many methods are based on the complex representation.  Additionally I
> would propose the possibility to select the window, frameSize, fftSize
> and hopSize used when performing the FFT.  I would also propose an
> IfftNode that would perform the inverse of this one and the overlap
> and add process to have the full loop and be able to go back to the
> time domain.  I will get back to this once I have Chris' webkit
> branch running.  The implementation of this addition should be trivial
> since most FFT libraries also perform the IFFT.

The current RealtimeAnalyserNode API was quickly put together just to get
basic visualizer support.  Whatever we do, I hope this basic case can still
be reasonably simple API-wise if we decide to go with a more elaborate
approach.  Believe me, I understand your interest in doing more by
effectively creating a complete analysis and re-synthesis engine with
arbitrary frequency-domain processing in between.  A long time ago, in a
previous life I worked at IRCAM on SVP (now SuperVP) and wrote the first
version of AudioSculpt for doing exactly these types of transforms.

For analysis, let's see what we can do API-wise to keep the simple cases
simple, but allow for more sophisticated use cases later on.  Like I said,
the current API was very quickly designed, so maybe we can do much better.

> AudioParam
> This one is a very tricky one.  Currently parameters are only floats
> and can have a minimum and maximum.  This information is mostly useful
> when automatically creating GUI for nodes or for introspection.  But
> finding a set of properties that can completely describe a parameter
> space is extremely hard.  I would say that the parameter should just
> be a variant value with a description attribute that contains a
> dictionary with some important stuff about the parameter.    The
> description could look somewhat like this (beware of my lack of
> expertise in JS, there's surely a better way):
> gain parameter: {'type': 'float', 'min': 0, 'max': 1, 'default': 1,
> 'units': 'intensity', 'description': 'Controls the gain of the
> signal', 'name': 'gain'}
> windowType parameter: {'type': 'enum', 'choices': [RECTANGULAR, HANN,
> HAMMING, BLACKMANHARRIS], 'default': BLACKMANHARRIS, 'name': 'window',
> 'description': 'The window function used before performing the FFT'}
> I think this would make it more flexible for future additions to the API.
> I also think that the automation shouldn't belong in the AudioParam
> class, since for some parameter it doesn't make sense to have it.  The
> user can easily perform the automation using JavaScript and since the
> rate of parameter change (~ 100hz) is usually much lower than the
> audio rate (~>8000Hz), there should be no problems with performance.

I designed the AudioParam API very much in the same way that I did for
AudioUnits which are used as the plugin model for Mac OS X (and iOS).  I
think it has worked pretty well in a large variety of processing/synthesis
plugins which are sold commercially.  Although it's true that not everything
can be represented by a float, most can be and it's useful to be able to
attach automation curves to these types of objects for implementing
envelopes, volume fades, etc.  Almost all DAW (digital audio workstation)
software has the concept of a timeline where different parameters can be
automated in time.  For the few cases which are not represented by floats,
such as the "impulse response" of the ConvolverNode, it's not too difficult
to have specific attributes on these objects (which are not AudioParams, and
thus not automatable using a simple curve).
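As a sketch of the kind of automation curve a float AudioParam makes
possible (the class and method names here are hypothetical, not part of the
proposal):

```javascript
// Hypothetical float-parameter automation: a linear ramp between two
// (time, value) points, as used for envelopes and volume fades.  The
// engine would sample valueAtTime() at its control rate.
function LinearRamp(startTime, startValue, endTime, endValue) {
  this.valueAtTime = function (t) {
    if (t <= startTime) return startValue;
    if (t >= endTime) return endValue;
    const frac = (t - startTime) / (endTime - startTime);
    return startValue + frac * (endValue - startValue);
  };
}

// A 1-second volume fade from full gain down to silence:
const fade = new LinearRamp(0.0, 1.0, 1.0, 0.0);
```

Because the curve is evaluated inside the engine, the fade stays sample- or
block-accurate regardless of javascript timer jitter.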

I'm not sure that I agree that parameters can always easily be automated
directly in javascript at a rate of ~100Hz.  Sometimes, parameter changes
need to be scheduled to happen at relatively precise times and in
rhythmically perfect ways.  The resolution of javascript setTimeout() is not good enough
for these cases, and is not reliable enough to guarantee that parameters
change smoothly without glitches and pauses.  As an example, SuperCollider
has a control rate (krate) which defaults to 64 sample-frames, which is
~689Hz at a 44.1KHz sample-rate and far finer-grained than javascript
timers can reliably achieve.

The "automation" attribute of AudioParam is just speculation on my part of
how the API would actually work.  Soon, I hope to implement the automation
directly in the underlying engine code in such a way that we can experiment
with several different javascript API approaches.

> Anyway these are just my 2 cents.  I just had a first look at the API,
> I might come up with more comments once I get my hands on Chris'
> implementation and am able to try it out.
> ricard

Thanks Ricard, I really appreciate your ideas and look forward to more
discussions with you on refining the AudioNode approach.

Received on Friday, 2 July 2010 20:37:31 UTC
