- From: Ricard Marxer Piñón <ricardmp@gmail.com>
- Date: Thu, 1 Jul 2010 20:18:12 +0200
- To: Chris Rogers <crogers@google.com>
- Cc: Chris Marrin <cmarrin@apple.com>, Jer Noble <jer.noble@apple.com>, public-xg-audio@w3.org
Hi Chris and others,

I have been following this discussion for some time and I finally found a chance to contribute.

First of all, I like the idea of a graph-based audio system; I believe it is the natural way of working with audio. As Chris said before, a minimal version of the standard can always boil down to an AudioSourceNode, an AudioProcessingNode and an AudioOutputNode, where the AudioProcessingNode simply allows access to and modification of each block of samples from JavaScript. It is also possible to easily create a JavaScript library that hides the complexity of the graph handling with a negligible penalty on performance.

I haven't had the chance to try out the implementation of the API yet, since I'm on a GNU/Linux system. But I do have some preliminary comments on the API proposal.

AudioPannerNode + AudioListener:

Maybe I'm wrong, but I think these nodes perform processes that are quite tied to data (HRTF) or that may be implemented in many different ways, which could lead to different outputs depending on the method. Maybe they could be broken up into smaller blocks with a much more precisely defined behavior, letting the user of the API specify what data to use or what algorithm to implement.

ConvolverNode

The ConvolverNode has an attribute that is an AudioBuffer. I think it should just have a float array with the impulse response, or multiple float arrays if we want to convolve the different channels differently. Having an AudioBuffer could make the user believe that the impulse response will adapt to different sample rates, which doesn't seem to be the case. This is quite an important node because it will be used for many different tasks, so its behavior should be clearly defined. Can the user modify the impulse response on the fly (must the filter keep the past N samples in memory for this)? Does the impulse response have a limit in length? Should the user set the maximum length of the impulse response at the beginning?

RealtimeAnalyserNode

From my POV this node should be replaced by an FftNode. The FFT is not only used for audio visualization but for many audio analysis/processing/synthesis methods (transient detection, coding/compression, transcription, pitch estimation, classification, effects, etc.). Therefore I think the user should have access to a proper FFT, without smoothing, band processing or magnitude scaling (in dB or in intensity). It should also be possible to access the magnitude and phase, or the complex values themselves, since many methods are based on the complex representation. Additionally, I would propose the possibility to select the window, frameSize, fftSize and hopSize used when performing the FFT.

I would also propose an IfftNode that would perform the inverse transform and the overlap-add process, closing the full loop so we can go back to the time domain. I will get back to this once I have Chris' WebKit branch running. The implementation of this addition should be trivial, since most FFT libraries also perform the IFFT.
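To make this more concrete, here is a rough sketch of how such an FftNode could be used. None of these names exist in the current draft; they are hypothetical and only illustrate the proposal (I assume a context and a source node as in the current API):

    // Hypothetical analysis setup.
    var fft = context.createFftNode();
    fft.window    = FftNode.HANN;  // analysis window
    fft.frameSize = 1024;          // samples taken per frame
    fft.fftSize   = 2048;          // transform size (zero-padded)
    fft.hopSize   = 512;           // samples between consecutive frames

    source.connect(fft);

    fft.onframe = function (frame) {
        // Raw, unsmoothed spectrum: fftSize/2 + 1 complex bins.
        var re = frame.real;  // Float32Array
        var im = frame.imag;  // Float32Array
        // Magnitude and phase can be derived from the complex values:
        var mag   = Math.sqrt(re[1] * re[1] + im[1] * im[1]);
        var phase = Math.atan2(im[1], re[1]);
        // ... transient detection, pitch estimation, etc.
    };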
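And the inverse direction, again with purely hypothetical names, where the IfftNode applies the IFFT and the overlap-add so the graph returns to the time domain:

    // Hypothetical resynthesis: the IfftNode receives (possibly
    // modified) spectral frames, applies the IFFT and overlap-adds
    // them back into a continuous audio stream.
    var ifft = context.createIfftNode();
    ifft.window  = IfftNode.HANN;       // synthesis window
    ifft.hopSize = 512;                 // must match the analysis hop size

    fft.connect(ifft);                  // spectral frames in...
    ifft.connect(context.destination);  // ...audio samples out

    // For example, a trivial spectral modification in between
    // (replacing the analysis callback above): zero every bin
    // above 100, i.e. a brick-wall lowpass.
    fft.onframe = function (frame) {
        for (var k = 100; k < frame.real.length; k++) {
            frame.real[k] = 0;
            frame.imag[k] = 0;
        }
    };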
AudioParam

This one is very tricky. Currently parameters are only floats and can have a minimum and a maximum. This information is mostly useful when automatically creating GUIs for nodes, or for introspection. But finding a set of fields that can completely describe a parameter space is extremely hard. I would say that the parameter should just be a variant value with a description attribute that contains a dictionary with the important facts about the parameter. The description could look somewhat like this (beware of my lack of expertise in JS; there is surely a better way):

gain parameter:

    {'type': 'float', 'min': 0, 'max': 1, 'default': 1,
     'units': 'intensity',
     'description': 'Controls the gain of the signal',
     'name': 'gain'}

windowType parameter:

    {'type': 'enum',
     'choices': [RECTANGULAR, HANN, HAMMING, BLACKMANHARRIS],
     'default': BLACKMANHARRIS,
     'name': 'window',
     'description': 'The window function used before performing the FFT'}

I think this would make it more flexible for future additions to the API.

I also think that the automation shouldn't belong in the AudioParam class, since for some parameters it doesn't make sense to have it. The user can easily perform the automation using JavaScript, and since the rate of parameter change (~100 Hz) is usually much lower than the audio rate (>8000 Hz), there should be no problems with performance.
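For example, with the draft API as it stands, a simple fade-out could be driven from a timer. This assumes a gainNode created with context.createGainNode() as in the current proposal; the timer runs at the control rate, far below the audio rate:

    // Automating a fade-out from JavaScript at ~100 Hz.
    var value = 1.0;
    var timer = setInterval(function () {
        value -= 0.01;                // linear ramp: 1.0 -> 0.0 in ~1 s
        if (value <= 0) {
            value = 0;
            clearInterval(timer);
        }
        gainNode.gain.value = value;  // plain parameter assignment
    }, 10);                           // 10 ms period ~= 100 Hz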
Anyway, these are just my 2 cents. I just had a first look at the API; I might come up with more comments once I get my hands on Chris' implementation and am able to try it out.

ricard

On Wed, Jun 23, 2010 at 2:23 AM, Chris Rogers <crogers@google.com> wrote:
> I have a pretty good idea how to make the optimizations, so we should be
> good there. Conceptually, I think Jer's idea is the simplest and most
> transparent.
>
> On Tue, Jun 22, 2010 at 4:20 PM, Chris Marrin <cmarrin@apple.com> wrote:
>>
>> On Jun 21, 2010, at 4:47 PM, Jer Noble wrote:
>> >
>> > On Jun 21, 2010, at 3:27 PM, Chris Marrin wrote:
>> >
>> >> On Jun 21, 2010, at 2:34 PM, Chris Rogers wrote:
>> >>
>> >>> Hi Chris,
>> >>>
>> >>> I'm not sure we can also get rid of the AudioGainNode and integrate
>> >>> the concept of gain directly into all AudioNodes. This is because with the
>> >>> new model Jer is proposing we're connecting multiple outputs all to the same
>> >>> input, so we still need a way to access the individual gain amounts for each
>> >>> of the separate outputs.
>> >>
>> >> Right, but if every node can control its output gain, then you just
>> >> control it there, right? So if you route 3 AudioSourceNodes into one
>> >> AudioNode (that you're using as a mixer) then you control the gain of each
>> >> channel in the AudioSourceNodes, plus the master gain in the AudioNode. For
>> >> such a common function as gain, it seems like this would simplify things.
>> >> The default gain would be 0 dB, which would short-circuit the gain stage to
>> >> avoid any overhead.
>> >
>> > Actually, I don't agree that modifying the output gain is so common an
>> > operation that it deserves being promoted into AudioNode. Sure, it's going
>> > to be common, but setting a specific gain on every node in a graph doesn't
>> > seem very likely. How many nodes will likely have a gain set on them?
>> > 1/2? 1/4? I'd be willing to bet that a given graph will usually have as
>> > many gain operations as it has sources, and no more.
>> >
>> > I can also imagine a simple scenario where it makes things more
>> > complicated instead of less:
>> >
>> > <PastedGraphic-1.tiff>
>> >
>> > In this scenario, there's no way to change the gain of the Source 1 ->
>> > Reverb connection independently of Source 2 -> Reverb. To do it, you would
>> > have to do the following:
>> >
>> > <PastedGraphic-3.pdf>
>> >
>> > And it seems very strange to have to create a generic AudioNode in order
>> > to modify a gain. Alternatively, you could create multiple
>> > AudioReverbNodes, but again, it seems weird to have to create multiple
>> > reverb nodes just so you can change the gain going to only one of them.
>> >
>> > Right now, every AudioNode subtype has a discrete operation which it
>> > performs on its input and passes to its output. To add gain to every
>> > AudioNode subtype would make things more confusing, not less.
>>
>> OK, fair enough. My concern is that adding a gain stage will require extra
>> buffering and extra passes through the samples. Do you think it will be
>> practical for an implementation to optimize the gain calculation? For
>> instance, I might have some software algorithm doing reverb. Since it's
>> running through each sample, it would be easy for it to do a multiply while
>> it's accessing the sample (either on the input or output side). If the
>> reverb node knows it has a single input and that input is from a gain stage,
>> it could do the gain calculation itself and avoid another pass through the
>> data.
>>
>> As long as optimizations like that are possible, I think having a separate
>> AudioGainNode is reasonable.
>>
>> -----
>> ~Chris
>> cmarrin@apple.com

--
ricard
http://twitter.com/ricardmp
http://www.ricardmarxer.com
http://www.caligraft.com