Re: Web Audio API Proposal from Ricard Marxer Piñón on 2010-07-01 (public-xg-audio@w3.org from July 2010)

From: Ricard Marxer Piñón <ricardmp@gmail.com>
Date: Thu, 1 Jul 2010 20:18:12 +0200
To: Chris Rogers <crogers@google.com>
Cc: Chris Marrin <cmarrin@apple.com>, Jer Noble <jer.noble@apple.com>, public-xg-audio@w3.org
Message-ID: <AANLkTik4rRc7anwOi1FcS2wvJalq5Ux9flUFnzi0x8-A@mail.gmail.com>

Hi Chris and others,

I have been following this discussion for some time and I finally
found a chance to contribute.
First of all I like the idea of a graph based audio system.

I believe it is the natural way of working with audio.
As Chris said before, a minimal version of the standard can always
boil down to an AudioSourceNode, an AudioProcessingNode and an
AudioOutputNode, where the audio processing node simply allows access
to and modification of each block of samples in JavaScript.
It is also possible to easily create a JavaScript library that hides
the complexity of the graph handling with negligible penalty on
performance.
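
To make the minimal version concrete, here is a rough sketch of such a
three-node graph.  The class names SourceNode, ProcessingNode and
OutputNode and the pull() and render() methods are assumptions made up
for illustration; they are not the interfaces from the proposal.

```javascript
// Hypothetical minimal graph; none of these classes come from the proposal.
class SourceNode {
  constructor(samples) {
    this.samples = samples; // Float32Array holding the whole signal
    this.pos = 0;
  }
  // Return the next block, zero-padded once the signal runs out.
  pull(blockSize) {
    const block = new Float32Array(blockSize);
    block.set(this.samples.subarray(this.pos, this.pos + blockSize));
    this.pos += blockSize;
    return block;
  }
}

class ProcessingNode {
  constructor(input, callback) {
    this.input = input;
    this.callback = callback; // user JavaScript, modifies the block in place
  }
  pull(blockSize) {
    const block = this.input.pull(blockSize);
    this.callback(block);
    return block;
  }
}

class OutputNode {
  constructor(input) {
    this.input = input;
  }
  // Pull numBlocks blocks through the graph and concatenate them.
  render(numBlocks, blockSize) {
    const rendered = [];
    for (let i = 0; i < numBlocks; i++) {
      rendered.push(...this.input.pull(blockSize));
    }
    return Float32Array.from(rendered);
  }
}

// Usage: halve every sample of a six-sample signal, in blocks of four.
const source = new SourceNode(Float32Array.from([1, 2, 3, 4, 5, 6]));
const halve = new ProcessingNode(source, (block) => {
  for (let i = 0; i < block.length; i++) block[i] *= 0.5;
});
const result = new OutputNode(halve).render(2, 4);
```

A convenience library could wrap this kind of wiring so that the user
never touches the graph directly.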

I haven't had the chance to try out the implementation of the API
yet, since I'm on a GNU/Linux system.  But I do have some preliminary
comments on the API proposal.

AudioPannerNode + AudioListener:
Maybe I'm wrong, but I think these nodes perform some processes that
are quite tied to data (HRTF) or that may be implemented in many
different ways that could lead to different outputs depending on the
method. Maybe they could be broken up into smaller blocks that have a
much more defined behavior and let the user of the API specify what
data to use or what algorithm to implement.

ConvolverNode
The convolver node has an attribute that is an AudioBuffer.  I think
it should just have a float array with the impulse response or
multiple float arrays if we want to convolve the different channels
differently.  Having an AudioBuffer could make the user believe that
the impulse response would adapt to different sample rates, which
doesn't seem to be the case.
This is quite an important node because it will be used for many
different tasks.  Its behavior should be clearly defined.  Can the
user modify the impulse response on the fly (must the filter keep the
past N samples in memory for this)?  Does the impulse response have a
limit in length?  Should the user set the maximum length of the
impulse response at the beginning?
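
To illustrate the question about keeping past samples in memory, here
is a minimal sketch of a block-based convolver that takes a plain
float array as impulse response.  StreamingConvolver and its process()
method are hypothetical names, and the direct-form loop is only the
simplest possible method, not necessarily what an implementation would
use for long impulse responses.

```javascript
// Hypothetical streaming FIR convolver; not the ConvolverNode from the proposal.
class StreamingConvolver {
  constructor(impulseResponse) {
    this.ir = Float32Array.from(impulseResponse);
    // The last (N - 1) input samples have to survive between blocks.
    this.history = new Float32Array(Math.max(this.ir.length - 1, 0));
  }

  // Convolve one block of input and return a block of the same length.
  process(block) {
    const h = this.history.length;
    // Prepend the remembered samples so that x[n - k] is always available.
    const extended = new Float32Array(h + block.length);
    extended.set(this.history);
    extended.set(block, h);

    const out = new Float32Array(block.length);
    for (let n = 0; n < block.length; n++) {
      let acc = 0;
      for (let k = 0; k < this.ir.length; k++) {
        acc += this.ir[k] * extended[h + n - k];
      }
      out[n] = acc;
    }

    // Remember the tail of the input for the next call.
    this.history = extended.slice(extended.length - h);
    return out;
  }
}

// Usage: a two-tap response applied to an impulse split over two blocks.
const convolver = new StreamingConvolver([1, 0.5]);
const firstBlock = convolver.process(Float32Array.from([0, 1]));
const secondBlock = convolver.process(Float32Array.from([0, 0]));
```

The 0.5 tap of the impulse only shows up in the second block, which is
why the history buffer is needed and why changing the impulse response
length on the fly is not free.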

RealtimeAnalyserNode
From my POV this node should be replaced by an FftNode.  The FFT is not
only used for audio visualization but for many audio
analysis/processing/synthesis methods (transient detection,
coding/compression, transcription, pitch estimation, classification,
effects, etc.).  Therefore I think the user should be able to have
access to a proper FFT, without smoothing, band processing or
magnitude scaling (in dBs or in intensity).  It should also be possible
to access the magnitude and phase or the complex values themselves,
since many methods are based on the complex representation.
Additionally I would propose the possibility to select the window,
frameSize, fftSize and hopSize used when performing the FFT.  I would
also propose an IfftNode that would perform the inverse of this one
and the overlap-and-add process, to close the full loop and be able to
go back to the time domain.  I will get back to this once I have
Chris' WebKit branch running.  The implementation of this addition
should be trivial since most FFT libraries also perform the IFFT.
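
As a rough sketch of the full loop, the code below windows the signal,
transforms each frame, transforms it back and overlap-adds the result.
All function names are hypothetical.  A naive O(N^2) DFT stands in for
a real FFT library, fftSize is assumed equal to frameSize (no zero
padding), and a periodic Hann window with hopSize = frameSize / 2 is
chosen because its overlapped copies sum to one.

```javascript
// Naive complex DFT used as a stand-in for an FFT; inverse divides by N.
function dft(re, im, inverse) {
  const N = re.length;
  const outRe = new Float64Array(N);
  const outIm = new Float64Array(N);
  const sign = inverse ? 1 : -1;
  for (let k = 0; k < N; k++) {
    for (let n = 0; n < N; n++) {
      const angle = (sign * 2 * Math.PI * k * n) / N;
      const c = Math.cos(angle);
      const s = Math.sin(angle);
      outRe[k] += re[n] * c - im[n] * s;
      outIm[k] += re[n] * s + im[n] * c;
    }
    if (inverse) {
      outRe[k] /= N;
      outIm[k] /= N;
    }
  }
  return { re: outRe, im: outIm };
}

// Periodic Hann window: w[n] + w[n + N/2] = 1 for even N.
function hannWindow(N) {
  return Float64Array.from({ length: N }, (_, n) =>
    0.5 - 0.5 * Math.cos((2 * Math.PI * n) / N));
}

// Forward pass: one complex spectrum per windowed frame.
function analyze(signal, window, hopSize) {
  const N = window.length;
  const frames = [];
  for (let start = 0; start + N <= signal.length; start += hopSize) {
    const re = Float64Array.from(window, (w, n) => w * signal[start + n]);
    frames.push(dft(re, new Float64Array(N), false));
  }
  return frames;
}

// Inverse pass: inverse transform each frame and overlap-add it.
function synthesize(frames, hopSize, length) {
  const out = new Float64Array(length);
  frames.forEach((spectrum, i) => {
    const frame = dft(spectrum.re, spectrum.im, true).re;
    frame.forEach((value, n) => {
      out[i * hopSize + n] += value;
    });
  });
  return out;
}

// Magnitude and phase are derived from the complex values when needed.
const magnitude = (spectrum, k) => Math.hypot(spectrum.re[k], spectrum.im[k]);
const phase = (spectrum, k) => Math.atan2(spectrum.im[k], spectrum.re[k]);

// Usage: frameSize 8, hopSize 4 on a 16-sample sine.
const signal = Float64Array.from({ length: 16 }, (_, n) => Math.sin(0.7 * n));
const frames = analyze(signal, hannWindow(8), 4);
const rebuilt = synthesize(frames, 4, signal.length);
```

With these settings the samples covered by two overlapping frames
(indices 4 to 11 here) should come back unchanged up to rounding
error; the first and last half frame are attenuated by the window
because nothing overlaps them.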

AudioParam
This one is a very tricky one.  Currently parameters are only floats
and can have a minimum and maximum.  This information is mostly useful
when automatically creating GUI for nodes or for introspection.  But
finding a set of information that can completely describe a parameter
space is extremely hard.  I would say that the parameter should just
be a variant value with a description attribute that contains a
dictionary with some important stuff about the parameter.  The
description could look somewhat like this (beware of my lack of
expertise in JS, there is surely a better way):
gain parameter:
  {'type': 'float', 'min': 0, 'max': 1, 'default': 1,
   'units': 'intensity',
   'description': 'Controls the gain of the signal',
   'name': 'gain'}
windowType parameter:
  {'type': 'enum',
   'choices': [RECTANGULAR, HANN, HAMMING, BLACKMANHARRIS],
   'default': BLACKMANHARRIS, 'name': 'window',
   'description': 'The window function used before performing the FFT'}
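
A sketch of how such descriptions could be used for introspection
follows.  The WindowType constants and the isValidValue() helper are
assumptions made up for this example; only the two dictionaries mirror
the ones above.

```javascript
// Hypothetical constants standing in for the window types named above.
const WindowType = Object.freeze({
  RECTANGULAR: 0,
  HANN: 1,
  HAMMING: 2,
  BLACKMANHARRIS: 3,
});

const gainDescription = {
  type: 'float', min: 0, max: 1, default: 1,
  units: 'intensity',
  description: 'Controls the gain of the signal',
  name: 'gain',
};

const windowDescription = {
  type: 'enum',
  choices: [WindowType.RECTANGULAR, WindowType.HANN,
            WindowType.HAMMING, WindowType.BLACKMANHARRIS],
  default: WindowType.BLACKMANHARRIS,
  name: 'window',
  description: 'The window function used before performing the FFT',
};

// Hypothetical helper: check a candidate value against a description.
function isValidValue(description, value) {
  switch (description.type) {
    case 'float':
      return typeof value === 'number' &&
        value >= description.min && value <= description.max;
    case 'enum':
      return description.choices.includes(value);
    default:
      return false; // unknown parameter types are rejected
  }
}
```

A GUI builder could switch on the same 'type' field to pick a slider
for the gain and a drop-down list for the window.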

I think this would make it more flexible for future additions to the API.
I also think that the automation shouldn't belong in the AudioParam
class, since for some parameters it doesn't make sense to have it.  The
user can easily perform the automation using JavaScript and since the
rate of parameter change (~100 Hz) is usually much lower than the
audio rate (>~8000 Hz), there should be no problems with performance.
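
As a sketch of what automation in plain JavaScript could look like,
the code below updates a gain value once per block at a 100 Hz control
rate.  The linearRamp() helper, the gainParam object and renderBlock()
are hypothetical, and the 100 Hz and 8000 Hz figures are just the
example rates mentioned above.

```javascript
// Hypothetical helper: returns a function giving the ramp value at time t (seconds).
function linearRamp(startValue, endValue, startTime, endTime) {
  return (t) => {
    if (t <= startTime) return startValue;
    if (t >= endTime) return endValue;
    const x = (t - startTime) / (endTime - startTime);
    return startValue + x * (endValue - startValue);
  };
}

const controlRate = 100; // parameter updates per second
const sampleRate = 8000; // audio samples per second
const blockSize = sampleRate / controlRate; // 80 samples per control tick

// Hypothetical parameter holder: just a value, with no automation built in.
const gainParam = { value: 1 };
const fadeOut = linearRamp(1, 0, 0, 1); // fade from 1 to 0 over one second

// Update the parameter once per block, then apply it to every sample.
function renderBlock(input, blockIndex) {
  gainParam.value = fadeOut(blockIndex / controlRate);
  return input.map((sample) => sample * gainParam.value);
}

// Usage: block 50 starts half a second in, so the gain should be 0.5.
const halfwayBlock = renderBlock(new Float32Array(blockSize).fill(1), 50);
```

The ramp function runs only 100 times per second here, whatever the
audio rate is.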

Anyway these are just my 2 cents.  I have only had a first look at the
API; I might come up with more comments once I get my hands on Chris'
implementation and am able to try it out.

ricard

On Wed, Jun 23, 2010 at 2:23 AM, Chris Rogers <crogers@google.com> wrote:
> I have a pretty good idea how to make the optimizations, so we should be
> good there.  Conceptually, I think Jer's idea is the simplest and most
> transparent.
> On Tue, Jun 22, 2010 at 4:20 PM, Chris Marrin <cmarrin@apple.com> wrote:
>>
>> On Jun 21, 2010, at 4:47 PM, Jer Noble wrote:
>>
>> >
>> > On Jun 21, 2010, at 3:27 PM, Chris Marrin wrote:
>> >
>> >> On Jun 21, 2010, at 2:34 PM, Chris Rogers wrote:
>> >>
>> >>> Hi Chris,
>> >>>
>> >>> I'm not sure we can also get rid of the AudioGainNode and integrate
>> >>> the concept of gain directly into all AudioNodes.  This is because with the
>> >>> new model Jer is proposing we're connecting multiple outputs all to the same
>> >>> input, so we still need a way to access the individual gain amounts for each
>> >>> of the separate outputs.
>> >>
>> >> Right, but if every node can control its output gain, then you just
>> >> control it there, right? So if you route 3 AudioSourceNodes into one
>> >> AudioNode (that you're using as a mixer) then you control the gain of each
>> >> channel in the AudioSourceNodes, plus the master gain in the AudioNode. For
>> >> such a common function as gain, it seems like this would simplify things.
>> >> The default gain would be 0 dB, which would short-circuit the gain stage to
>> >> avoid any overhead.
>> >
>> >
>> > Actually, I don't agree that modifying the output gain is so common an
>> > operation that it deserves being promoted into AudioNode.  Sure, it's going
>> > to be common, but setting a specific gain on every node in a graph doesn't
>> > seem very likely.   How many nodes will likely have a gain set on them?
>> >  1/2?  1/4?  I'd be willing to bet that a given graph will usually have as
>> > many gain operations as it has sources, and no more.
>> >
>> > I can also imagine a simple scenario where it makes things more
>> > complicated instead of less:
>> >
>> > <PastedGraphic-1.tiff>
>> >
>> > In this scenario, there's no way to change the gain of the Source 1 ->
>> > Reverb connection, independently of Source 2-> Reverb.  To do it, you would
>> > have to do the following:
>> >
>> > <PastedGraphic-3.pdf>
>> >
>> > And it seems very strange to have to create a generic AudioNode in order
>> > to modify a gain.  Alternatively, you could create multiple
>> > AudioReverbNodes, but again, it seems weird to have to create multiple
>> > reverb nodes just so you can change the gain going to only one of them.
>> >
>> > Right now, every AudioNode subtype has a discrete operation which it
>> > performs on its input, and passes to its output.  To add in gain to every
>> > AudioNode subtype would make things more confusing, not less.
>>
>> Ok, fair enough. My concern is that adding a gain stage will require extra
>> buffering and extra passes through the samples. Do you think it will be
>> practical for an implementation to optimize the gain calculation? For
>> instance, I might have some software algorithm doing reverb. Since it's
>> running through each sample, it would be easy for it to do a multiply while
>> it's accessing the sample (either on the input or output side). If the
>> reverb node knows it has a single input and that input is from a gain stage,
>> it could do the gain calculation itself and avoid another pass through the
>> data.
>>
>> As long as optimizations like that are possible, I think having a separate
>> AudioGainNode is reasonable.
>>
>> -----
>> ~Chris
>> cmarrin@apple.com
>>



-- 
ricard
http://twitter.com/ricardmp
http://www.ricardmarxer.com
http://www.caligraft.com
Received on Friday, 2 July 2010 09:12:39 UTC