
Re: Standardizing audio "level 1" features first?

From: Chris Rogers <crogers@google.com>
Date: Sat, 11 Feb 2012 11:24:28 -0800
Message-ID: <CA+EzO0nQRZ4z1xTs1nfm9-7J_p0MNMuYQdTuyFi-nMgzKb727g@mail.gmail.com>
To: Michael Schöffler <michael.schoeffler@audiolabs-erlangen.de>
Cc: public-audio@w3.org
Hi Michael, thanks for your comments.  I really appreciate your feedback!

On Fri, Feb 10, 2012 at 4:53 AM, Michael Schöffler <
michael.schoeffler@audiolabs-erlangen.de> wrote:

> Hello everyone,
> my name is Michael Schoeffler and I'm a Ph.D. student at the AudioLabs
> Erlangen. I'm currently working on a framework for signal processing
> plugins. The idea is to have something similar to VST or Audio Units, but
> fully web-based. The chances are not bad that some of my co-workers will
> also use this framework for developing their plugins. So maybe I'll be able
> to provide a lot of feedback to this group in the near future :)
> On topic:
> When I started developing with the Web Audio API, my first thought was
> "Great API, but it seems to be too high level", and my opinion hasn't
> changed yet. For example, the handling of multichannel systems is too
> high-level for my use cases.

I'd be interested in hearing more details about what limitations you're
seeing in the current proposal.  Aside from the panning system and speaker
layouts you mention below, the API offers you direct access to every single
channel with the AudioChannelSplitter and AudioChannelMerger.  Please note
my comment:
"note: this upper limit of 6 is arbitrary and could be increased to support
7.2, and higher"
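
To illustrate (a sketch of my own, not text from the spec): with the splitter and merger, every channel of a source can be tapped and processed individually. The helper name and the per-channel gain stage here are just for illustration; the factory methods (createChannelSplitter(), createChannelMerger(), createGainNode()) follow the current draft.

```javascript
// Route each channel of a source through its own AudioGainNode,
// then reassemble the channels with an AudioChannelMerger.
function buildPerChannelGain(context, source, numChannels) {
  const splitter = context.createChannelSplitter(numChannels);
  const merger = context.createChannelMerger(numChannels);
  const gains = [];
  source.connect(splitter);
  for (let i = 0; i < numChannels; i++) {
    const gain = context.createGainNode(); // AudioGainNode in the draft
    splitter.connect(gain, i);             // take channel i from the splitter
    gain.connect(merger, 0, i);            // feed it into input i of the merger
    gains.push(gain);
  }
  merger.connect(context.destination);
  return gains; // per-channel gain controls
}
```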

> As I understand it, the API focuses on "mainstream" systems like mono,
> stereo, and 5.1, and does automatic up/down-mixing on the connection
> between two AudioNodes.

The up-mixing is very important.  It's not that uncommon for games and
interactive applications to load a mixture of mono and stereo audio assets
which need to be mixed and processed together seamlessly.  The developer
doesn't need to be bothered with the channel details, and so the "right
thing just happens".  If we didn't do this, then the developer wouldn't
have nearly as convenient a system and wouldn't be able to connect
multiple heterogeneous sources to a filter for processing.  Instead, the
developer would have to worry about checking individual sources, twiddling
channels, and creating multiple low-level processing modules for each channel.
 In short, the developer would have to manage many more low-level details
than is currently necessary with the API.  But that doesn't mean that we
sacrifice low-level control.  If the developer wants, multi-channel sources
can be broken down into component channels with individual processing on
each channel, etc.  But we don't *force* developers to work at that level.
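
Here's a sketch of what I mean (the helper name is my own): mono and stereo sources can all feed one processing node, and the up-mixing at each connection is automatic, so no per-source channel handling is needed.

```javascript
// Connect a heterogeneous set of sources (mono, stereo, ...) to a single
// shared filter; each connection is up-mixed automatically.
function mixThroughFilter(context, sources) {
  const filter = context.createBiquadFilter(); // shared processing node
  for (const source of sources) {
    source.connect(filter); // mono sources are up-mixed to match
  }
  filter.connect(context.destination);
  return filter;
}
```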

> In the source code of Google Chrome I found many terms related to these
> three "mainstream" systems. But I think other systems are getting more
> important. A lot of research is done on rendering 22.2 to 5.1, 7.1 to
> stereo, 5.1 to stereo, and so on. So even large multichannel systems could
> become relevant for mobile devices at some point.

That's great!  And I don't see why we can't support them.  Please see my
comments above about being able to break down multi-channel sources into
component channels.  Please note that the default down-mix code would not
be forced on anybody wishing to exert a finer level of control.  For
example, in a custom and specialized 5.1 -> stereo
downmix, AudioChannelSplitter can be used to access each individual channel
and perform arbitrary processing to render stereo.  The AudioGainNode,
AudioPannerNode, BiquadFilterNode, and ConvolverNode can be used to
implement quite a rich set of down-mixing algorithms.  And, if that's not
enough, then a JavaScriptAudioNode could be used.
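
As a concrete sketch of the idea (the helper name is my own, and the ITU-style coefficients are just one illustrative choice, not anything mandated by the API): split the six channels, weight them with AudioGainNodes, and merge down to stereo.

```javascript
// Custom 5.1 -> stereo downmix: L' = L + 0.707*C + 0.707*Ls,
// R' = R + 0.707*C + 0.707*Rs, with the LFE channel simply dropped.
// Channel order assumed: L, R, C, LFE, Ls, Rs.
function downmix51ToStereo(context, source) {
  const splitter = context.createChannelSplitter(6);
  const merger = context.createChannelMerger(2);
  source.connect(splitter);
  // [sourceChannel, gainToLeft, gainToRight]
  const coeffs = [
    [0, 1.0, 0.0],     // L
    [1, 0.0, 1.0],     // R
    [2, 0.707, 0.707], // C, spread to both
    [3, 0.0, 0.0],     // LFE, dropped here
    [4, 0.707, 0.0],   // Ls
    [5, 0.0, 0.707],   // Rs
  ];
  for (const [ch, toL, toR] of coeffs) {
    if (toL > 0) {
      const g = context.createGainNode();
      g.gain.value = toL;
      splitter.connect(g, ch);  // tap channel ch
      g.connect(merger, 0, 0);  // sum into left
    }
    if (toR > 0) {
      const g = context.createGainNode();
      g.gain.value = toR;
      splitter.connect(g, ch);
      g.connect(merger, 0, 1);  // sum into right
    }
  }
  merger.connect(context.destination);
  return merger;
}
```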

> Another example would be spatialization. It is directly integrated in the
> API. There are tons of approaches to implementing convolution: in the
> time domain, in the frequency domain, uniformly, non-uniformly, combined
> with psychoacoustic hints, and so on. Each approach has its advantages and
> disadvantages.

I think you're mixing up concepts a little bit here.  Implementing
convolution with time-domain or frequency-domain algorithms is entirely an
implementation detail, and does not affect the API.  For example, in the
ConvolverNode, an impulse response is given and the node is expected to
perform the convolution, which is a mathematically precise operation.  Yes,
internally it could be using time-domain or frequency-domain algorithms,
but that doesn't change how the API appears to the developer. Convolution
is a *very* widely used technique, proving its usefulness in all kinds of
real-world audio processing applications:
* major motion picture production
* game audio
* music production
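
From the developer's point of view the node really is that simple (a sketch; the helper name is mine, and the impulse-response buffer is assumed to have been decoded elsewhere, e.g. from an XHR-loaded file):

```javascript
// Apply an impulse response to a source with a ConvolverNode.
// How the convolution is computed internally is an implementation detail.
function applyReverb(context, source, impulseResponseBuffer) {
  const convolver = context.createConvolver();
  convolver.buffer = impulseResponseBuffer; // the impulse response to use
  source.connect(convolver);
  convolver.connect(context.destination);
  return convolver;
}
```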

But, convolution is different from the general term "spatialization".  The
AudioPannerNode implements spatialization, and supports more than one
panning model:

partial interface AudioPannerNode {
        // Panning model
        const unsigned short EQUALPOWER = 0;
        const unsigned short HRTF = 1;
        const unsigned short SOUNDFIELD = 2;
        // ...
};
These algorithms are very useful, especially for games and interactive
applications.  But the API is certainly not locked into these models and
could be extended with additional ones.  It would be great to hear from you
if there's a commonly used model that we're missing here.  But even if we
miss some in the beginning, the API is extensible with additional constants.
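
Selecting a model is a one-line choice on the node (a sketch; the helper name and position values are mine):

```javascript
// Spatialize a source with an AudioPannerNode, choosing the HRTF model.
function spatialize(context, source, x, y, z) {
  const panner = context.createPanner();
  panner.panningModel = panner.HRTF; // or EQUALPOWER / SOUNDFIELD
  panner.setPosition(x, y, z);       // place the source in 3D space
  source.connect(panner);
  panner.connect(context.destination);
  return panner;
}
```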

> For myself, I would not use the API function. I would build a library that
> offers all the approaches. Performance may be a problem, but I would rely
> e.g. on the WebCL WG, so that the performance argument doesn't count
> anymore.

Good luck with WebCL!  It *may* one day become a standard, but that doesn't
appear to be the case anytime soon.  I haven't even seen prototypes of
useful high-performance audio systems built with WebCL, and don't believe
it will be a good fit for developing general-purpose, high-quality, and
performant audio processing.

> Nonetheless, the Web Audio API is already very usable for me. So thanks to
> the guys that worked hard on it so far.

Thanks Michael, it was my intention to make it very usable and practical
for real-world applications now!  And I'm hearing good things from music
and game developers who are using it today.


> The idea of a "Level 1" specification sounds very interesting to me.
> Best Regards,
> Michael
> --
> Michael Schoeffler, M.Sc.
> International Audio Laboratories Erlangen (AudioLabs)
> University of Erlangen-Nuremberg & Fraunhofer IIS, Audio & Multimedia
> Am Wolfsmantel 33
> 91058 Erlangen
> Germany
> Tel.: +49 9131 85-20515
> Skype: michael.schoeffler
> michael.schoeffler@audiolabs-erlangen.de
> http://www.audiolabs-erlangen.de/
Received on Saturday, 11 February 2012 19:24:58 GMT
