Re: Reconciling ConvolverNode's output channel dependencies with the mixing rules in the spec from Frederick Umminger on 2013-05-18 (public-audio@w3.org from April to June 2013)

From: Frederick Umminger <frederick.umminger@gmail.com>
Date: Fri, 17 May 2013 22:15:53 -0700
To: Chris Rogers <crogers@google.com>
Cc: Ehsan Akhgari <ehsan.akhgari@gmail.com>, "public-audio@w3.org" <public-audio@w3.org>
Message-ID: <CAPJnUh_Nczt3kTYThgiERtw=ZvE9QPocpHWhJdySg9Tw+uo-oQ@mail.gmail.com>

Hi Chris,

On Fri, May 17, 2013 at 12:01 PM, Chris Rogers <crogers@google.com> wrote:
>
>
> Most applications for convolution are for reverb which is most commonly
> stereo.  That's why this node "comes out of the box" as stereo.  We can
> certainly handle the general cases too.
>

I guess this depends on what area of audio you are working in. For film or
triple-A games I am not confident that stereo is the common case. It
bothers me a great deal that the API is currently very stereo-centric.

> This is a bit limiting and inefficient in the common case of N=2, M=2
> where we wish K=2.  The cases "Normal Stereo" and "True Stereo" are both
> valid.
>
>
I disagree that the case N=M=K=2 is either common or valid. In most real
stereo reverbs there is cross-talk between the 2 channels ("true stereo") -
a typical stereo reverb is not just two mono reverbs in parallel.  In any
case, two mono reverbs in parallel can be handled as two parallel
ConvolverNodes, which is clear and should be easy to do with the API.
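
To make the distinction concrete, here is a minimal numerical sketch in
Python. It is not Web Audio code; the helper names (convolve,
parallel_stereo, true_stereo) and the toy impulse responses are made up for
illustration, and all impulse responses are assumed to have equal length.

```python
def convolve(x, h):
    """Direct-form linear convolution of two lists of samples."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def add(a, b):
    """Sample-wise sum of two equal-length signals."""
    return [p + q for p, q in zip(a, b)]


def parallel_stereo(x_l, x_r, h_l, h_r):
    """Two independent mono convolutions: no path between channels."""
    return convolve(x_l, h_l), convolve(x_r, h_r)


def true_stereo(x_l, x_r, h_ll, h_lr, h_rl, h_rr):
    """Four responses; h_ab is the path from input a to output b."""
    out_l = add(convolve(x_l, h_ll), convolve(x_r, h_rl))
    out_r = add(convolve(x_l, h_lr), convolve(x_r, h_rr))
    return out_l, out_r


# A single click on the left input, silence on the right.
x_l, x_r = [1.0, 0.0], [0.0, 0.0]
h_direct = [1.0, 0.5]   # toy same-side response (hypothetical values)
h_cross = [0.0, 0.25]   # toy cross-talk response (hypothetical values)

par_l, par_r = parallel_stereo(x_l, x_r, h_direct, h_direct)
ts_l, ts_r = true_stereo(x_l, x_r, h_direct, h_cross, h_cross, h_direct)
```

With the parallel arrangement the right output stays silent, whereas the
cross-talk term in the true-stereo arrangement puts energy into it.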

Anyway, as I understand the spec as written, the behavior of the
ConvolverNode is specified and required for N,M = 1,2 and K = 1,2,4, but it
is left open as a possibility that other values may be supported ("In the
general case the source has N input channels, the impulse response has K
channels, and the playback system has M output channels."). However, the
behavior that occurs with other values of N, M and K is not specified. That
is asking for trouble. If other values of N, M and K are allowed, then the
behavior needs to be precisely specified.

There is a pretty large literature on multidimensional signal processing
that always treats the transfer function from N inputs to M outputs as a
matrix with N*M entries (each a function of z). Doing anything else is a
neologism inconsistent with the signal-processing literature. It makes the
API harder to understand because prior experience and the wealth of
pre-existing documentation do not apply.
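
As a rough sketch of that matrix view, the following Python computes each
output channel as the sum, over all inputs, of that input convolved with the
matching entry of the matrix. The function names and the ir[m][n] indexing
(response from input n to output m) are my own assumptions for illustration,
as is the requirement that all inputs share one length and all impulse
responses share another.

```python
def convolve(x, h):
    """Direct-form linear convolution of two lists of samples."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def matrix_convolve(inputs, ir):
    """Apply an M-row, N-column matrix of impulse responses.

    inputs: list of N channels (equal length).
    ir:     ir[m][n] is the response from input n to output m
            (equal length); this layout is an assumption.
    Returns a list of M output channels.
    """
    n_in = len(inputs)
    outputs = []
    for row in ir:
        if len(row) != n_in:
            raise ValueError("each row needs one response per input")
        acc = [0.0] * (len(inputs[0]) + len(row[0]) - 1)
        for x, h in zip(inputs, row):
            for k, v in enumerate(convolve(x, h)):
                acc[k] += v
        outputs.append(acc)
    return outputs


# N = 1 input, M = 3 outputs, so the matrix has 1 * 3 = 3 entries.
inputs = [[1.0, 2.0]]
ir = [
    [[1.0, 0.0]],   # output 0: pass-through
    [[0.5, 0.0]],   # output 1: half gain
    [[0.0, 1.0]],   # output 2: one-sample delay
]
outputs = matrix_convolve(inputs, ir)
```

Under this view the number of impulse responses is always N*M, which is why
a K that is not N*M has no obvious meaning.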

If N,M and K are arbitrary, then for most values with K != N*M there is no
reasonable behavior. It may be convenient to have a special override for
N=M=K=2, which falls outside of a general rule and requires special
documentation, but what should happen when N=5, M=3, K=11?

If N, M and K are not arbitrary, but are restricted to N,M = 1,2 and
K = 1,2,4, or to N = 1,2, M = 2 and K = 1,2,4, then that is a wasted
opportunity to generalize something that is easily generalizable, very
useful and has a large literature. It is absolutely the case that people
doing surround are going to want N = 1 and M = K = 5,6,7,8.

It looks to me as though, in order to implement a fully general N to M
channel convolution, I would need a ChannelSplitterNode to split the N
channel input into N mono channels, N*M mono ConvolverNodes (or N*M
hardcoded stereo ConvolverNodes with their outputs downmixed back to mono),
M summing junctions to mono channels, and then a ChannelMergerNode to
combine them back into an M channel output. It sure would be nice to just
use a single ConvolverNode instead.
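
To give a feel for the size of that workaround graph, here is a small Python
sketch that lists the connections it would need. The node labels
(splitter.out, convolver, sum, merger.in) are hypothetical names for
illustration only, not Web Audio API calls.

```python
def wiring_plan(n_in, n_out):
    """List the (source, destination) connections for an N to M graph."""
    edges = []
    for n in range(n_in):
        for m in range(n_out):
            conv = f"convolver[{n}][{m}]"
            # Each split input channel feeds one mono convolver per output.
            edges.append((f"splitter.out[{n}]", conv))
            # Every convolver for output m feeds the same summing junction.
            edges.append((conv, f"sum[{m}]"))
    for m in range(n_out):
        # Each summed mono channel becomes one input of the merger.
        edges.append((f"sum[{m}]", f"merger.in[{m}]"))
    return edges


edges = wiring_plan(2, 3)
convolvers = {dst for _, dst in edges if dst.startswith("convolver")}
```

For N inputs and M outputs this comes to N*M convolvers and 2*N*M + M
connections, on top of the splitter and the merger.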

Sincerely,
   Frederick

Received on Saturday, 18 May 2013 05:16:19 UTC