Re: channel layouts and up/down mixing

On Fri, Jan 11, 2013 at 6:13 AM, Robert O'Callahan <robert@ocallahan.org> wrote:

> https://www.w3.org/Bugs/Public/show_bug.cgi?id=17379 has been open for
> about six months. It really needs to be fixed, since without a fix, the
> core functionality of mixing input nodes is undefined for all but a few
> combinations of possible input/output channel counts.
>
> The spec, and various comments made on the list, seem to suggest that for
> "nonstandard" channel counts (everything except 1, 2, 4, and 6 presumably?)
> the channel layouts could/should be left undefined/host-specific. But this
> is untenable, since it means applications don't know what the channels mean
> and UAs can't consistently upmix/downmix those channel counts.
>
> If we can't agree on standard layouts for 3, 5 or > 6 channels, I think it
> would be fine to just prohibit them for now. That constraint can be relaxed
> later in a future version of the spec.
>
> Rob
>

Hi Robert, there are cases where the channels are to be considered part
of a speaker system, such as stereo or 5.1.  But there are also many cases
in audio processing where multiple channels are not mapped to a speaker
system "layout".  Here are three such examples:

1. An AudioContext might output 4 channels, where channels 1,2 carry the
"main" stereo output mix and channels 3,4 carry a stereo headphone/cue
mix.  This is very typical in a DJ mixing scenario, where the DJ must be
able to hear a different mix in the headphones than in the main output.
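
Here's a rough sketch of how that routing could look (TypeScript, using
today's constructor and method names; mainMix and cueMix are placeholder
buses, and a 4-channel-capable destination is assumed):

    // Route a stereo main mix to channels 1,2 and a stereo cue mix to
    // channels 3,4 of a 4-channel destination.
    const context = new AudioContext();
    context.destination.channelCount =
        Math.min(4, context.destination.maxChannelCount);

    const mainMix = context.createGain(); // placeholder "main" stereo bus
    const cueMix = context.createGain();  // placeholder headphone/cue bus

    const mainSplit = context.createChannelSplitter(2);
    const cueSplit = context.createChannelSplitter(2);
    const merger = context.createChannelMerger(4);

    mainMix.connect(mainSplit);
    cueMix.connect(cueSplit);
    mainSplit.connect(merger, 0, 0); // main L -> channel 1
    mainSplit.connect(merger, 1, 1); // main R -> channel 2
    cueSplit.connect(merger, 0, 2);  // cue L  -> channel 3
    cueSplit.connect(merger, 1, 3);  // cue R  -> channel 4
    merger.connect(context.destination);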

2. Discrete mono tracks in multi-track recording.  For example, we might
have a stream with channels as follows:
1: bass drum
2: snare drum
3: hihat
4: lead guitar
5: rhythm guitar
6: vocals
7: ambient noises
8: synth

Kevin Ennis's mix.js demo is an example of this:
http://kevincennis.com/mix/
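
A sketch of how such a stream could be assembled with a ChannelMergerNode
(the stems array is a placeholder for the eight mono sources above):

    // Combine eight mono stems into one 8-channel stream, with
    // channel i+1 carrying stem i.
    const context = new AudioContext();
    const merger = context.createChannelMerger(8);

    declare const stems: AudioNode[]; // eight mono sources (placeholder)

    stems.forEach((stem, i) => {
      stem.connect(merger, 0, i); // stem i -> channel i+1
    });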

3. Ad-hoc speaker arrays.  Some sound installations and research centers
(CCRMA, etc.) have non-standard/custom speaker arrays.  For these types of
applications, we would simply like to output N channels, which are
connected in somewhat arbitrary ways to the speaker array.  The web
application itself would be written specifically to output the channels in
a way that makes sense for the particular installation.
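
For instance (a sketch; it assumes the user agent reports the hardware
channel count via destination.maxChannelCount):

    // Open up as many discrete output channels as the hardware offers,
    // then let the application patch one mono source per channel.
    const context = new AudioContext();
    const n = context.destination.maxChannelCount; // e.g. 16 for a custom array

    if (n > 2) {
      context.destination.channelCount = n;
      const merger = context.createChannelMerger(n);
      merger.connect(context.destination);
      // Connect one mono source per merger input; the mapping from
      // channel index to physical speaker is application-defined.
    }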

So we have a variety of situations when dealing with multiple channels,
many of which do not relate to "speaker layouts".  But your question about
how to handle up-mixing and down-mixing is a good one.  I think the
current spec describes the common cases, but for the other cases we have
to do something, and in the absence of any additional specific "speaker
layout" information we should up-mix and down-mix as follows:
* up-mixing N -> M channels (N < M) should simply copy/pass through the N
input channels and zero-fill the remaining (M - N) channels
* down-mixing N -> M channels (N > M) should copy the first M channels of
the input and discard the remaining (N - M) channels
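
In code, the proposed behavior amounts to something like this sketch
(one Float32Array per channel, as in an AudioBuffer):

    // Discrete up/down-mix: pass channels through, zero-fill or drop
    // the remainder depending on whether M > N or M < N.
    function discreteMix(input: Float32Array[], m: number): Float32Array[] {
      const frames = input.length > 0 ? input[0].length : 0;
      const output: Float32Array[] = [];
      for (let ch = 0; ch < m; ch++) {
        // Existing channels pass through; missing ones are zero-filled.
        // Input channels with index >= m are implicitly discarded.
        output.push(ch < input.length ? input[ch].slice()
                                      : new Float32Array(frames));
      }
      return output;
    }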

In the Web Audio API, most of these operations are up-mixing operations.
For example, when multiple connections are made to a single input, all of
the connections are up-mixed to the highest channel count among them.
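
A sketch of that summing behavior, assuming the discrete up-mix rule
above (each connection is represented as one Float32Array per channel):

    // Sum several connections into one input, up-mixing each to the
    // widest channel count by treating missing channels as silence.
    function mixConnections(connections: Float32Array[][]): Float32Array[] {
      const channels = Math.max(...connections.map(c => c.length));
      const frames = connections[0][0].length;
      const out = Array.from({ length: channels },
                             () => new Float32Array(frames));
      for (const conn of connections) {
        for (let ch = 0; ch < conn.length; ch++) {
          for (let i = 0; i < frames; i++) {
            out[ch][i] += conn[ch][i];
          }
        }
      }
      return out;
    }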

There are a lot of details here, and I look forward to working through how
the AudioContext could provide a .channelLayout attribute (if the user
agent detects that the speakers are configured in a specific way)...

Cheers,
Chris

Received on Friday, 11 January 2013 20:25:56 UTC