DelayNode channel count changes

I'd like to raise discussion of the desired behaviour for
DelayNodes when the input channel count changes.
There is a bug on file at [1].

It would be simple for implementations if they didn't have to
worry too much about this situation and could forget the existing
delay buffer and start afresh when the channel count changes.
However, a channel count change far away in the graph may, perhaps
unexpectedly, change the channel count at a delay node, so I think
we may have to make an effort to handle this.  Consider a graph
whose sources are all mono: if a single stereo source is added,
most downstream nodes switch to stereo input.
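
As an illustration (the node names here are mine, but the
behaviour follows from the default channelCountMode of "max"),
adding a single stereo source upstream is enough to flip a
previously mono delay line to stereo:

  // Hypothetical graph: a delay line fed only by mono sources,
  // until a stereo source is connected further upstream.
  const ctx = new AudioContext();
  const mix = ctx.createGain();      // default channelCountMode "max"
  const delay = ctx.createDelay(1);

  const mono = ctx.createOscillator();  // oscillators output mono
  mono.connect(mix);
  mix.connect(delay);
  delay.connect(ctx.destination);
  mono.start();

  // Later: a stereo source joins the mix.  The gain node's computed
  // channel count becomes 2, so the delay node's input (and, per the
  // current wording, its output) switches from mono to stereo.
  const stereo = ctx.createBufferSource();
  stereo.buffer = ctx.createBuffer(2, ctx.sampleRate, ctx.sampleRate);
  stereo.connect(mix);
  stereo.start();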

Is it expected that samples already received by a delay node
continue to be played after the channel count changes?

Assuming this is the expected behaviour, taking the current
wording literally ("The number of channels of the output always
equals the number of channels of the input") could lead to
glitches, as buffered samples would suddenly be down-mixed when
the input channel count changes.  I assume the up-mixing formulas
are such that switching them on doesn't itself produce glitches,
but there may not be much point in up-mixing buffered samples
until they actually need blending with a larger number of
channels.
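
For concreteness, a rough sketch of the usual "speakers"
mono<->stereo formulas (as I understand them; not quoted from the
spec) shows why an abrupt down-mix of already-buffered samples is
audible while an up-mix is not:

  // Up-mixing buffered mono is benign: both output channels simply
  // continue the mono signal, so no discontinuity is introduced.
  function upmixMonoToStereo(m: number): [number, number] {
    return [m, m];
  }

  // Down-mixing buffered stereo replaces each channel the listener
  // was hearing with the average, a step discontinuity whenever
  // left !== right.
  function downmixStereoToMono(left: number, right: number): number {
    return 0.5 * (left + right);
  }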

I think we need to allow the DelayNode to continue to produce a
larger number of channels than its input, for at least some period.

Is it necessary to specify exactly when a DelayNode should change
its number of output channels, or can we leave this to the
implementation?

  Exactly when the change should happen is unclear, because the
  delay value is variable.

  If the y(t) = x(t - d(t)) delay model is used (see [2]), and
  rates of change in delay of < -1 are permitted, then any part of
  the buffer may be output at a future time, and so the output
  channel count shouldn't drop until maxDelayTime has elapsed
  after the input channel count change.
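
  (A rough way to see that bound, in the same notation: with
  0 <= d(t) <= maxDelayTime, the read position t - d(t) is never
  earlier than t - maxDelayTime.  Samples written with the old
  channel count all lie before the change time t0, so they can
  only still be read while t - maxDelayTime < t0, i.e. until
  t0 + maxDelayTime.)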

  If rates of change in delay are limited by the implementation to
  be >= -1, then the output channel count can be changed when the
  read pointer passes the position the write pointer had when the
  channel count changed.  We can't be precise to the exact sample,
  as the one output block that straddles each change may require
  some up-mixing to the maximum channel count of its buffered
  components.
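
  As a sketch of the bookkeeping this implies (names and structure
  are mine, not from the spec; positions are frames since the start
  of the context rather than ring-buffer offsets):

    // Hypothetical per-DelayNode state for the ">= -1" case.
    let outputChannels = 1;   // channels the node currently outputs
    let inputChannels = 1;    // channels the input currently delivers
    let changeWritePos = -1;  // write position at the last change

    function onInputChannelCountChange(newCount: number,
                                       writePos: number): void {
      inputChannels = newCount;
      // An increase takes effect immediately; buffered samples are
      // up-mixed as they are read.  A decrease is deferred.
      outputChannels = Math.max(outputChannels, newCount);
      changeWritePos = writePos;
    }

    function onBlockRendered(readPos: number): void {
      // Once the read pointer has passed the position the write
      // pointer had at the change, everything still ahead of it was
      // written with the current input count, so output can drop.
      if (changeWritePos >= 0 && readPos >= changeWritePos) {
        outputChannels = inputChannels;
        changeWritePos = -1;
      }
    }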

As pointed out in [1], if a delay node keeps only one buffer and
the channel count changes, then there may be too much processing
required to up-mix the entire buffer at once.  A stereo delay
buffer of the maximum three-minute length, for a 48 kHz context,
may be 66 MB in size.
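
The arithmetic behind that figure, assuming 32-bit float samples:

  // 3 minutes of 32-bit float samples at 48 kHz, per channel:
  const bytesPerChannel = 180 * 48000 * 4;  // 34,560,000 bytes
  const stereoBuffer = 2 * bytesPerChannel; // 69,120,000 bytes, ~66 MB
  // The 16 GB and 1 GB figures below follow from the same per-channel
  // cost, with 32 * 31 / 2 = 496 and 32 channels respectively.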

An alternative approach is to keep old buffers after channel count
changes until they are no longer required, and mix them together
for the output.  A downside of this approach is that we could
theoretically end up with as many buffers as the maximum number
of channels, 32 or more.  That is 32 * 31 / 2 = 496 channels in
total across the retained buffers, which is about 16 GB if they
are 3-minute uncompressed buffers.

Another approach is to keep pointers to the positions in the
buffer at which the channel count changed, and add channels only
as required.  Then even a 3-minute, 32-channel uncompressed buffer
would require only 1 GB ;).
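
A sketch of what that last approach might look like (all names
here are hypothetical): the delay line remains one logical buffer,
but is stored as segments, each tagged with the channel count in
effect when its frames were written, and extra channels are only
allocated for a segment once a read actually needs them:

  // Hypothetical segmented delay buffer: one segment per span of
  // frames written with a given channel count.
  interface Segment {
    startFrame: number;          // first frame covered by the segment
    channelCount: number;        // channels it was written with
    channelData: Float32Array[]; // grown lazily if a read needs more
  }

  const segments: Segment[] = [];

  function onChannelCountChange(writeFrame: number, newCount: number) {
    // Start a new segment; older segments keep their smaller channel
    // counts until the read pointer reaches them with a wider output.
    segments.push({ startFrame: writeFrame,
                    channelCount: newCount,
                    channelData: [] });
  }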

[1] https://www.w3.org/Bugs/Public/show_bug.cgi?id=21426
[2] http://lists.w3.org/Archives/Public/public-audio/2013JulSep/0568.html
