- From: Ehsan Akhgari <ehsan.akhgari@gmail.com>
- Date: Wed, 28 Aug 2013 12:15:23 -0400
- To: Karl Tomlinson <karlt+public-audio@karlt.net>
- Cc: "public-audio@w3.org" <public-audio@w3.org>
- Message-ID: <CANTur_7hZRoBWKGLF8hGFcGFR+APq6pSSYzNBe8xHGJrMey=pA@mail.gmail.com>
On Sun, Aug 25, 2013 at 3:57 PM, Karl Tomlinson
<karlt+public-audio@karlt.net> wrote:

> I'd like to raise discussion of the desired behaviour for
> DelayNodes when the input channel count changes.
> There is a bug on file at [1].
>
> It would be simple for implementations if they didn't have to
> worry too much about this situation, and could forget an existing
> delay buffer and start afresh when the channel count changes.
> However, channel count changes distant in the graph may, perhaps
> unexpectedly, change the channel count on a delay node, so I think
> we may have to make an effort to handle this. Consider a graph
> with mono-only sources. If any stereo source is added, then most
> downstream nodes switch to stereo input.
>
> Is it expected that samples already received by a delay node
> continue to be played after the channel count changes?

Yes, I think so.

> Assuming this is the expected behavior, taking the current wording
> literally ("The number of channels of the output always equals the
> number of channels of the input") could lead to glitches as
> buffered streams are suddenly down-mixed because the input channel
> count changes. I assume the up-mixing formulas ensure we don't get
> glitches when they are switched on, but there may not be much
> point in up-mixing buffered samples until they need blending with
> a larger number of channels.

The glitching risk is not immediately obvious to me. Specifically,
why is this only a problem for DelayNode?

> I think we need to allow the DelayNode to continue to produce a
> larger number of channels than its input, for at least some
> period.

That doesn't seem to be possible to implement, since the delay time
may not be a multiple of 128, so the delay buffers may not be aligned
to the block boundaries.

> Is it necessary to specify exactly when a DelayNode should change
> its number of output channels, or can we leave this to the
> implementation?

This needs to be specified, since the behavior is observable from web
content.

> Exactly what this might be is unclear because of the variable
> delay value.
>
> If the y(t) = x(t - d(t)) delay model is used (see [2]), and
> rates of change in delay of < -1 are permitted, then any part of
> the buffer may be output at a future time, and so the output
> channel count shouldn't drop until maxDelayTime has elapsed
> after the input channel count change.
>
> If rates of change in delay are limited by the implementation to
> be >= -1, then the output channel count can be changed when the
> read pointer passes the position the write pointer had when the
> channel count changed. We can't be precise to the particular
> sample, as one output block per change may require some up-mixing
> to the maximum channel count of its buffered components.
>
> As pointed out in [1], if a delay node keeps only one buffer and
> the channel count changes, then there may be too much processing
> required to up-mix the entire buffer at once. A stereo delay
> buffer of the maximum three-minute length, for a 48 kHz context,
> may be 66 MB in size.

As a strawman proposal, how about we handle channel count changes in
discrete mode? That way, the implementation can optimize away almost
all of the up/down-mixing work. One tricky thing to specify would be
what should happen if you go from channel count N to N-1 on one block
and then back to N on the next: should the implementation hold the
Nth delay buffer around and read from it, or should the Nth channel
in the second block be silent?
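For concreteness, here is a minimal sketch of the scenario Karl
describes (the particular nodes are my own choice, not anything the
spec mandates): a DelayNode fed only by a mono source sees its
computed input channel count flip to 2 the moment a stereo source is
connected to the same input.

    // A delay line fed only by a mono source; its input is computed
    // as 1 channel.
    const ctx = new AudioContext();        // typically 48 kHz
    const mono = ctx.createOscillator();   // mono output
    const delay = ctx.createDelay(180);    // maxDelayTime: 3 minutes
    mono.connect(delay);
    delay.connect(ctx.destination);
    mono.start();

    // Later, a stereo source connects to the same input. The channel
    // mixing rules now compute the DelayNode's input as 2 channels,
    // even though the samples already sitting in the delay line were
    // written as mono.
    const stereoBuffer = ctx.createBuffer(2, ctx.sampleRate, ctx.sampleRate);
    const stereo = ctx.createBufferSource();
    stereo.buffer = stereoBuffer;
    stereo.connect(delay);
    stereo.start();

Nothing about the DelayNode itself changed here, which is Karl's
point: the channel count change arrives from elsewhere in the graph.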
> An alternative approach is to keep old buffers after channel count
> changes until they are no longer required, and mix them together
> for the output. A downside of this approach is that we could
> theoretically end up with as many buffers as the maximum number of
> channels, 32 or more. That is 32 * 31 / 2 channels, which is about
> 16 GB if they are 3-minute uncompressed buffers.

This sort of relates to the question I brought up above. My instinct
here would be to drop the buffers as soon as the input channel count
drops.

> Another approach is to keep pointers to the positions in the
> buffer when the channel count changed, and add channels only as
> required. Then a 3-minute 32-channel uncompressed buffer would
> require only 1 GB ;).

Before discussing fancier proposals than my strawman, I'd like to
understand why that simplistic approach would not be enough.

Cheers,

--
Ehsan
<http://ehsanakhgari.org/>
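P.S. For anyone wanting to check the sizes Karl quotes: assuming
float32 (4-byte) samples at 48 kHz over the full 3-minute
maxDelayTime, the numbers work out as below (the helper function is
mine, purely for illustration):

    // bytes = channels * seconds * sampleRate * bytesPerSample
    const delayBufferBytes = (channels) => channels * 180 * 48000 * 4;

    delayBufferBytes(2);            //     69,120,000 B, ~66 MiB (stereo buffer)
    delayBufferBytes(32 * 31 / 2);  // 17,141,760,000 B, ~16 GiB (496 channels)
    delayBufferBytes(32);           //  1,105,920,000 B,  ~1 GiB (one 32-channel buffer)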
Received on Wednesday, 28 August 2013 16:16:34 UTC