Re: DelayNode channel count changes

> On Sun, Aug 25, 2013 at 3:57 PM, Karl Tomlinson wrote:

>> [...], taking the current wording
>> literally "The number of channels of the output always equals the
>> number of channels of the input" could lead to glitches as
>> buffered streams are suddenly down-mixed because the input channel
>> count changes.

Ehsan Akhgari writes:

> The glitching risk is not immediately obvious to me.

Consider a system with a stereo destination.  Initially a delay
node has stereo input which is silent in the left channel but has
a waveform in the right channel.  The delay node will buffer that
stereo signal.  The input then changes to mono.  If the delay node
output were concurrently changed to mono, then the buffered stereo
signal would need to be down-mixed.  The down-mixed output will be
different from the stereo output, so there will be a glitch at the
point where the output changes from stereo to mono.  Assuming
channelInterpretation = "speakers", the left channel of the
destination will abruptly start producing non-silent output,
usually with a step function in the waveform.
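
For concreteness, here is a minimal sketch of the kind of graph I
have in mind.  The timings and buffer contents are only
illustrative, and I'm assuming the delay node's computed input
channel count follows whatever its sources are currently
producing:

  var ctx = new AudioContext();        // stereo destination assumed
  var sr = ctx.sampleRate;
  var delay = ctx.createDelay(1.0);
  delay.delayTime.value = 0.5;
  delay.connect(ctx.destination);

  // 1 s stereo buffer: channel 0 (left) stays silent, channel 1
  // (right) carries a sine wave.
  var stereoBuffer = ctx.createBuffer(2, sr, sr);
  var right = stereoBuffer.getChannelData(1);
  for (var i = 0; i < right.length; i++)
    right[i] = Math.sin(2 * Math.PI * 440 * i / sr);

  var stereoSrc = ctx.createBufferSource();
  stereoSrc.buffer = stereoBuffer;
  stereoSrc.connect(delay);
  stereoSrc.start(0);                  // stereo input for 1 s

  // A mono source takes over at t = 1 s, so the delay node's input
  // drops to one channel while its delay buffer still holds 0.5 s
  // of the stereo tone.  (Only the mono buffer's channel count
  // matters here; its contents are silent.)
  var monoSrc = ctx.createBufferSource();
  monoSrc.buffer = ctx.createBuffer(1, sr, sr);
  monoSrc.connect(delay);
  monoSrc.start(1.0);

If the delay node's output were forced to mono at t = 1 s, the
remaining 0.5 s of buffered stereo would be down-mixed to
0.5 * (left + right) = right / 2, and with
channelInterpretation = "speakers" the destination's left channel
would jump from silence to that value - the step described above.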

> Specifically, why is this only a problem for DelayNode?

This same issue can happen with other nodes that have an output
latency and a variable number of output channels.  Perhaps
BiquadFilterNode is in that category, but the much greater delays
in DelayNode make a glitch more likely there.  A fade-out on a
stereo input before it is switched off will prevent glitches in
BiquadFilterNode, but not in DelayNode with large delays.

>> I think we need to allow the DelayNode to continue to produce a
>> larger number of channels than its input, for at least some period.
>>
>
> That doesn't seem to be possible to implement, since the delay time may not
> be a multiple of 128, so the delay buffers may not be aligned to the block
> boundaries.

It is possible, but, yes, output channel count changes need to be
on block boundaries.
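
For example (the numbers are only illustrative): with
delayTime = 0.01 s at a sample rate of 44100 Hz, the delay is 441
frames, which is not a multiple of 128.  The output block covering
frames [1280, 1408) then reads buffered frames [839, 967), a span
that straddles the input blocks recorded at frames [768, 896) and
[896, 1024).  The channel count for that output block can still be
chosen once, at the block boundary, from whatever was recorded in
that span.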

>> Is it necessary to specifying exactly when a DelayNode should
>> change its number of output channels, or can we leave this to the
>> implementation?
>
> This needs to be specified, since this behavior is observable from web
> content.

OK.  I think we can do this.

> As a strawman proposal, how about we handle the channel count changes in
> discrete mode?  That way, the implementation can optimize away almost all
> of the up/down-mixing work.

I don't know exactly what you have in mind here.  By "discrete
mode" do you mean that we change the channel count on block
boundaries?  If so, then yes.

I'll describe the approach that I think keeps the output channel
count as close to what is expected as possible, while mostly
minimizing the up/down-mixing work:

  "The output channel count for each output block is the maximum
  channel count of its contributing recorded samples.  Samples are
  up-mixed where necessary."

The implementation details don't need to be spec'ed, but an
implementation would record the input stream in its delay buffer
and, alongside it, the channel count that each part of the input
stream had when it was recorded.
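
A minimal sketch of that bookkeeping (the names are mine and
nothing here is meant to prescribe an implementation; real code
would of course store the samples as well):

  // One entry per recorded 128-frame input block; only the
  // channel-count bookkeeping is shown, the samples are omitted.
  var recorded = [];

  function recordInputBlock(channelCount /*, samples */) {
    recorded.push({ channelCount: channelCount });
  }

  // Channel count for one output block: the maximum channel count
  // over the recorded blocks that its (not necessarily
  // block-aligned) read span touches.  Contributing samples with
  // fewer channels are up-mixed to this count.
  function outputChannelCount(firstReadBlock, lastReadBlock) {
    var count = 1;
    for (var i = firstReadBlock; i <= lastReadBlock; i++)
      if (recorded[i] && recorded[i].channelCount > count)
        count = recorded[i].channelCount;
    return count;
  }

With the numbers from the earlier example, the output block
reading frames [839, 967) would take the maximum over recorded
blocks 6 and 7.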

> One tricky thing to specify as well would be what should happen if you go
> from channel count N to N-1 on one block and then back to N on the next?
> Should the implementation hold the Nth delay buffer around or read from it,
> or should the Nth channel on the second block be silent?

The maximum, not minimum, channel count of the contributing
samples needs to be used, to avoid the step functions that
down-mixing part of the buffered signal would produce, as
described above.  Up-mixing should not produce these kinds of
glitches (unless different up-mixing rules were used in different
parts of the graph).

> This sort of relates to the question I brought up above.  My instinct here
> would be to drop the buffers as soon as the input channel count drops down.

I'm not sure we still need to discuss this given the comments
above, but the buffers need to be kept at least until the *output*
channel count drops.  I guess we could fade in a down-mix or
up-mix, with the fade beginning when the input channel count
changes, but I think that would only add complexity.  We'd do more
up/down-mixing and the results would only lose quality.

Note that if an implementation allows the read pointer to move
backwards (rate of change in delay > 1; I got the sign wrong in
the previous post), then it cannot simply drop channels from the
buffer even when the output channel count drops, because the read
pointer may move back to where the buffered samples have a greater
channel count.
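
To make the arithmetic concrete (the function name is mine, not
from any implementation):

  // Position read for the current output frame, where writeFrame
  // is the index of the input frame being recorded at the same
  // time and delaySeconds is the delay at that frame.
  function readPosition(writeFrame, delaySeconds, sampleRate) {
    return writeFrame - delaySeconds * sampleRate;
  }

  // If delayTime grows by more than one sample period per output
  // frame (rate of change > 1), readPosition decreases from one
  // frame to the next, i.e. the read pointer moves backwards into
  // older samples, which may have been recorded with a greater
  // channel count.

For example, at 44100 Hz a linear ramp of delayTime from 0.1 s to
0.2 s over 0.05 s grows the delay by 2 frames per output frame, so
the read position steps backwards by 1 frame per output frame.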

Received on Wednesday, 28 August 2013 23:14:01 UTC