- From: Ehsan Akhgari <ehsan.akhgari@gmail.com>
- Date: Wed, 4 Sep 2013 11:08:11 -0400
- To: Karl Tomlinson <karlt+public-audio@karlt.net>
- Cc: "public-audio@w3.org" <public-audio@w3.org>
- Message-ID: <CANTur_5iOH_9sDpfp2bv3P8_1i8Km6Hrsq5NpTO+ZW2mb=SmZg@mail.gmail.com>
On Wed, Aug 28, 2013 at 7:12 PM, Karl Tomlinson
<karlt+public-audio@karlt.net> wrote:

> On Sun, Aug 25, 2013 at 3:57 PM, Karl Tomlinson wrote:
>
> >> [...], taking the current wording
> >> literally "The number of channels of the output always equals the
> >> number of channels of the input" could lead to glitches as
> >> buffered streams are suddenly down-mixed because the input channel
> >> count changes.
>
> Ehsan Akhgari writes:
>
> > The glitching risk is not immediately obvious to me.
>
> Consider a system with a stereo destination.  Initially a delay node
> has stereo input which is silent in the left channel but has a
> waveform in the right channel.  The delay node will buffer that
> stereo signal.  The input then changes to mono.  If the delay node
> output were concurrently changed to mono, then the output would
> need to be down-mixed.  The down-mixed output will be different
> from the stereo output, so there will be a glitch at the point when
> the output changes from stereo to mono.  Assuming
> channelInterpretation = "speakers", the left channel of the
> destination will abruptly start producing non-silent output,
> usually with a step function in the waveform.

OK, but is that also a problem with my suggestion below?

> > Specifically, why is this only a problem for DelayNode?
>
> This same issue can happen if there are other nodes that have output
> latency and a variable number of output channels.  Perhaps
> BiquadFilterNode is in that category, but the much greater delays
> in DelayNode mean there is more likely to be a glitch there.  A
> fade-out on a stereo input before it is switched off will prevent
> glitches in BiquadFilterNode, but not in DelayNode with large
> delays.

PannerNode is probably affected by the same problem as well, since it
uses delay lines for the HRTF panning.

> >> I think we need to allow the DelayNode to continue to produce a
> >> larger number of channels than its input, for at least some period.
>
> > That doesn't seem to be possible to implement, since the delay time
> > may not be a multiple of 128, so the delay buffers may not be aligned
> > to the block boundaries.
>
> It is possible, but, yes, output channel count changes need to be
> on block boundaries.

Hmm, OK, let's say that you have a delay node with a delay time
equivalent to 129 samples.  When you notice a channel count change on
your input, you produce the next block with the old channel count, and
the next time around you have one sample with the old channel count and
127 samples with the new channel count.  I don't see how you can output
a new block given those!

> >> Is it necessary to specify exactly when a DelayNode should
> >> change its number of output channels, or can we leave this to the
> >> implementation?
>
> > This needs to be specified, since this behavior is observable from
> > web content.
>
> OK.  I think we can do this.
>
> > As a strawman proposal, how about we handle the channel count
> > changes in discrete mode?  That way, the implementation can optimize
> > away almost all of the up/down-mixing work.
>
> I don't know exactly what you have in mind here.  By "discrete
> mode" do you mean that we change the channel count on block
> boundaries?  If so, then yes.

No, I meant the discrete mixing mode as defined by the spec: basically,
dropping extra channels when down-mixing, and filling the extra channels
with silence when up-mixing.
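To make both the discrete mixing rule and the 129-sample example above
concrete, here is a minimal sketch in plain JavaScript of a toy delay
line whose recorded frames are tagged with the channel count the input
had at recording time.  Everything here (the frame shape, and names like
`discreteMix` and `renderBlock`) is hypothetical, not from the spec or
any implementation:

```js
// Toy model: each recorded frame is {samples: number[], channelCount: number},
// stored in a ring buffer.  All names here are hypothetical.

const BLOCK_SIZE = 128;

// Discrete mixing as defined by the spec: drop extra channels when
// down-mixing, fill missing channels with silence when up-mixing.
function discreteMix(frame, targetChannelCount) {
  const out = new Array(targetChannelCount).fill(0);
  const n = Math.min(frame.channelCount, targetChannelCount);
  for (let c = 0; c < n; c++) {
    out[c] = frame.samples[c];
  }
  return out;
}

// Render one 128-frame output block starting at readIndex.  With a delay
// of 129 samples, the second block rendered after a channel count change
// reads 1 frame recorded with the old count and 127 frames recorded with
// the new one, so that block unavoidably straddles the change.
function renderBlock(delayBuffer, readIndex, outputChannelCount) {
  const block = [];
  for (let i = 0; i < BLOCK_SIZE; i++) {
    const frame = delayBuffer[(readIndex + i) % delayBuffer.length];
    block.push(discreteMix(frame, outputChannelCount));
  }
  return block;
}
```

The open question is precisely how `outputChannelCount` should be chosen
for a block that straddles a change.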
> I'll describe the approach that I think makes the output count as
> close to expected as possible, and mostly minimizes up/down-mixing
> work:
>
> "The output channel count for each output block is the maximum
> channel count of its contributing recorded samples.  Samples are
> up-mixed where necessary."
>
> The implementation details don't need to be spec'ed, but it would
> record the input stream in its delay buffer, and with that it would
> record the channel count that each part of the input stream had
> when it was being recorded.

Hmm, I'm not sure that is correct.  It seems like the above prose would
conflate the up-mixing that may happen on the inputs before the
DelayNode "sees" the input with the issue of changing the channel count.

> > One tricky thing to specify as well would be what should happen if
> > you go from channel count N to N-1 on one block and then back to N
> > on the next.  Should the implementation hold the Nth delay buffer
> > around or read from it, or should the Nth channel on the second
> > block be silent?
>
> The maximum, not minimum, channel count of the contributing
> samples needs to be used to avoid step functions from partial
> down-mixing as described above.  Up-mixing should not produce
> these kinds of glitches (unless different up-mixing rules were
> used in different parts of the graph).

What exactly do you mean by the contributing samples?  And again, I
don't understand why we're only talking about up-mixing here; what if
the number of channels on the input decreases?

> > This sort of relates to the question I brought up above.  My
> > instinct here would be to drop the buffers as soon as the input
> > channel count drops down.
>
> I'm not sure we still need to discuss this given the comments above,
> but the buffers need to be kept at least until the *output* channel
> count drops.  I guess we could fade in a down-mix or up-mix
> with the fade beginning when the input channel count changes, but
> I think that would only add complexity.  We'd do more
> up/down-mixing and the results would only lose quality.
>
> Note that if an implementation allows the read pointer to move
> backwards (rate of change in delay > 1; I got the sign wrong in
> the previous post), then it cannot simply drop channels from the
> buffer even when the output channel count drops, because the read
> pointer may move back to where the buffered samples have a greater
> channel count.

Yeah, the read pointer can indeed move "back".  That is one of the
reasons why things get overly complicated: it's not always clear when a
channel count change _should_ be reflected in the output, since the
delay time is not necessarily constant.

--
Ehsan
<http://ehsanakhgari.org/>
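For reference, here is one reading of the "maximum contributing channel
count" proposal quoted above, as a sketch reusing `BLOCK_SIZE` and
`discreteMix` from the earlier sketch (again, all names are hypothetical,
and discrete up-mixing is assumed):

```js
// Sketch of the quoted proposal: an output block's channel count is the
// maximum channelCount among the recorded frames contributing to it, and
// frames recorded with fewer channels are up-mixed to that count.
function renderBlockMaxContributing(delayBuffer, readIndex) {
  // First pass: find the maximum channel count among contributing frames.
  let maxChannelCount = 1;
  for (let i = 0; i < BLOCK_SIZE; i++) {
    const frame = delayBuffer[(readIndex + i) % delayBuffer.length];
    maxChannelCount = Math.max(maxChannelCount, frame.channelCount);
  }
  // Second pass: up-mix every contributing frame to that count.
  const block = [];
  for (let i = 0; i < BLOCK_SIZE; i++) {
    const frame = delayBuffer[(readIndex + i) % delayBuffer.length];
    block.push(discreteMix(frame, maxChannelCount));
  }
  return { channelCount: maxChannelCount, block };
}
```

Note that with a variable delay time the contributing frames need not be
a contiguous run, and a read pointer that moves "back" can revisit frames
with a greater channel count, which is exactly the complication raised at
the end of the message.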
Received on Wednesday, 4 September 2013 15:09:20 UTC