Re: DelayNode channel count changes from Ehsan Akhgari on 2013-08-28 (public-audio@w3.org from July to September 2013)

From: Ehsan Akhgari <ehsan.akhgari@gmail.com>
Date: Wed, 28 Aug 2013 12:15:23 -0400
To: Karl Tomlinson <karlt+public-audio@karlt.net>
Cc: "public-audio@w3.org" <public-audio@w3.org>
Message-ID: <CANTur_7hZRoBWKGLF8hGFcGFR+APq6pSSYzNBe8xHGJrMey=pA@mail.gmail.com>

On Sun, Aug 25, 2013 at 3:57 PM, Karl Tomlinson <karlt+public-audio@karlt.net> wrote:

> I'd like to raise discussion of the desired behaviour for
> DelayNodes when the input channel count changes.
> There is a bug on file at [1].
>
> It would be simple for implementations if they didn't have to
> worry too much about this situation, and could forget an existing
> delay buffer and start afresh when channel count changes.
> However, channel count changes distant in the graph may, perhaps
> unexpectedly, change the channel count on a delay node, so I think
> we may have to make an effort to handle this.  Consider a graph
> with mono-only sources.  If any stereo source is added, then most
> downstream nodes switch to stereo input.
>
> Is it expected that samples already received by a delay node
> continue to be played after the channel count changes?
>

Yes, I think so.


> Assuming this is the expected behavior, taking the current wording
> literally "The number of channels of the output always equals the
> number of channels of the input" could lead to glitches as
> buffered streams are suddenly down-mixed because the input channel
> count changes.  I assume up-mixing formulas ensure we don't get
> glitches when they are switched on, but there may not be much
> point in up-mixing buffered samples until they need blending with a
> larger number of channels.
>

The glitching risk is not immediately obvious to me.  Specifically, why is
this only a problem for DelayNode?


> I think we need to allow the DelayNode to continue to produce a
> larger number of channels than its input, for at least some period.
>

That doesn't seem to be possible to implement, since the delay time may not
be a multiple of 128, so the delay buffers may not be aligned to the block
boundaries.
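
A minimal sketch of one reading of the alignment concern, assuming a constant
delay expressed in sample-frames and 128-frame processing blocks. The function
name source_blocks is hypothetical and only serves the illustration. It shows
that an output block can draw on two different input blocks, which may have
had different channel counts.

```python
BLOCK = 128  # block size in sample-frames, as mentioned in the message


def source_blocks(output_block: int, delay_frames: int) -> list[int]:
    """Return the indices of the input blocks that one output block reads from.

    Assumes a constant delay, so output frame t reads input frame
    t - delay_frames.
    """
    first = output_block * BLOCK - delay_frames  # oldest input frame read
    last = first + BLOCK - 1                     # newest input frame read
    return sorted({first // BLOCK, last // BLOCK})


aligned = source_blocks(10, 256)    # delay is a multiple of 128
unaligned = source_blocks(10, 300)  # delay is not a multiple of 128
```

With a 256-frame delay the output block maps onto exactly one input block;
with a 300-frame delay it straddles two.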


> Is it necessary to specify exactly when a DelayNode should
> change its number of output channels, or can we leave this to the
> implementation?
>

This needs to be specified, since this behavior is observable from web
content.


>   Exactly what this might be is unclear because of the variable
>   delay value.
>
>   If the y(t) = x(t - d(t)) delay model is used (see [2]), and
>   rates of change in delay of < -1 are permitted, then any part of
>   the buffer may be output at a future time, and so the output
>   channel count shouldn't drop until maxDelayTime has elapsed
>   after input channel count change.
>
>   If rates of change in delay are limited by the implementation to
>   be >= -1, then the output channel count can be changed when the
>   read pointer passes the position the write pointer had when the
>   channel count changed.  We can't be precise to the particular
>   sample, as one output block per change may require some
>   up-mixing to the maximum channel count of its buffered
>   components.
>
> As pointed out in [1], if a delay node keeps only one buffer and
> the channel count changes, then there may be too much processing
> required to up-mix the entire buffer at once.  A stereo delay
> buffer, of the maximum three minute length, for a 48 kHz context,
> may be 66 MB in size.
>
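
A rough sketch of the read-pointer rule quoted above, under the simplifying
assumption of a constant delay (so the rate of change of the delay is 0, which
satisfies the >= -1 condition). The helper name and the 128-frame block size
are assumptions for illustration only. It computes the first output block in
which every frame read was written after the channel count change.

```python
BLOCK = 128  # assumed block size in sample-frames


def first_block_after_change(change_frame: int, delay_frames: int) -> int:
    """First output block that reads only input written after the change.

    With a constant delay, output frame t reads input frame t - delay_frames,
    so the read pointer reaches the change position at
    change_frame + delay_frames. Round that up to a block boundary.
    """
    first_clean_frame = change_frame + delay_frames
    return -(-first_clean_frame // BLOCK)  # ceiling division


# Channel count changes at the start of block 10 (frame 1280).
switch_block_aligned = first_block_after_change(1280, 256)
switch_block_unaligned = first_block_after_change(1280, 300)
```

In the unaligned case, block 12 still contains frames from before the change,
which is the one block per change that may need up-mixing.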

As a strawman proposal, how about we handle the channel count changes in
discrete mode?  That way, the implementation can optimize away almost all
of the up/down-mixing work.
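
To illustrate why discrete mode makes the mixing work nearly free, here is a
small sketch of discrete up- and down-mixing as the Web Audio API describes
it: existing channels are kept in order, missing channels are filled with
silence, and extra channels are dropped. The function name mix_discrete and
the list-of-floats frame representation are assumptions for this example.

```python
def mix_discrete(frame: list[float], channels: int) -> list[float]:
    """Discrete mixing of one sample-frame to the requested channel count.

    Up-mixing pads with silent channels; down-mixing drops the extra ones.
    No existing sample value is ever modified.
    """
    return (frame + [0.0] * channels)[:channels]


stereo_frame = [0.25, -0.5]
up = mix_discrete(stereo_frame, 4)    # two original channels plus two silent
down = mix_discrete(stereo_frame, 1)  # only the first channel survives
```

Because already-buffered samples are never rewritten, a delay buffer would
only need channels added or dropped, not remixed.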

Another tricky thing to specify is what should happen if the channel count
goes from N to N-1 on one block and then back to N on the next.  Should the
implementation keep the Nth delay buffer around and read from it, or should
the Nth channel on the second block be silent?


> An alternative approach is to keep old buffers after channel count
> changes until they are no longer required, and mix them together
> for the output.  A downside of this approach is that we could
> theoretically end up with as many buffers as the maximum number
> of channels, 32 or more.  That is 32 * 31 / 2 channels, which is
> about 16 GB if they are 3 minute uncompressed buffers.
>

This sort of relates to the question I brought up above.  My instinct here
would be to drop the buffers as soon as the input channel count drops down.


> Another approach is to keep pointers to the positions in the
> buffer when the channel count changed, and add channels only as
> required.  Then a 3 minute 32 channel uncompressed buffer would
> require only 1 GB ;).
>
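
The buffer sizes quoted in this thread can be checked with a short
calculation. It assumes 32-bit float samples (4 bytes each) and binary units
(MiB, GiB); neither is stated in the thread, but under those assumptions the
results match the quoted 66 MB, 16 GB and 1 GB figures.

```python
SAMPLE_RATE = 48_000   # Hz, from the thread
MAX_DELAY_S = 180      # three minutes, from the thread
BYTES_PER_SAMPLE = 4   # assumption: 32-bit float samples

MIB = 2**20
GIB = 2**30


def buffer_bytes(channels: int) -> int:
    """Size of a maximum-length delay buffer with the given channel count."""
    return channels * SAMPLE_RATE * MAX_DELAY_S * BYTES_PER_SAMPLE


stereo_mib = buffer_bytes(2) / MIB                   # single stereo buffer
all_buffers_gib = buffer_bytes(32 * 31 // 2) / GIB   # 32 * 31 / 2 channels
single_32ch_gib = buffer_bytes(32) / GIB             # one 32-channel buffer
```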

Before discussing fancier proposals than my strawman, I'd like to
understand why that simplistic approach would not be enough.

Cheers,
--
Ehsan
<http://ehsanakhgari.org/>
Received on Wednesday, 28 August 2013 16:16:34 UTC