Re: How to play back synthesized 22kHz audio in a glitch-free manner?

On Jun 17, 2013, at 6:53 PM, Kevin Gadd <kevin.gadd@gmail.com> wrote:

> Joe, I agree that as specified it's possible for start() to be sample accurate. Is it actually, though?
> 

It seems to be sample accurate, at least on Chrome and Safari. Firefox may still have some scheduling bugs, and I have not tested a nightly build in a while, but I am quite sure there is a shared understanding on this point.
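
To make that concrete, here is a minimal sketch (mine, untested; the helper name startAtSample is made up) of scheduling a buffer at an exact sample index by converting to start()'s seconds-based time base:

    // Schedule `buffer` to begin at an absolute sample index, assuming
    // buffer.sampleRate === context.sampleRate (no resampling involved).
    function startAtSample(context: AudioContext,
                           buffer: AudioBuffer,
                           sampleIndex: number): AudioBufferSourceNode {
      const source = context.createBufferSource();
      source.buffer = buffer;
      source.connect(context.destination);
      // start() takes seconds; this division stays exact to well below
      // one sample period for any realistic session length.
      source.start(sampleIndex / context.sampleRate);
      return source;
    }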

The glitches I heard in your test seemed small enough on Chrome that they were probably just due to resampling errors at buffer boundaries, not bad scheduling.

Here's another way to think about it: what if a resampled buffer could be started with *sub*-sample accuracy (which is analogous to what happens during looping)? That could also solve your problem of seaming these sounds together.
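
For the non-resampled case, the seam bookkeeping can in principle be kept exact today by tracking positions in whole samples and dividing only at the last moment. A sketch (mine, untested; sub-sample offsets for resampled buffers are the part the current API cannot express), assuming every buffer shares the context's sample rate:

    // Queue equal-rate buffers back to back, tracking the seam position
    // in whole samples so no fractional error accumulates across seams.
    function playSeamlessly(context: AudioContext, buffers: AudioBuffer[]): void {
      // Start ~100 ms in the future to give the scheduler some headroom.
      let seamSample = Math.ceil((context.currentTime + 0.1) * context.sampleRate);
      for (const buffer of buffers) {
        const source = context.createBufferSource();
        source.buffer = buffer;
        source.connect(context.destination);
        source.start(seamSample / context.sampleRate);
        seamSample += buffer.length; // AudioBuffer.length is in sample frames
      }
    }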

> That is: if arguments like when are specified as non-integral seconds, and JS's only number type is a double, can we actually be sure that sample-accurate scheduling will continue to work as an app runs? Has it been exhaustively tested?
> 

While I haven't tested this exhaustively, I would expect it to keep working, and I don't see how the precision would drop low enough over time to cause a problem: there would still be plenty of mantissa precision left even after very long game play.
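
To put a rough number on that (my own back-of-the-envelope arithmetic, not anything from the spec): after 24 hours a context time is around 86400 seconds, where the spacing between adjacent doubles is 2^-36, roughly 1.5e-11 s, while one sample period at 44.1 kHz is about 2.3e-5 s. That leaves about six orders of magnitude of headroom:

    // Spacing (ULP) of an IEEE double near a given magnitude, compared
    // with one sample period. Figures assume a 44.1 kHz context.
    function ulp(x: number): number {
      const exponent = Math.floor(Math.log2(Math.abs(x)));
      return Math.pow(2, exponent - 52); // 52 fraction bits in a double
    }
    const dayInSeconds = 24 * 60 * 60;                 // 86400
    const samplePeriod = 1 / 44100;                    // ~2.27e-5 s
    console.log(ulp(dayInSeconds));                    // ~1.46e-11 s
    console.log(samplePeriod / ulp(dayInSeconds));     // ~1.6e6x headroom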

Actually, as co-editor of the use case document, I am very interested in understanding why arbitrary concatenation of buffers is important. When would a game use this technique? Is this for stitching together prerecorded backgrounds?

Although it's very clear this is some kind of spec gap, it's much easier to assess priorities and potential remedies if we understand why the feature is important and exactly how it might be used.

...joe

> Floating point time representations have historically been a problem in games code, not just due to insufficient precision but because the precision can vary over time. For example, see http://randomascii.wordpress.com/2012/02/13/dont-store-that-in-a-float/ - in that particular case, the evidence suggests that 'float' is insufficient but 'double' is usually sufficient - however, there are still observable artifacts caused by variable precision. I would worry that these artifacts could easily creep in depending on exactly how the compiler generates the floating point arithmetic for Web Audio's native implementation, and on what the JavaScript JIT happens to do with its floating point arithmetic. And what happens if the JIT decides that certain values are integers instead of doubles, and we lose tiny fractional amounts of precision here and there?
> 
> Were we storing actual integral sample offsets in a double, we know it has ~53 bits of integer precision, which (by my layman's estimate) is more than enough to represent a long-running application's sample offsets. The question is what happens when we instead represent those offsets as non-integral seconds in doubles, and whether the arithmetic breaks down (or worse, produces inconsistent results) over time.
> 
> This kind of circles back to one of the complaints I voiced previously: There are places in this API where things are specified in 'seconds', and it is unclear how this interacts with mechanisms like looping and sample rate adjustment. 'samples' is arguably a much clearer unit in such cases because it is an objective measure of the individual samples in an input or output buffer instead of a measurement of subjective time relative to some arbitrary measurement point. Of course, then users have to think in samples... that's not always the greatest thing.
> 
> I ask all this because multiple developers have told me about their difficulties doing sample-accurate scheduling with Web Audio, and when I look at the API, I am not certain I could do it either. If it's possible to make sample-accurate scheduling simpler and easier, it might be worthwhile to do something here, even if backwards compatibility stops us from (for example) changing the units used by arguments like when.
> 
> -kg
> 
> 
> On Mon, Jun 17, 2013 at 3:32 PM, Joe Berkovitz <joe@noteflight.com> wrote:
>> start() is already defined as sample accurate. I think the main issue here is the stitching together of resampled buffers.
>> 
>> I'd like to point out that looping of resampled buffers with variable sample rates is glitch free, and it seems reasonable that general concatenation should work at least as well as looping.
>> 
>> Note that breaking resampling out into its own node creates difficulties regarding the way time units and sample rates operate upstream of the resampler node. This problem of mixing sample rates within one audio context has come up before on the list and was, I think, dismissed; I don't have a cite handy for that discussion.
>> 
>> .            .       .    .  . ...Joe
>> 
>> Joe Berkovitz
>> President
>> Noteflight LLC
>> +1 978 314 6271
>> www.noteflight.com
>> "Your music, everywhere."
>> 
>> On Jun 17, 2013, at 6:15 PM, Kevin Gadd <kevin.gadd@gmail.com> wrote:
>> 
>>> Could one simply define a ResamplerNode/PlaybackRateAdjustmentNode? Then, in cases where you want to stitch together smaller buffers and adjust the playback rate of all of them, you give them all the resampler node as a shared destination.
>>> 
>>> This would allow removing the .playbackRate attribute of AudioBufferSourceNode entirely, and it would probably be more generally useful anyway - for example, for resampling ScriptProcessorNode outputs, adjusting the playback rate of audio from an <audio> element, etc. I'd argue that such a change would have a good symmetry with the removal of .gain and provide real benefits for developers.
>>> 
>>> Separate from this, though, we still ultimately need a way to schedule buffers in a sample-precise manner - whether that's changing the definition of start() etc. to enable sample-precise start times, or adding a startImmediatelyAfter method. But splitting playback rate adjustment out would at least let people realistically use ScriptProcessorNode in these scenarios, which would be great!
>>> 
>>> -kg
>>> 
>>> 
>>> On Mon, Jun 17, 2013 at 2:36 PM, Robert O'Callahan <robert@ocallahan.org> wrote:
>>>> On Tue, Jun 18, 2013 at 7:25 AM, Jukka Jylänki <jujjyl@gmail.com> wrote:
>>>>> If the Web Audio API had explicit support for buffer queueing/stitching with AudioBufferSourceNodes, and the user could express that contract to the Web Audio implementation via a 'startImmediatelyAfter' function, then the implementation could resample the stream as a whole, rather than each discontinuous source node individually.
>>>> 
>>>> Only if they have the same set of destinations. I suppose that could be done but it's not trivial. Then again, it would solve use cases for which ScriptProcessorNode is not a very good fit.
>>>> 
>>>> Rob
> 

Received on Tuesday, 18 June 2013 13:55:34 UTC