Re: Integer PCM sample formats to Web Audio API? from Marcus Geelnard on 2014-01-15 (public-audio@w3.org from January to March 2014)

From: Marcus Geelnard <mage@opera.com>
Date: Wed, 15 Jan 2014 10:58:13 +0100
To: kg@luminance.org
CC: Paul Adenot <padenot@mozilla.com>, Chris Wilson <cwilso@google.com>, Jukka Jylänki <jujjyl@gmail.com>, "public-audio@w3.org" <public-audio@w3.org>
Message-ID: <52D65BB5.3080803@opera.com>

Hi again,

2014-01-14 22:28, K. Gadd wrote:
> On Fri, Jan 10, 2014 at 12:53 AM, Marcus Geelnard <mage@opera.com> wrote:
>
>
>     This is a slightly different issue, namely "What's the lowest
>     quality I can accept for this asset?". I can see a value in giving
>     the Web dev the possibility to indicate that a given asset can use
>     a low quality internal representation (the exact syntax for this
>     has to be worked out, of course). The situation is somewhat
>     similar to how 3D APIs allow developers to use compressed textures
>     when a slight quality degradation can be accepted. For audio, I
>     think that sounds such as noisy or muddy sound effects could
>     definitely use a lower quality internal representation in many
>     situations. The same could go for emulators that mainly use
>     8/16-bit low-sample-rate sounds.
>
>
> 22khz isn't 'low quality internal representation'; if the signal is 
> actually 22khz I don't know why you'd want to store it at higher 
> resolution. Lots of actual signals are at frequencies other than 48khz 
> for reasons like reproducing the sound of particular hardware or going 
> for a certain effect. (Also, isn't the mixer sampling rate for web 
> audio unspecified - i.e. it could be 48khz OR 44khz? given this, it 
> makes sense to let users provide buffers at their actual sampling rate 
> and be sure they will be stored that way.) The idea of handing Web 
> Audio a 22khz buffer, the implementation upsampling it to 48khz, and 
> then sampling it back down to 22khz for 22khz playback is... unfortunate.
>

Regarding sample rates, the spec currently says "The decoding thread 
will take the result, representing the decoded linear PCM audio data, 
and resample it to the sample-rate of the AudioContext if it is 
different from the sample-rate of audioData. The final result (after 
possibly sample-rate converting) will be stored in an AudioBuffer.".

I'm not sure, but I believe that this is a performance & quality 
optimization aimed mainly at sound-effects and samples that are supposed 
to be played back within a single octave or so. The performance win for 
sound-effects that are played back at their original speeds is that you 
can skip interpolation and just do a straight linear memory copy/read. 
The quality win is that you can do better filtering in the resampling 
step than you would typically want to do in the playback stage.
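
A minimal sketch of the performance argument above, in Python for
brevity. The function names and the choice of linear interpolation are
assumptions made for illustration; the spec does not prescribe either,
and a real mixer would work on typed buffers rather than lists.

```python
def play_native(buffer, n):
    """Playback at the stored rate: a straight copy, no interpolation."""
    return list(buffer[:n])


def play_resampled(buffer, step, n):
    """Playback at another rate, using linear interpolation (an assumed,
    deliberately simple choice of interpolator).

    step is the number of source frames advanced per output sample,
    e.g. 22050 / 48000 when a 22.05 kHz buffer feeds a 48 kHz mixer.
    """
    if not buffer:
        return [0.0] * n
    out = []
    pos = 0.0
    last = len(buffer) - 1
    for _ in range(n):
        i = int(pos)
        if i >= last:
            # Past the end of the data: hold the final sample.
            out.append(float(buffer[last]))
        else:
            frac = pos - i
            # Weighted average of the two neighbouring source frames.
            out.append(buffer[i] * (1.0 - frac) + buffer[i + 1] * frac)
        pos += step
    return out
```

With step equal to 1.0 the second function produces the same samples as
the first, but pays for a multiply-add per sample to get there, which
is the cost that pre-resampling at decode time avoids.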

I'm personally skeptical of this part of the spec, especially as it will 
generally increase memory usage. I would much rather see it as an 
implementation detail, and/or as a quality hint in the API. (Also, the 
specification does not seem to say anything about how the resampling is 
to be implemented - what algorithm should be used and what quality can 
be expected?).
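
To put a rough number on the memory increase, here is a
back-of-the-envelope calculation. All figures are assumptions chosen
for illustration: a 10 second mono clip, a 22.05 kHz 16-bit source, a
48 kHz AudioContext, and decoded data held as 32-bit floats.

```python
def buffer_bytes(duration_s, sample_rate_hz, bytes_per_sample, channels=1):
    """Size in bytes of an uncompressed PCM buffer."""
    frames = round(duration_s * sample_rate_hz)
    return frames * channels * bytes_per_sample


# Assumed source asset: 10 s, mono, 22.05 kHz, 16-bit integer samples.
source = buffer_bytes(10, 22050, 2)

# Assumed stored form: resampled to 48 kHz, 32-bit float samples.
stored = buffer_bytes(10, 48000, 4)

# How many times larger the stored buffer is than the source data.
ratio = stored / source
```

Under these assumptions the stored buffer is a bit more than four times
the size of the original data, with roughly half of that factor coming
from the sample format and the rest from the sample rate.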


>
>     Having such an option in the API gives the implementation an
>     opportunity to save memory when memory is scarce, but it's not
>     necessarily forced to do so.
>
>
> The whole point is to force the implementation to save memory. An 
> application that runs out of memory 80% of the time is not appreciably 
> better than one that does so 100% of the time - end users will 
> consider both unusable.
>

And here's where we think slightly differently. It should be up to the 
implementation to make the trade-off, but the API should be designed in 
a way that does not prevent the implementation from making these 
trade-offs.

There's a huge difference in the HW capabilities between a mid-range 
phone and a reasonably equipped desktop computer - we're talking more 
than an order of magnitude. Sometimes memory is the bottleneck. 
Sometimes the CPU is the bottleneck. Sometimes a more compact sample 
representation will yield higher performance (even if it adds 
conversion overhead). Sometimes a pre-converted buffer (though larger) 
will yield higher performance.
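
As a sketch of what "conversion overhead" means for a compact
representation, the helper below turns 16-bit integer samples into
floats at read time. The name and the divide-by-32768 scaling are
assumptions for illustration, not something the spec defines.

```python
def int16_to_float(samples):
    """Convert 16-bit integer PCM samples to floats in [-1.0, 1.0).

    Dividing by 32768 is one common convention (assumed here); it maps
    -32768 to exactly -1.0 and 32767 to just under 1.0.
    """
    return [s / 32768.0 for s in samples]
```

Keeping the 16-bit data and converting on every read halves the memory
of a float buffer but adds this per-sample work to playback, while
converting once up front does the opposite. Which one wins depends on
the device, which is the point being made above.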

As Chris has already pointed out, it's quite risky to give too much 
explicit low-level control to the Web developer. Not because they are 
idiots, but because it's a very difficult and diverse problem to solve, 
and, to make things worse, implementations and platforms change over 
time - what's optimal in a certain browser one month might be less 
optimal the next month (a format conversion or interpolation routine 
might have been optimized for a specific CPU architecture, or the 
internal mixing pipeline or garbage collection mechanism might have 
been redesigned, etc.).

/Marcus


-- 
Marcus Geelnard
Technical Lead, Mobile Infrastructure
Opera Software
Received on Wednesday, 15 January 2014 09:58:51 UTC