Re: Integer PCM sample formats to Web Audio API? from K. Gadd on 2014-01-18 (public-audio@w3.org from January to March 2014)

From: K. Gadd <kg@luminance.org>
Date: Fri, 17 Jan 2014 20:24:23 -0800
To: Chris Wilson <cwilso@google.com>
Cc: "public-audio@w3.org" <public-audio@w3.org>
Message-ID: <CAPJwq3XTgEwT2p+CebMm9tZgNUkKZwKNXK51hUjichU8tu9bRQ@mail.gmail.com>
I don't understand how 'the web platform' is a meaningful response when I
ask about platform constraints and how they're being optimized for. 'The
web platform' is not a platform you can optimize for; it is myriad browsers
atop myriad operating systems on myriad devices. They don't share any
particular constraints or preferred formats or available memory or
bandwidth constraints. It is utterly ridiculous to claim that locking
sample buffers to float32 is 'optimizing for the web platform'. Scalable
across all devices is, in this case, effectively as good as saying we have
no optimization target (unless you mean 'scalable' in the sense 'this
application's out-of-memory dialog box will run on a variety of
platforms'). I'm thinking of things in the vein of 'web audio
games/applications should perform adequately on a Nexus 4 as well as on a
circa-2013 MacBook Air', which is a very precise and easily determined
thing - i.e. has this API been designed to be efficient on those two
architectures, does it cope with the resource constraints of each, etc.
These goals could be made even more precise by saying 'we want *these test
cases* to perform acceptably *on these platforms*', but I am not going to
ask for something so complex.

P.S. JS numbers are 64-bit floats, not 32-bit floats. I don't understand
the claim that float32 is used because of JavaScript runtimes. Are
float32->float64 widening conversions free on all the platforms you care
about? (IIRC they are basically free on x86, but that's just because its
internal float precision is much higher, and I'm not certain those
conversions are ACTUALLY free there in any case, just effectively free.)
For reference, many ARM chips do not have hardware floating-point support,
or it is provided by a co-processor, or it comes at increased instruction
latencies. Are we 'optimizing' for those low-end chips as well by using
floats everywhere?
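The widening itself is easy to observe from script. A minimal TypeScript sketch using plain typed-array semantics (nothing Web Audio-specific): storing a JS number into a Float32Array rounds it to float32, and every read widens the stored value back to a float64 number. Whether that widening costs anything at runtime depends on the engine and the CPU; this shows only the semantics, not the cost.

```typescript
// Store a double-precision JS number into a float32 sample buffer.
const samples = new Float32Array(1);
samples[0] = 0.1;

// Reading the element back widens the stored float32 value to a float64
// JS number. The stored value is the float32 rounding of 0.1, not 0.1.
const widened: number = samples[0];

// Math.fround rounds a float64 to the nearest float32, i.e. what the store did.
console.log(widened === 0.1);              // false
console.log(widened === Math.fround(0.1)); // true
console.log(widened);                      // 0.10000000149011612
```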

It's impossible for anyone - myself included - to actually provide the
measurements and data you want when you won't give us details on what you
want us to measure. Is it okay to assume the presence of SIMD? What about
out-of-order execution? What kind of memory bandwidth can we assume? If memory
bandwidth is effectively unconstrained, the cost of larger samples is
minimized, while on bandwidth-constrained devices (i.e. most mobile
devices) doubling the size of all your buffers can be significant. What
performance requirements must an int16 buffer implementation meet to
actually pass muster? As long as the requirements are unstated, the effort
needed to implement int16 buffers just to benchmark them may well be
wasted if in the end someone moves the goalposts and says 'yes,
very good, but it must be faster than that' or 'it must optimize for this
one other case too'.
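One illustration of what an agreed-upon measurement might look like: a hedged TypeScript micro-benchmark that converts int16 storage to float32 one 128-frame render quantum at a time, roughly the work an implementation storing int16 internally would do when feeding a float32 pipeline. The function names, the 1/32768 scaling and the random test signal are assumptions for illustration only; a meaningful comparison would still need the target devices and pass/fail thresholds asked for above.

```typescript
// Web Audio processes audio in blocks of 128 frames ("render quanta").
const RENDER_QUANTUM = 128;

// Convert int16 PCM to float32 in [-1, 1). Dividing by 32768 is one common
// convention (assumed here); some code divides by 32767 instead.
function int16ToFloat32(src: Int16Array, dst: Float32Array, offset: number): void {
  for (let i = 0; i < dst.length; i++) {
    dst[i] = src[offset + i] / 32768;
  }
}

// Hypothetical harness: average nanoseconds per sample when converting the
// whole source buffer one render quantum at a time, `passes` times over.
function benchmarkConversion(src: Int16Array, passes: number): number {
  const block = new Float32Array(RENDER_QUANTUM);
  const blocks = Math.floor(src.length / RENDER_QUANTUM);
  if (blocks === 0 || passes <= 0) return 0;
  const start = performance.now();
  for (let p = 0; p < passes; p++) {
    for (let b = 0; b < blocks; b++) {
      int16ToFloat32(src, block, b * RENDER_QUANTUM);
    }
  }
  const elapsedMs = performance.now() - start;
  return (elapsedMs * 1e6) / (passes * blocks * RENDER_QUANTUM);
}

// Example: 10 seconds of mono 44.1 kHz noise in the full int16 range.
const noise = new Int16Array(441000);
for (let i = 0; i < noise.length; i++) {
  noise[i] = Math.floor(Math.random() * 65536) - 32768;
}
console.log(`${benchmarkConversion(noise, 5).toFixed(2)} ns/sample`);
```

Numbers from a loop like this are only indicative: a serious benchmark would add warm-up passes, guard against the JIT discarding unused results, and run on the actual devices in question.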

The sample scenarios you cite in the document are great for understanding
the *intent* of the API, but do nothing to tell us why float32 was chosen
as a standard for everything and why it is optimal vs int16 buffers, or as
one random example, why JS's native float64 type was not used for samples.
What I'm talking about is hardware constraints and platform constraints,
since when you're talking about performance optimization, understanding the
hardware/platform is essential if you want to actually deliver good
performance. Maybe I have been misunderstanding you this whole time and you
have been saying that float32 was chosen based on applications, not based
on hardware/platform - but in that case it still makes little sense to me,
because I don't see how that sample format choice improves on any of those
applications either. If anything, just like the real-world apps that are
impaired by float32 buffers' memory demands, some of the hypothetical
applications in the web audio document CLEARLY call for smaller,
memory-efficient buffers.

Maybe I'm just missing something obvious here; from my perspective there
are shipped applications/games out there using web audio that end up having
to dedicate a lot of their memory to float32 buffers. int16 buffers would
greatly relieve that memory pressure, which has obvious, trivially provable
benefits (OOM -> not OOM). I have never argued that conversion would have
no impact - I have in fact made it clear that I simply believe a
performance hit (when the user opts in to int16 buffers) is better than the
application never working at all. I don't see how an opt-in performance hit
that enables use cases is a bad thing, though perhaps your fears that
developers will misuse it justify the concern.
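The memory side of this is simple arithmetic. A sketch, using an arbitrary example clip (3 minutes, stereo, 44.1 kHz) rather than figures from any particular shipped game:

```typescript
// Bytes needed to hold decoded PCM for one clip.
function pcmBytes(seconds: number, sampleRate: number, channels: number, bytesPerSample: number): number {
  return Math.round(seconds * sampleRate) * channels * bytesPerSample;
}

// Example: a 3-minute stereo music track at 44.1 kHz.
const asFloat32 = pcmBytes(180, 44100, 2, Float32Array.BYTES_PER_ELEMENT); // 63,504,000 bytes
const asInt16 = pcmBytes(180, 44100, 2, Int16Array.BYTES_PER_ELEMENT);     // 31,752,000 bytes

const MiB = 1024 * 1024;
console.log(`float32: ${(asFloat32 / MiB).toFixed(1)} MiB`); // float32: 60.6 MiB
console.log(`int16:   ${(asInt16 / MiB).toFixed(1)} MiB`);   // int16:   30.3 MiB
```

Halving storage is the whole 'OOM -> not OOM' margin for an app sitting near a memory limit; the arithmetic says nothing about the conversion cost paid at playback time.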


Anyway, re your excellent points on things like getChannelData:

Arguably, if getChannelData was specced to always yield a float32 array, you
have no choice but to keep that behavior. It's kind of awful, but I'm sure
apps out there already depend on it and will be confused if some other part
of the pipeline (e.g. a third-party audio library) opting into efficient
buffers changes the format of the values they get out of Web Audio. You
would probably have to introduce a new entry point that exposes the
*actual* samples contained within the buffer. This also implies that
getChannelData has to provide a read-write float32 'proxy' for the actual
samples, which is... not great. Maybe it's better to kill getChannelData
entirely on non-float buffers, so that people know they need to use the
alternate entry point if they want to manipulate samples. (Maybe a better
approach is to introduce a new optional argument, defaulting to false, that
says 'I know I'm not getting a float32 array back from this call'. Yuck,
though.) This is much simpler in the rendering world because D3D and OpenGL
both expose APIs for buffer access that don't imply that the format you
pass to/from the driver is the format on the GPU (noted exception: D3D used
to encourage mapping a texture's bytes into your address space r/w; this is
not encouraged anymore because it defeats all sorts of interesting
optimizations.)
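A minimal sketch of that split, assuming a hypothetical Int16BackedBuffer class (not part of Web Audio, and all method names are illustrative): the getChannelData-style accessor returns a converted float32 copy, a separate accessor exposes the actual int16 samples, and writes through the float path are converted back explicitly rather than through a live read-write proxy.

```typescript
// Hypothetical int16-backed buffer; names are illustrative, not a real API.
class Int16BackedBuffer {
  private readonly channels: Int16Array[] = [];

  constructor(numberOfChannels: number, length: number) {
    for (let c = 0; c < numberOfChannels; c++) {
      this.channels.push(new Int16Array(length));
    }
  }

  // Float view for code that expects getChannelData-style float32 samples.
  // Returns a converted copy, so mutating it does not touch the storage.
  getFloatData(channel: number): Float32Array {
    const src = this.channels[channel];
    const out = new Float32Array(src.length);
    for (let i = 0; i < src.length; i++) out[i] = src[i] / 32768;
    return out;
  }

  // Explicit write-back from float32, clamping to the int16 range.
  setFloatData(channel: number, data: Float32Array): void {
    const dst = this.channels[channel];
    const n = Math.min(dst.length, data.length);
    for (let i = 0; i < n; i++) {
      const s = Math.round(data[i] * 32768);
      dst[i] = Math.max(-32768, Math.min(32767, s));
    }
  }

  // The separate entry point: direct access to the actual stored samples.
  getRawData(channel: number): Int16Array {
    return this.channels[channel];
  }
}

const buf = new Int16BackedBuffer(1, 4);
buf.setFloatData(0, new Float32Array([0.5, -1, 1.5, 0]));
console.log(Array.from(buf.getRawData(0)));   // 16384, -32768, 32767, 0 (1.5 is clamped)
console.log(Array.from(buf.getFloatData(0))); // 0.5, -1, 0.999969482421875, 0
```

Returning a copy keeps old float32 code working at the price of an allocation and a conversion per call, which is the trade-off being weighed here.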

IMO the formats you should support for internal buffers are the formats you
support for buffer input already - if it can be decoded/uploaded to a
buffer right now using Web Audio, it should be possible to store that in an
AudioBuffer, within reason, regardless of the bitness and sample rate. I
can see how this would get hairy if you currently support oddball
bitnesses, though, since that makes it impossible to represent them as a
typed array.
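For example, 8-bit (unsigned, in WAV), 16-bit and 32-bit PCM map directly onto Uint8Array, Int16Array and Int32Array/Float32Array, but there is no 24-bit typed array, so packed 24-bit samples would have to be widened. A minimal sketch, assuming little-endian signed 24-bit data as stored in WAV files:

```typescript
// Unpack little-endian, signed 24-bit PCM (3 bytes per sample) into an
// Int32Array, since JS has no 24-bit typed array.
function unpackInt24LE(bytes: Uint8Array): Int32Array {
  const count = Math.floor(bytes.length / 3);
  const out = new Int32Array(count);
  for (let i = 0; i < count; i++) {
    const b = i * 3;
    // Assemble the 24-bit value, then shift left and arithmetic-shift right
    // by 8 so that bit 23 is sign-extended into the top byte.
    const raw = bytes[b] | (bytes[b + 1] << 8) | (bytes[b + 2] << 16);
    out[i] = (raw << 8) >> 8;
  }
  return out;
}

const packed = new Uint8Array([0xff, 0xff, 0x7f, 0x00, 0x00, 0x80, 0xff, 0xff, 0xff]);
console.log(Array.from(unpackInt24LE(packed))); // 8388607, -8388608, -1
```

Widened this way, each 24-bit sample occupies the same 4 bytes as a float32 sample, so such depths save no memory unless they stay packed.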

I agree that egregious sample rate conversion poses more of a problem for
memory use than sample bitness. I was not aware that web audio
implementations did offline resampling at load time and am quite dismayed
to discover that (this surprises me, since in my testing
AudioBufferSourceNode handles runtime sample-rate adjustment just fine).
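To put rough numbers on that comparison: a sketch computing how much a decoded clip grows when a 22.05 kHz int16 source is resampled at load time to a 48 kHz context and stored as float32. Both rates are example values only; the actual context rate depends on the device, and the `expansion` helper is illustrative.

```typescript
// Growth factors when decoding a source clip into a context's format.
function expansion(srcRate: number, srcBytes: number, ctxRate: number, ctxBytes: number) {
  const rate = ctxRate / srcRate;      // growth from resampling at load time
  const bitness = ctxBytes / srcBytes; // growth from widening the sample format
  return { rate, bitness, total: rate * bitness };
}

const growth = expansion(22050, 2, 48000, 4);
console.log(growth.rate.toFixed(2));    // 2.18
console.log(growth.bitness.toFixed(2)); // 2.00
console.log(growth.total.toFixed(2));   // 4.35
```

In this example the resampling step alone costs slightly more than the int16-to-float32 widening.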


On Fri, Jan 17, 2014 at 8:56 AM, Chris Wilson <cwilso@google.com> wrote:

> On Thu, Jan 16, 2014 at 3:41 PM, K. Gadd <kg@luminance.org> wrote:
>
>> All I have to say in response to this is that in my entire history on the
>> w3-audio list, you and others continue to say that things are justified
>> without ever giving specifics.
>>
>
>
>>  Even here, you say you are designing for a real platform, but fail to
>> name the platform, list its constraints, or give examples of how your
>> design is tuned for those constraints.
>>
>
> Web platform.  Javascript runtime (with the attendant experience around
> data types).  Scalable across desktop workstations (with plentiful RAM and
> CPU) and constrained devices like smartphones.  I'm sorry, I thought the
> platform was implied; the WG's use cases and requirements document (
> http://www.w3.org/TR/webaudio-usecases/) goes into a lot more detail of
> the particular audio scenarios that are most interesting.
>
>> As far as the question about whether things have been benchmarked, we have
>> real world test cases that are memory constrained, which is why Jukka asked
>> for int16 samples in the first place. We have disclosed the scenarios we
>> care about and they are available in the open; you have not disclosed your
>> scenarios, so we are unable to evaluate them.
>>
>
> And I didn't dismiss your scenarios; and I said the costs should be
> evaluated and weighed (e.g. the cost of converting formats on demand). It's
> clear (to me) that changing that strategy would likely cause SOME effects
> on the runtime; we should understand what those costs are before just
> checking in a change.
>
>> Many of the design decisions being questioned on the list are decisions
>> you or others made in the past.
>>
>
> I would say very few if any of the design decisions being questioned on
> the list are decisions I personally made in the past.  (Perhaps you are
> confusing me with Chris Rogers, who certainly did make many of those design
> decisions.)
>
>> I'm certain that you made them in good faith based on the information you
>> had at the time; I would hope that nobody is questioning that. The problem
>> is that very often we do not have the information you used to guide your
>> decision, nor do we have the context that led you to optimize for one thing
>> at the expense of another. It is critical for you to clearly communicate
>> this information if you want us to understand why things like the
>> audiobuffer sample format are so important to you and why you seem
>> unwilling to evaluate other options.
>>
>
> I wasn't aware that the audiobuffer sample format IS so important to me.
>  I care about the complexity of the API that's exposed; I *would* certainly
> care about the quality (not an issue with int/float conversion) and CPU and
> memory impact of any change (again, need data here); I don't have a
> particular axe to grind other than that, and I feel like I've repeatedly
> said that.
>
>> Also, if you simply think it is too much trouble to rearchitect mixer
>> internals or overhaul the spec, that's fine too: be up-front about it
>> instead of beating around the bush.
>>
>
> As indeed I would.  In case my opposition is not apparent - I believe it
> would be a very, very bad idea to rearchitect the audio system to be a
> fixed-point PIPELINE.  That has many implications in usage patterns that I
> think break far too many of the tenets of the Web Audio scenarios to be
> rational.  I have no particular qualms about the spec needing overhaul - in
> fact, if you'll look at my personal past history here, I've repeatedly said
> that I think parts of the spec need serious work.  (E.g. the
> DynamicsProcessor issue I just raised.)
>
> To the issue at hand: I am not now, nor do I feel I have ever been,
> fundamentally opposed to enabling or even requiring implementations to use
> native bit depths and integer formats for internal buffer storage; I *DO*
> want to feel like I have a better handle on the impact of doing that before
> changing the spec, particularly to require such a thing.  I'm *somewhat*
> opposed to revealing those data types outside the internals (e.g. having
> getChannelData return int8 and int16 types, or whatever shape that would
> take) because I don't see the significant benefit of that, and I feel it
> complexifies the API in a non-Web-y way; I would not characterize this as
> "my mind is made up," but I have stronger opposition to that.
>
>
>> You have clearly voiced opposition to this idea in the past (for example
>> in the race condition discussions),
>>
>
> To overhauling the spec?  No.  No I have not.  I have stated opinion, and
> my opinion is frequently cautious, but I've tried to find the middle ground
> and data-based decision making paths, and have in fact frequently found
> myself in the role of Devil's advocate.  I do NOT believe that I have
> steadfastly opposed change in any discussion.
>
> My opinion on the race condition issue remains the same; making these
> changes will cause negative performance implications in some scenarios.
>  Those implications aren't likely to be huge, and are not in the most
> common scenarios - clever design suggestions from ROC and Jer, among
> others, helped make the solution much less negatively impactful - and
> regardless of those implications, the web platform requires better thread
> protection than the previous design, so it is worth the cost.  I'd go so
> far as to say I'm personally responsible for moving Google to agreeing with
> those changes.
>
>
>> and I can respect the desire to avoid throwing out too much existing
>> work, especially when it means the new software will have to go through a
>> new process of QA and review by developers. But if that's the reason why
>> new features and bug fixes are being dismissed, it should be clear instead
>> of masked behind vague concerns.
>>
>
> As indeed it would be.
>
>
>> OK, if Float32 everything is critical for performance on certain
>> platforms, what platforms are they? Why are they important? (Are they the
>> future of PCs? Are they widely used? Are they key to the business
>> objectives of a stakeholder like Apple or Google?) What are the other
>> constraints of that platform that we should also be optimizing for?
>>
>
> You keep using words like "critical for performance," that I did not use.
>  I said we should understand the impact.  Do you believe there is literally
> zero impact from storing buffers and needing to convert each sample when it
> is pushed into the pipeline?  Or do you just believe we should handwave,
> presume it's less than the positive impact of halving memory usage, and
> change?  Do you think we should be enabling int8 as well?  Int24? Do you
> think we need to expose these types in the API, or simply require that the
> implementation isn't exploding the sizes of objects under the hood?  How,
> precisely, and how flexible does this system need to be (as in, precisely
> what
>
> Incidentally, please don't use shorthand like "Float32 everything",
> because it implies changing the pipeline as well as storing buffers in
> something other than float32, and I can't tell if that's what you're
> referring to or not. Float32 was chosen for the pipeline because its
> dynamic range is substantial enough to encompass the range of human hearing
> (math redacted), while still maintaining a floating-point exponent that
> makes clipping far, far less of a problem (and means we can use the same
> pipeline for control signals that are not typically in the -1 to +1 range,
> without fear of clipping).
>
> -C
>
>
Received on Saturday, 18 January 2014 04:25:34 UTC