Re: Integer PCM sample formats to Web Audio API? from K. Gadd on 2014-01-18 (public-audio@w3.org from January to March 2014)

From: K. Gadd <kg@luminance.org>
Date: Fri, 17 Jan 2014 20:41:48 -0800
To: Chris Wilson <cwilso@google.com>
Cc: "public-audio@w3.org" <public-audio@w3.org>
Message-ID: <CAPJwq3VeNcDU8_Z5GyPkRkVH69R9vPUU=cpZoe3yHF6Gyuqk3A@mail.gmail.com>

Sorry, an odd question has just occurred to me re: getChannelData, and it
is unclear from the spec:

If buffers are resampled offline at load time, what does this mean for the
data observed by getChannelData? Is it the resampled data? Is the data
un-resampled somehow to reproduce the original data that was placed into
the buffer, and is it resampled again when you write to the buffer? It
seems as if getChannelData has to expose the actual, resampled buffer. In
this case, how is that resampling exposed? Will all AudioBuffer instances
have a sampleRate property with a value matching the sampleRate of the
AudioContext? If so, why is that property even there?
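
A minimal sketch of the arithmetic behind this question, assuming an implementation resamples decoded audio to the context rate at load time. The helper name `resampledLength` and the round-up choice are assumptions made for illustration; the spec does not define either.

```javascript
// Hypothetical helper: frame count after resampling a buffer from
// sourceRate to contextRate. Rounding up is an assumption; a real
// implementation may round differently or add filter padding.
function resampledLength(frames, sourceRate, contextRate) {
  return Math.ceil(frames * contextRate / sourceRate);
}

// One second of 22050 Hz audio loaded into a 44100 Hz context.
const originalFrames = 22050;
const storedFrames = resampledLength(originalFrames, 22050, 44100);

console.log(storedFrames);     // 44100 frames instead of 22050
console.log(storedFrames * 4); // 176400 bytes per channel as float32
```

Under that assumption, the array returned by getChannelData would hold 44100 samples rather than the 22050 that were originally supplied, which is why it matters whether and how the resampling is exposed.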

P.S. the current spec version says this:

sampleRate

The sample rate (in sample-frames per second) at which the AudioContext
handles audio. It is assumed that all AudioNodes in the context run at this
rate. In making this assumption, sample-rate converters or "varispeed"
processors are not supported in real-time processing.

This seems to contradict AudioBufferSourceNode's support for sample-rate
conversion, unless something else is meant by that statement.

P.P.S. For the sake of information, I did some basic research and according
to some instruction latency tables published by ARM, some semi-modern ARM
cores based on the ARMv7-R architecture have significant latencies for both
float32->float64 and float64->float32 conversions; 3 cycles of instruction
latency and a result latency of 5-7 cycles. Conversions to/from integers
from float32 have considerably lower latencies in those published tables (1
cycle, typically). I couldn't find tables for the ARM chips you'd find in a
typical mobile phone, but perhaps you can - the instruction in question is
'vcvt'. I'm not an expert on ARM, so maybe I misread the available
information. Regardless, a point for consideration: Conversions between the
native JS float64 format and the float32 used by sample buffers could
actually be *more expensive* than those involved in writing into an int16
buffer in some scenarios! Hopefully, this is not something that affects
actual users.
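
For illustration, the two conversion paths being compared can be sketched as follows. This shows only what each store does to a JS number, not how fast it is. The helpers `floatToInt16` and `int16ToFloat` and the 32767 scale factor are assumptions; other scaling conventions exist.

```javascript
// JS numbers are float64. Storing one into a Float32Array implicitly
// rounds it to float32 (the narrowing conversion discussed above).
const f32 = new Float32Array(1);
f32[0] = 0.1;
console.log(f32[0] === 0.1);              // false: value was rounded
console.log(f32[0] === Math.fround(0.1)); // true: same float32 rounding

// Hypothetical int16 path: scale to the integer range, round, clamp.
function floatToInt16(x) {
  const scaled = Math.round(x * 32767);
  return Math.max(-32768, Math.min(32767, scaled));
}

// Hypothetical reverse path: int16 sample back to a JS number.
function int16ToFloat(s) {
  return s / 32767;
}

const i16 = new Int16Array(1);
i16[0] = floatToInt16(1.0);
console.log(i16[0]);               // 32767
console.log(int16ToFloat(i16[0])); // 1
```

Which path is cheaper on a given chip would have to be measured; the sketch only makes explicit that both paths involve a conversion per sample.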


On Fri, Jan 17, 2014 at 8:24 PM, K. Gadd <kg@luminance.org> wrote:

> I don't understand how 'the web platform' is a meaningful response when I
> ask about platform constraints and how they're being optimized for. 'the
> web platform' is not a platform you can optimize for; it is myriad browsers
> atop myriad operating systems on myriad devices. They don't share any
> particular constraints or preferred formats or available memory or
> bandwidth constraints. It is utterly ridiculous to claim that locking
> sample buffers to float32 is 'optimizing for the web platform'. 'Scalable
> across all devices' is, in this case, effectively as good as saying we have
> no optimization target (unless you mean 'scalable' in the sense 'this
> application's out-of-memory dialog box will run on a variety of
> platforms'). I'm thinking of things in the vein of 'web audio
> games/applications should perform adequately on a Nexus 4 as well as on a
> circa-2013 MacBook Air', which is a very precise and easily determined
> thing - i.e. has this API been designed to be efficient on those two
> architectures, does it cope with the resource constraints of each, etc.
> These goals could be expanded more precisely by saying 'we want *these test
> cases* to perform acceptably *on these platforms*', but I am not going to
> ask for something so complex.
>
> P.S. JS numbers are 64-bit floats, not 32-bit floats. I don't understand
> the claim that float32 is used because of javascript runtimes. Are
> float32->float64 widening conversions free on all the platforms you care
> about? (IIRC they are basically free on x86, but that's just because its
> internal float precision is much higher, and I'm not certain those
> conversions are ACTUALLY free there in any case, just effectively free...)
> For reference, many ARM chips do not have hardware floating-point support,
> or it is provided by a co-processor, or it comes at increased instruction
> latencies. Are we 'optimizing' for those low-end chips as well by using
> floats everywhere?
>
> It's impossible for anyone - myself included - to actually provide the
> measurements and data you want when you won't give us details on what you
> want us to measure. Is it okay to assume the presence of SIMD? What about
> out-of-order? What kind of memory bandwidth can we assume? If memory
> bandwidth is effectively unconstrained, the cost of larger samples is
> minimized, while on bandwidth-constrained devices (i.e. most mobile
> devices) doubling the size of all your buffers can be significant. What
> performance requirements must be met for an int16 buffer implementation to
> actually pass muster? As long as the requirements are unstated, the effort
> needed to implement and benchmark int16 buffers might seem rather wasted
> if in the end someone will move the goalposts and say 'yes,
> very good, but it must be faster than that' or 'it must optimize for this
> one other case too'.
>
> The sample scenarios you cite in the document are great for understanding
> the *intent* of the API, but do nothing to tell us why float32 was chosen
> as a standard for everything and why it is optimal vs int16 buffers, or as
> one random example, why JS's native float64 type was not used for samples.
> What I'm talking about is hardware constraints and platform constraints,
> since when you're talking about performance optimization, understanding the
> hardware/platform is essential if you want to actually deliver good
> performance. Maybe I have been misunderstanding you this whole time and you
> have been saying that float32 was chosen based on applications, not based
> on hardware/platform - but in that case it still makes little sense to me,
> because I don't see how that sample format choice improves on any of those
> applications either. If anything, just like the real-world apps that are
> impaired by float32 buffers' memory demands, some of the hypothetical
> applications in the web audio document CLEARLY call for smaller,
> memory-efficient buffers.
>
> Maybe I'm just missing something obvious here; from my perspective there
> are shipped applications/games out there using web audio that end up having
> to dedicate a lot of their memory to float32 buffers. int16 buffers would
> greatly relieve that memory pressure, which has obvious, trivially provable
> benefits (OOM -> not OOM). I have never argued that conversion would have
> no impact - I have in fact made it clear that I simply believe a
> performance hit (when the user opts in to int16 buffers) is better than the
> application never working at all. I don't see how an opt-in performance hit
> that enables use cases is a bad thing, though perhaps your fears that
> developers will misuse it justify the concern.
>
>
> Anyway, re your excellent points on things like getChannelData:
>
> Arguably if getChannelData was specced to always yield a float32 array,
> you have no choice but to keep that behavior. It's kind of awful, but I'm
> sure apps out there already depend on it and will be confused if some other
> part of the pipeline (i.e. a third-party audio library) opting into
> efficient buffers changes the format of the values they get out of Web
> Audio. You would probably have to introduce a new entry point that exposes
> the *actual* samples contained within the buffer. This also implies that
> getChannelData has to provide a read-write float32 'proxy' for the actual
> samples, which is... not great. Maybe it's better to kill getChannelData
> entirely on non-float buffers, so that people know they need to use the
> alternate entry point if they want to manipulate samples. (Maybe a better
> approach is to introduce a new optional argument, defaulting to false, that
> says 'I know I'm not getting a float32 array back from this call'. Yuck,
> though.) This is much simpler in the rendering world because D3D and OpenGL
> both expose APIs for buffer access that don't imply that the format you
> pass to/from the driver is the format on the GPU (noted exception: D3D used
> to encourage mapping a texture's bytes into your address space r/w; this is
> not encouraged anymore because it defeats all sorts of interesting
> optimizations.)
>
> IMO the formats you should support for internal buffers are the formats
> you support for buffer input already - if it can be decoded/uploaded to a
> buffer right now using Web Audio, it should be possible to store that in an
> AudioBuffer, within reason, regardless of the bitness and sample rate. I
> can see how this would get hairy if you currently support oddball
> bitnesses, though, since that makes it impossible to represent them as a
> typed array.
>
> I agree that egregious sample rate conversion poses more of a problem for
> memory use than sample bitness. I was not aware that web audio
> implementations did offline resampling at load time and am quite dismayed
> to discover that (it surprises me since AudioBufferSourceNode handles
> runtime sample rate adjustment quite fine, from my testing.)
>
>
> On Fri, Jan 17, 2014 at 8:56 AM, Chris Wilson <cwilso@google.com> wrote:
>
>> On Thu, Jan 16, 2014 at 3:41 PM, K. Gadd <kg@luminance.org> wrote:
>>
>>> All I have to say in response to this is that in my entire history on
>>> the w3-audio list, you and others continue to say that things are justified
>>> without ever giving specifics.
>>>
>>
>>
>>> Even here, you say you are designing for a real platform, but fail to
>>> name the platform, list its constraints, or give examples of how your
>>> design is tuned for those constraints.
>>>
>>
>> Web platform.  Javascript runtime (with the attendant experience around
>> data types).  Scalable across desktop workstations (with plentiful RAM and
>> CPU) and constrained devices like smartphones.  I'm sorry, I thought the
>> platform was implied; the WG's use cases and requirements document (
>> http://www.w3.org/TR/webaudio-usecases/) goes into a lot more detail of
>> the particular audio scenarios that are most interesting.
>>
>>> As far as the question about whether things have been benchmarked, we
>>> have real world test cases that are memory constrained, which is why Jukka
>>> asked for int16 samples in the first place. We have disclosed the scenarios
>>> we care about and they are available in the open; you have not disclosed
>>> your scenarios, so we are unable to evaluate them.
>>>
>>
>> And I didn't dismiss your scenarios; and I said the costs should be
>> evaluated and weighed (e.g. the cost of converting formats on demand). It's
>> clear (to me) that changing that strategy would likely cause SOME effects
>> on the runtime; we should understand what those costs are before just
>> checking in a change.
>>
>>> Many of the design decisions being questioned on the list are decisions
>>> you or others made in the past.
>>>
>>
>> I would say very few if any of the design decisions being questioned on
>> the list are decisions I personally made in the past.  (Perhaps you are
>> confusing me with Chris Rogers, who certainly did make many of those design
>> decisions.)
>>
>>> I'm certain that you made them in good faith based on the information you
>>> had at the time; I would hope that nobody is questioning that. The problem
>>> is that very often we do not have the information you used to guide your
>>> decision, nor do we have the context that led you to optimize for one thing
>>> at the expense of another. It is critical for you to clearly communicate
>>> this information if you want us to understand why things like the
>>> audiobuffer sample format are so important to you and why you seem
>>> unwilling to evaluate other options.
>>>
>>
>> I wasn't aware that the audiobuffer sample format IS so important to me.
>>  I care about the complexity of the API that's exposed; I *would* certainly
>> care about the quality (not an issue with int/float conversion) and CPU and
>> memory impact of any change (again, need data here); I don't have a
>> particular axe to grind other than that, and I feel like I've repeatedly
>> said that.
>>
>>> Also, if you simply think it is too much trouble to rearchitect mixer
>>> internals or overhaul the spec, that's fine too: be up-front about it
>>> instead of beating around the bush.
>>>
>>
>> As indeed I would.  In case my opposition is not apparent - I believe it
>> would be a very, very bad idea to rearchitect the audio system to be a
>> fixed-point PIPELINE.  That has many implications in usage patterns that I
>> think break far too many of the tenets of the Web Audio scenarios to be
>> rational.  I have no particular qualms about the spec needing overhaul - in
>> fact, if you'll look at my personal past history here, I've repeatedly said
>> that I think parts of the spec need serious work.  (E.g. the
>> DynamicsProcessor issue I just raised.)
>>
>> To the issue at hand: I am not now, nor do I feel I have ever been,
>> fundamentally opposed to enabling or even requiring implementations to use
>> native bit depths and integer formats for internal buffer storage; I *DO*
>> want to feel like I have a better handle on the impact of doing that before
>> changing the spec, particularly to require such a thing.  I'm *somewhat*
>> opposed to revealing those data types outside the internals (e.g. having
>> getChannelData return int8 and int16 types, or whatever shape that would
>> take) because I don't see the significant benefit of that, and I feel it
>> complexifies the API in a non-Web-y way; I would not characterize this as
>> "my mind is made up," but I have stronger opposition to that.
>>
>>
>>> You have clearly voiced opposition to this idea in the past (for example
>>> in the race condition discussions),
>>>
>>
>> To overhauling the spec?  No.  No I have not.  I have stated opinion, and
>> my opinion is frequently cautious, but I've tried to find the middle ground
>> and data-based decision making paths, and have in fact frequently found
>> myself in the role of Devil's advocate.  I do NOT believe that I have
>> steadfastly opposed change in any discussion.
>>
>> My opinion on the race condition issue remains the same; making these
>> changes will cause negative performance implications in some scenarios.
>>  Those implications aren't likely to be huge, and are not in the most
>> common scenarios - clever design suggestions from ROC and Jer, among
>> others, helped make the solution much less negatively impactful - and
>> regardless of those implications, the web platform requires better thread
>> protection than the previous design, so it is worth the cost.  I'd go so
>> far as to say I'm personally responsible for moving Google to agreeing with
>> those changes.
>>
>>
>>> and I can respect the desire to avoid throwing out too much existing
>>> work, especially when it means the new software will have to go through a
>>> new process of QA and review by developers. But if that's the reason why
>>> new features and bug fixes are being dismissed, it should be clear instead
>>> of masked behind vague concerns.
>>>
>>
>> As indeed it would be.
>>
>>
>>> OK, if Float32 everything is critical for performance on certain
>>> platforms, what platforms are they? Why are they important? (Are they the
>>> future of PCs? Are they widely used? Are they key to the business
>>> objectives of a stakeholder like Apple or Google?) What are the other
>>> constraints of that platform that we should also be optimizing for?
>>>
>>
>> You keep using words like "critical for performance," that I did not use.
>>  I said we should understand the impact.  Do you believe there is literally
>> zero impact from storing buffers and needing to convert each sample when it
>> is pushed into the pipeline?  Or do you just believe we should handwave,
>> presume it's less than the positive impact of halving memory usage, and
>> change?  Do you think we should be enabling int8 as well?  Int24? Do you
>> think we need to expose these types in the API, or simply require that the
>> implementation isn't exploding the sizes of objects under the hood?  How,
>> precisely, and how flexible does this system need to be (as in, precisely
>> what
>>
>> Incidentally, please don't use shorthand like "Float32 everything",
>> because it implies changing the pipeline as well as storing buffers in
>> something other than float32, and I can't tell if that's what you're
>> referring to or not. Float32 was chosen for the pipeline because its
>> dynamic range is substantial enough to encompass the range of human hearing
>> (math redacted), while still maintaining a floating-point exponent that
>> makes clipping far, far less of a problem (and means we can use the same
>> pipeline for control signals that are not typically in the -1 to +1 range,
>> without fear of clipping).
>>
>> -C
>>
>>
>
Received on Saturday, 18 January 2014 04:43:00 UTC