- From: K. Gadd <kg@luminance.org>
- Date: Tue, 14 Jan 2014 20:01:53 -0800
- To: Chris Wilson <cwilso@google.com>
- Cc: Marcus Geelnard <mage@opera.com>, Paul Adenot <padenot@mozilla.com>, Jukka Jylänki <jujjyl@gmail.com>, "public-audio@w3.org" <public-audio@w3.org>
- Message-ID: <CAPJwq3Vfw3XMKDUpwQi9h-Nfjit8BfYPpLwtJeGm-wJ=qZY-ng@mail.gmail.com>
On Tue, Jan 14, 2014 at 2:48 PM, Chris Wilson <cwilso@google.com> wrote:

>> If int16 buffers don't offer something approximating actual guarantees,
>> you haven't fixed anything - that native port will still have to assume
>> the worst (i.e. using 2x as much memory) and be rewritten to work with a
>> tiny address space, making your int16 buffer optimization nearly
>> meaningless - sure, the mixer might be slightly faster/slower and the
>> process's resident memory use will be lower, but it won't enable any new
>> use cases and certain ports will still be out of the question.
>
> What's a "guarantee"? Even if we mandated, with a MUST, that
> implementations MUST use native 16-bit storage when requested,
> implementations might choose not to do that as a performance/battery
> optimization. They wouldn't be conforming, but they would work.

The most obvious analogue here is the way textures work in OpenGL and
Direct3D. You allocate a texture of a particular size in a particular
format, and that's what you get. The GPU is certainly free to take
liberties with the actual arrangement of the texture in video memory (and
in fact, most do), but the format you ask for is (IIRC) always the format
you get. This is important because having extra format conversions
introduced at the driver's discretion could result in unpredictable
performance consequences or even behavioral differences (due to too
much/too little precision). I don't see how audio could really differ
dramatically in this area, unless I've overlooked something important. I'd
love to see examples of how audio is somehow special in this regard.

> The AudioContext's sampleRate is not set to a defined number, but in
> practice the sampleRate is set to the audio output sample rate - that is,
> the AudioDestinationNode's native rate - since that's where the clock is
> coming from. The point is that the entire audio context is run in a
> single rate, to minimize resampling.
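(As an aside on the precision point: the loss is easy to demonstrate. These two helper functions are my own illustration, not any spec'd API - a float32 sample does not survive a round trip through int16 exactly.)

```javascript
// Hypothetical helpers (names are mine): quantize a float sample to int16
// and back, showing the "too little precision" case concretely.
function floatToInt16(x) {
  const clamped = Math.max(-1, Math.min(1, x)); // clamp to [-1, 1]
  return Math.round(clamped * 32767);           // scale to signed 16-bit range
}

function int16ToFloat(i) {
  return i / 32767;
}

const original = 0.123456789;
const roundTripped = int16ToFloat(floatToInt16(original));
console.log(roundTripped === original); // false - quantization discarded bits
```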
Arguably, the choice to mix in 32-bit float should be equivalent to a
choice to mix at 44.1 kHz or 48 kHz. It shouldn't have to influence the
source format of audio data any more than the output rate would require
source audio to be stored at that rate. This is the point I'm trying to
make: both bitness and sample rate are important controls to have over
source audio.

>>> Having such an option in the API gives the implementation an
>>> opportunity to save memory when memory is scarce, but it's not
>>> necessarily forced to do so.
>>
>> The whole point is to force the implementation to save memory. An
>> application that runs out of memory 80% of the time is not appreciably
>> better than one that does so 100% of the time - end users will consider
>> both unusable.
>
> Given all the other factors that may change memory usage in the web
> platform, I'm not sure why this one feature will solve that problem. Or
> even come close. Again, I'm not saying I see no reason to look closely
> at this; I'm just saying that I don't think this is as big a slam dunk
> as you appear to, and I think there are notable situations when it is
> better to NOT store that data in int16, and there will be

What situations are these? I find it hard to imagine a scenario where
software playback is going to benefit tremendously from using 2x the
memory to store sample data. Certainly there are huge advantages to
*mixing* in floating point; are you arguing that making the mixer slightly
faster merits using double the memory (and thus double the memory
bandwidth, if not more - memory bandwidth being especially precious on
mobile platforms)? Must the floating-point version of said buffer be the
de facto storage format even though it is merely a minor mixing efficiency
optimization? I am also dubious about the tremendous cost implied by
converting from int16 to float32 in the mixer.
It's a trivial, common operation, and depending on the architecture I
would expect it could pay for itself in reduced memory bandwidth usage and
more efficient use of the L1/L2/L3 caches. Have you benchmarked this? Do
you have test cases that demonstrate a tremendous performance win from
using float32 for everything versus int16 or int8?

>> On this whole subject it is important to realize that when talking
>> about developers porting games and multimedia software from other
>> native platforms, it is usually not wise to assume they are idiots that
>> will shoot themselves in the foot.
>
> That was not the intent, and I was certainly not making that assumption.
> However, those aren't the only developers that would have this API
> available - and I would venture some of them would choose to make this
> decision without understanding how it may affect them on other devices
> or browsers, now and in the future. Mostly because that's pretty much
> impossible to know.

Sacrificing current-day usability in favor of some hypothetical future
platform is not a wise decision when we are dealing with existing
software. Furthermore, I would argue that we have no proven ability to
anticipate future hardware configurations any better than the developers
of these games and multimedia applications. To a large extent, these
applications have been doing mixing and playback the same way for over a
decade, and numerous changes/improvements to hardware have not
significantly impacted things, other than cases where buffers moved
to/from hardware and filtering/mixing got slightly more sophisticated. The
underlying model has not significantly changed. The same is true for
graphics rendering: we still basically have buffers containing vertex
data, index data, and texel data - as we have since the early days of
modern-era OpenGL/Direct3D.
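To make concrete how trivial the int16-to-float32 conversion I mentioned above is, here is a sketch of a mixer inner loop (my own illustration, not the Web Audio API): the conversion is one multiply per sample, and it can be folded into the gain multiply the mixer performs anyway.

```javascript
// Mix int16 source samples into a float32 accumulator, converting on the
// fly. The int16->float scale factor is combined with the gain, so the
// conversion adds no extra per-sample arithmetic.
function mixInt16IntoFloat32(dest, src, gain) {
  const scale = gain / 32768; // fold int16->float scaling into the gain
  for (let i = 0; i < dest.length; i++) {
    dest[i] += src[i] * scale;
  }
}

// Usage: mix a half-amplitude int16 source into a float32 accumulator.
const out = new Float32Array(4);
mixInt16IntoFloat32(out, Int16Array.from([16384, -16384, 0, 8192]), 1.0);
console.log(out[0]); // 0.5
```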
I simply cannot imagine a realistic scenario where locking buffers to
float32 is optimizing effectively for a future hardware configuration,
much less one that cannot easily cope with int16 (let alone benefit from
it). Such an architecture would have significant problems running pretty
much any modern software; modern software loves integers. Hell, the JS
runtime still freely mixes floats and ints and does frequent conversions.
I will agree that we cannot know how this will affect future
devices/browsers, especially ones with odd architectures. However, is
there really a good reason to compromise usability and performance on
current architectures - ones used by the vast majority of living,
breathing, paying customers - in favor of customers that don't exist
because the devices haven't been made or bought yet?

>> Yes, developers make mistakes, and they ship broken software that
>> relies on bugs in browser implementations - I can understand the
>> reluctance to give developers more ways to make mistakes.
>
> It's not "reluctance to give developers more ways to make mistakes" at
> all. It's "caution in exposing low-level platform implementation details
> unless you are absolutely, positively certain it can be made a net win
> overall." Every low-level implementation detail that's exposed makes it
> that much harder for the web platform to scale across devices, and puts
> more onus on the developer to own that scalability; that begs for
> caution.

The source format (bitness & sample rate) of audio is not "a low-level
platform implementation detail" any more than the pixel format of a source
image is a low-level platform implementation detail. The file formats
audio is loaded from and rendered to contain this information; authors
select it explicitly given particular tradeoffs (e.g. doing some recording
at high sample rates, then mixing down to lower sample rates). You cannot
simply hide it behind a wall and pretend it doesn't exist.
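For instance, the canonical WAV "fmt " chunk carries exactly the fields under discussion. A sketch (the helper name is mine) that builds a minimal 16-byte PCM fmt chunk body and reads the fields back:

```javascript
// Build the 16-byte body of a PCM WAV "fmt " chunk. Field offsets and
// sizes follow the standard RIFF/WAVE layout; all values little-endian.
function writeFmtChunk(sampleRate, channels, bitsPerSample) {
  const buf = new ArrayBuffer(16);
  const v = new DataView(buf);
  const blockAlign = channels * (bitsPerSample / 8);
  v.setUint16(0, 1, true);                       // audio format: 1 = PCM
  v.setUint16(2, channels, true);                // channel count
  v.setUint32(4, sampleRate, true);              // sample rate
  v.setUint32(8, sampleRate * blockAlign, true); // byte rate
  v.setUint16(12, blockAlign, true);             // bytes per frame
  v.setUint16(14, bitsPerSample, true);          // bitness
  return v;
}

const fmt = writeFmtChunk(22050, 1, 16);
console.log(fmt.getUint32(4, true));  // 22050 - sample rate
console.log(fmt.getUint16(14, true)); // 16 - bits per sample
```

The bitness and sample rate are right there in the container, chosen by the author, exactly like a pixel format in an image file.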
We're not talking here about abstractions like those in 3D rendering,
where the exact mechanics of fragment rendering and vertex layout are left
up to the vendor (as long as they satisfy the requirements in the spec);
we are literally talking about foundational details. As I mentioned
before, such abstraction would not be tolerated for textures in rendering
(though you could certainly offer it as an "opt-in" way to somehow save on
memory and texture bandwidth).

>> In these scenarios, we have working applications that do interesting
>> things on native platforms, and if you significantly undermine the Web
>> platform's ability to deliver parity in these scenarios, you're not
>> protecting native app developers from anything; all you're doing is
>> keeping them off the Web and stuck in walled-garden App Stores.
>
> All I'm saying is "parity does not mean do it the same way," and
> pointing out that the Web platform is supposed to scale across different
> hardware and devices better, I think, than previous platforms have done.
>
> Again, I would point out that making a change that would allow
> developers to force the integer storage of buffers would have negative
> side effects, and all I'm cautioning is that those should be carefully
> examined and weighed. I would postulate a set of developers would say
> "well of course, my data is 16-bit 22kHz, of course I want to force the
> data to be stored that way to save memory!" without considering that by
> doing so, they are going to be burning battery life (aka CPU time).
> That's not always the right tradeoff.

I'm not advocating that everything must be done the same way. I'm
advocating for having an actual solution to this problem instead of
continuing to wave your hands using (at least in my history following this
list) wholly unstated hypothetical future use cases as justification.
You don't have to rearchitect the whole Web Audio pipeline or introduce a
sweeping set of new features; just provide a real-world solution for
controlling the (already extreme) memory usage of AudioBuffers.

P.S. In graphics scenarios we've been relying heavily on compressed
storage of texel data in memory for over a decade, because it turns out we
never have enough memory to store all our data. Given that the size of
these float32 AudioBuffers is problematic in reality for existing game
demos, perhaps it could be worthwhile to use efficient in-memory
compression for audio? It is certainly the case that lots of real-world
games do streaming decompression for some of their audio (e.g. music and
voiced dialogue) instead of decoding it up front into enormous buffers.
Note that I am not advocating for streaming *from storage*; I am
advocating for streaming *from memory*. The Xbox 360 actually has support
for this in the southbridge, if memory serves.
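(To put rough numbers on the memory usage in question - the helper and figures below are my own back-of-envelope illustration, not from any spec:)

```javascript
// Decoded-in-memory cost of an audio buffer is simply
// seconds * sample rate * channels * bytes per sample.
function bufferBytes(seconds, sampleRate, channels, bytesPerSample) {
  return seconds * sampleRate * channels * bytesPerSample;
}

// One minute of stereo 44.1 kHz audio:
const asFloat32 = bufferBytes(60, 44100, 2, 4); // 21168000 bytes (~21 MB)
const asInt16   = bufferBytes(60, 44100, 2, 2); // 10584000 bytes (~10.5 MB)
console.log(asFloat32 / asInt16); // 2
```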
Received on Wednesday, 15 January 2014 04:03:05 UTC