- From: Noah Mendelsohn <nrm@arcanedomain.com>
- Date: Wed, 07 Aug 2013 16:11:31 -0400
- To: Jer Noble <jer.noble@apple.com>
- CC: "K. Gadd" <kg@luminance.org>, Srikumar Karaikudi Subramanian <srikumarks@gmail.com>, Chris Wilson <cwilso@google.com>, Marcus Geelnard <mage@opera.com>, Alex Russell <slightlyoff@google.com>, Anne van Kesteren <annevk@annevk.nl>, Olivier Thereaux <Olivier.Thereaux@bbc.co.uk>, "robert@ocallahan.org" <robert@ocallahan.org>, "public-audio@w3.org" <public-audio@w3.org>, www-tag@w3.org
I am not an expert in the details of the proposed interfaces, but I do have experience with research and development work relating to the performance implications of memory-to-memory copying of large data structures, and I'd like to make a suggestion based on that experience.

Latency and unpredictable latencies are surely a concern for audio, as described below, but my main point would be to suggest that you do a >> quantitative analysis of the CPU overhead for memory-to-memory copies<<. In the case of audio or video, this would likely be done by running benchmarks of toy kernels that are realistic not just in terms of data rates, but also in terms of likely memory access patterns (a loop copying data between two small buffers will likely have higher performance, and spend much less time with the CPU waiting for memory, than a loop working through streaming buffers).

In my experience, you can easily estimate from such benchmarks the data copying rate that will saturate the CPU. This will vary from machine model to machine model, but you can take a typical machine, say 2 GHz, and put at least one core in a loop reading and writing buffers in a pattern typical of use of your API. Now you know that doing that much access will leave no CPU time for anything else. Copying at half that rate will leave 50% of your CPU free. Yes, with multi-core systems you have to figure out how much memory access from one core stalls the other cores, which is also not hard to measure roughly.

Now ask questions like: how many bytes per second will be copied in aggressive usage scenarios for your API? Presumably the answer is much higher for video than for audio, and likely higher for multichannel audio (24 track mixing) than for simpler scenarios. With this in hand, you can do at least very rough, quick calculations of how much of a typical computer's capability will be tied up doing memory-to-memory copies in various scenarios. Then let the chips fall where they may.
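As a rough illustration of the kind of estimate described above, here is a minimal sketch in Python. Everything in it is an assumption made for illustration and not part of any proposed API: the function names, the 64 MiB buffer size (chosen to be larger than typical caches so the copy streams through RAM), and the example scenario of 24 tracks of 24-bit (3 bytes per sample) audio at 44,100 Hz. A bulk `bytearray` copy is only a stand-in for the real access pattern of an API, so treat the result as an order-of-magnitude figure.

```python
import time


def measure_copy_rate(buffer_bytes=64 * 1024 * 1024, repeats=5):
    """Return the best observed copy rate, in bytes per second.

    Copies one large buffer into another several times and keeps the
    fastest run. The buffer size is an assumed value meant to exceed
    the cache, so the copy is limited by RAM rather than by cache.
    """
    src = bytearray(buffer_bytes)
    dst = bytearray(buffer_bytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src  # one bulk memory-to-memory copy
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    # Guard against a timer reading of zero on very small buffers.
    return buffer_bytes / max(best, 1e-9)


def stream_bytes_per_second(sample_rate_hz, bytes_per_sample, channels,
                            copies_per_sample=1):
    """Bytes copied per second for an uncompressed audio scenario."""
    return sample_rate_hz * bytes_per_sample * channels * copies_per_sample


def cpu_fraction(required_rate, saturating_rate):
    """Fraction of one core tied up if copying at required_rate."""
    return required_rate / saturating_rate


if __name__ == "__main__":
    saturating = measure_copy_rate()
    # Hypothetical scenario: 24 tracks, 24-bit samples, 44,100 Hz, one copy each.
    needed = stream_bytes_per_second(44100, 3, 24)
    print(f"copy rate that saturates one core: {saturating / 1e6:.0f} MB/s")
    print(f"scenario needs: {needed / 1e6:.2f} MB/s")
    print(f"fraction of one core: {cpu_fraction(needed, saturating):.4%}")
```

The same three steps (measure the saturating rate, compute the rate a scenario demands, divide) apply to video or to higher channel counts by changing the inputs. A benchmark written against the real buffer sizes and access pattern of the API would be more trustworthy than this bulk copy.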
If you can afford it, fine, do the copies. If you can't, this should be a useful way to find that out without spending months building and debugging the whole API. My guess is that with just a few audio tracks and today's fast machines a little copying isn't fatal, but I'd guess that for multiple high quality 24 bit uncompressed audio streams or for video, the overhead of copying will likely prove significant.

BTW: the work I did was not on audio, but on XML parsing. The paper is here [1], and you'll see that the nature of the analysis is as described above. The benchmarks suggested above proved to predict quite well the performance of the parsing system. The short version of the conclusion: many of these applications are performance-limited by the speed of RAM; caches don't help much for streaming applications, and except where you have SMT (multiple threads per core), a given core will be stalled doing nothing while waiting for memory.

Noah

[1] http://www2006.org/programme/files/pdf/5011.pdf

On 8/7/2013 12:24 PM, Jer Noble wrote:
>
> On Aug 6, 2013, at 6:56 PM, K. Gadd <kg@luminance.org
> <mailto:kg@luminance.org>> wrote:
>
>> Given flash's continued prevalence for streaming realtime audio/video and
>> use cases like video chat and game rendering, whatever works for the
>> pepper flash plugin should probably work okay for web audio (though it
>> certainly isn't necessarily *needed*).
>
> I believe this is a faulty premise.
>
> Video, especially 24 FPS video, is much more immune to jitter and delays
> than is audio. In the video case, frames are 30 to 40 ms apart, and if a
> frame is delayed by a few ms, that delay is almost imperceptible to the
> human eye.
>
> With 44,100 Hz audio and a small frame size (128 samples in Blink & WebKit
> Web Audio, usually), each batch of samples is only 3 ms apart, and if those
> samples are delayed even by less than a millisecond, there will be an
> audible ‘pop’ in the rendered audio.
>
> Additionally, I would assume that in Chrome the video is piped back to the
> main process for rendering, but the audio is rendered locally in the plugin
> process.
>
> -Jer
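The timing figures in the quoted message can be checked with a little arithmetic. The sketch below is illustrative only, with hypothetical function names; it takes the 128-sample batch size and the 44,100 Hz sample rate from the message above and compares the batch period with the frame interval of 24 and 30 FPS video.

```python
def batch_period_ms(frames_per_batch, sample_rate_hz):
    """Time covered by one batch of audio samples, in milliseconds."""
    return 1000.0 * frames_per_batch / sample_rate_hz


def frame_interval_ms(frames_per_second):
    """Time between consecutive video frames, in milliseconds."""
    return 1000.0 / frames_per_second


if __name__ == "__main__":
    # 128 samples at 44,100 Hz: roughly 2.9 ms per batch.
    print(f"audio batch: {batch_period_ms(128, 44100):.2f} ms")
    # 24 FPS and 30 FPS video: roughly 41.7 ms and 33.3 ms per frame.
    print(f"24 FPS video: {frame_interval_ms(24):.1f} ms")
    print(f"30 FPS video: {frame_interval_ms(30):.1f} ms")
```

The audio deadline comes around more than ten times as often as the video one, which is why a sub-millisecond delay matters for audio and not for video.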
Received on Wednesday, 7 August 2013 20:11:51 UTC