- From: Noah Mendelsohn <nrm@arcanedomain.com>
- Date: Wed, 07 Aug 2013 23:25:14 -0400
- To: Srikumar Karaikudi Subramanian <srikumarks@gmail.com>
- CC: robert@ocallahan.org, Jer Noble <jer.noble@apple.com>, "K. Gadd" <kg@luminance.org>, Chris Wilson <cwilso@google.com>, Marcus Geelnard <mage@opera.com>, Alex Russell <slightlyoff@google.com>, Anne van Kesteren <annevk@annevk.nl>, Olivier Thereaux <Olivier.Thereaux@bbc.co.uk>, "public-audio@w3.org" <public-audio@w3.org>, "www-tag@w3.org List" <www-tag@w3.org>
I do really appreciate your quick effort to respond to my suggestion. I'm a little concerned, though, as to whether what you're measuring is actually in the spirit of what I suggested. You're using Unix pipes with read and write. Do we know enough about the implementation of the pipes to be sure you're measuring patterns that match what some particular audio API would do? I suspect not. I do agree that your results suggest that in your implementation context switch overhead is significant, and that may well be the reason your throughput rises with packet size.

The benchmark I was proposing was strictly copying bytes, without context switches or reliance on OS piping services, to see how fast the hardware can do that (and making sure the patterns are likely to use the cache in about the same way as the API). Such a measurement sets a bound on the overhead from >copying<, which I thought was the question on the table?

I also note that you have some floating point operations in there. They are likely swamped for small buffers by your context switch overhead, but if you get rid of the context switches I wouldn't be surprised if those floating point operations proved significant. Indeed, years ago when we were building our parser someone on our team was playing around and happened to include a mod (% operator) in much the same way you're using that floating point conversion/multiply (which your compiler may or may not be optimizing out). It took us a while to realize why our results were anomalous: % tends to involve an integer divide, and on many machines divide times are significant relative to word access times.

If what's to be benchmarked is memory copy time in buffers sufficiently large to miss in 1st/2nd level cache (which seems a reasonable approximation to the audio case), then that's what should be benchmarked. I'd be suspicious of anything that involves context switching, floating point ops, pipes, etc.
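To make that concrete, here is a rough C sketch of the kind of copy-only measurement I have in mind. It is an illustration under stated assumptions, not a tuned benchmark and not code from the gist: the function name `copy_throughput`, its parameters, and the fill byte are all made up for this example. It does nothing in the timed loop except `memcpy` in fixed-size chunks across a region that the caller should choose to be much larger than the last-level cache, and it uses the standard `clock()` (CPU time), so time spent blocked is not counted.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper: copy `region_bytes` from one heap region to another,
 * `chunk_bytes` at a time, `passes` times over. Returns bytes copied per
 * second of CPU time, or 0.0 on bad arguments, allocation failure, a
 * failed verification, or a time interval too short to measure.
 *
 * No pipes, no fork, no floating point and no divide in the timed loop.
 */
double copy_throughput(size_t region_bytes, size_t chunk_bytes, size_t passes)
{
    if (chunk_bytes == 0 || region_bytes < chunk_bytes || passes == 0)
        return 0.0;

    unsigned char *src = malloc(region_bytes);
    unsigned char *dst = malloc(region_bytes);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return 0.0;
    }

    /* Touch every page first so page faults are not part of the timing. */
    memset(src, 0x5a, region_bytes);
    memset(dst, 0, region_bytes);

    size_t chunks = region_bytes / chunk_bytes;   /* computed once, outside the loop */
    size_t copied_per_pass = chunks * chunk_bytes;

    clock_t start = clock();
    for (size_t p = 0; p < passes; p++) {
        src[0] = (unsigned char)p;                /* make each pass's data differ */
        unsigned char *s = src;
        unsigned char *d = dst;
        for (size_t i = 0; i < chunks; i++) {
            memcpy(d, s, chunk_bytes);
            s += chunk_bytes;
            d += chunk_bytes;
        }
    }
    clock_t end = clock();

    /* Read the destination back so the copies cannot be treated as dead
     * stores. A sufficiently clever compiler could still remove redundant
     * work, so check the generated code if the numbers look too good. */
    int ok = (memcmp(dst, src, copied_per_pass) == 0);

    free(src);
    free(dst);

    double elapsed = (double)(end - start) / (double)CLOCKS_PER_SEC;
    if (!ok || elapsed <= 0.0)
        return 0.0;
    return ((double)copied_per_pass * (double)passes) / elapsed;
}
```

For example, `copy_throughput((size_t)256 << 20, 512, 8)` would copy a 256 MiB region in 512-byte chunks (128 float32 samples, the smaller of your two buffer sizes), and passing 16384 instead of 512 would correspond to your 4096-sample case. Comparing the two would show how much of the buffer-size effect remains once the pipes and context switches are gone.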
Of course, if the audio APIs will necessarily involve OS-level context switches, that should be evaluated too, but I'd suggest decoupling the context-switch benchmarks from the memory copy benchmarks.

Noah

On 8/7/2013 10:01 PM, Srikumar Karaikudi Subramanian wrote:
> I did a quick test to see what's possible on my laptop (MacBook Air 1.7GHz,
> core i5).
>
> https://gist.github.com/srikumarks/6180450
>
> The C program forks off a child process and the two keep sending one
> float32 buffer of a given size back and forth. The interesting thing that
> came up in my trial runs is that the data throughput is severely affected
> by the buffer size and not as much (relatively) by whether the buffer is
> malloced fresh and filled for every send. Using a 128 sample buffer, I got
> a throughput around 45MB/s, but with a 4096 sample buffer, I got about
> 400MB/s. Both measurements done with a fresh malloc and fill for every send.
>
> These numbers suggest that in the case of audio, the data throughput is not
> a bottleneck, but the process switching overhead is. However, even with the
> 128 case, > 200 such mono streams can be sent back and forth. This number
> is relevant when you have N number of script nodes in a chain before
> hitting the audio destination node.
>
> When considering 5.1/48KHz audio, the length of the buffer in each
> send/recv is 768 samples, and I got, again, about 150 such streams possible
> in such a chain. The data throughput in this case was about 160MB/sec.
>
> (All throughput numbers are "pessimized" values. See gist for real figures.
> I did not exit any of my other running applications to run this test.)
>
> -Kumar
>
>
> On 8 Aug, 2013, at 3:14 AM, "Robert O'Callahan" <robert@ocallahan.org> wrote:
>
>> On Thu, Aug 8, 2013 at 8:11 AM, Noah Mendelsohn <nrm@arcanedomain.com> wrote:
>>
>>     Now ask questions like: how many bytes per second will be copied in
>>     aggressive usage scenarios for your API? Presumably the answer is
>>     much higher for video than for audio, and likely higher for
>>     multichannel audio (24 track mixing) than for simpler scenarios.
>>
>>
>> For this we need concrete, realistic test cases. We need people who are
>> concerned about copying overhead to identify test cases that they're
>> willing to draw conclusions from. (I.e., test cases where, if we
>> demonstrate low overhead, they won't just turn around and say "OK I'll
>> look for a better testcase" :-).)
>>
>> Rob
>> --
>> Jtehsauts tshaei dS,o n" Wohfy Mdaon yhoaus eanuttehrotraiitny eovni
>> le atrhtohu gthot sf oirng iyvoeu rs ihnesa.r"t sS?o Whhei csha iids
>> teoa stiheer :p atroa lsyazye,d 'mYaonu,r "sGients uapr,e tfaokreg
>> iyvoeunr, 'm aotr atnod sgaoy ,h o'mGee.t" uTph eann dt hwea lmka'n?
>> gBoutt uIp waanndt wyeonut thoo mken.o w
Received on Thursday, 8 August 2013 03:25:33 UTC