Re: TAG feedback on Web Audio from Noah Mendelsohn on 2013-08-08 (public-audio@w3.org from July to September 2013)

From: Noah Mendelsohn <nrm@arcanedomain.com>
Date: Wed, 07 Aug 2013 23:25:14 -0400
To: Srikumar Karaikudi Subramanian <srikumarks@gmail.com>
CC: robert@ocallahan.org, Jer Noble <jer.noble@apple.com>, "K. Gadd" <kg@luminance.org>, Chris Wilson <cwilso@google.com>, Marcus Geelnard <mage@opera.com>, Alex Russell <slightlyoff@google.com>, Anne van Kesteren <annevk@annevk.nl>, Olivier Thereaux <Olivier.Thereaux@bbc.co.uk>, "public-audio@w3.org" <public-audio@w3.org>, "www-tag@w3.org List" <www-tag@w3.org>
Message-ID: <52030F9A.9090609@arcanedomain.com>

I do really appreciate your quick effort to respond to my suggestion. I'm a 
little concerned, though, as to whether what you're measuring is actually 
in the spirit of what I suggested.

You're using Unix pipes with read and write. Do we know enough about the 
implementation of the pipes to be sure you're measuring patterns that match 
what some particular audio API would do? I suspect not. I do agree that 
your results suggest that, in your implementation, context-switch overhead 
is significant, and that may well be the reason your throughput rises with 
packet size.

The benchmark I was proposing was strictly copying bytes, without context 
switches or reliance on OS piping services, to see how fast the hardware 
can do that (and making sure the patterns are likely to use the cache in 
about the same way as the API). Such a measurement sets a bound on the 
overhead from >copying<, which I thought was the question on the table?

I also note that you have some floating point operations in there. For 
small buffers they are likely swamped by your context switch overhead, but 
if you get rid of the context switches I wouldn't be surprised if those 
floating point operations proved significant.

Indeed, years ago when we were building our parser someone on our team was 
playing around and happened to include a mod (% operator) in much the same 
way you're using that floating point conversion/multiply (which your 
compiler may or may not be optimizing out). It took us a while to realize 
why our results were anomalous: % tends to involve an integer divide, and 
on many machines divide times are significant relative to word access times.

If what's to be benchmarked is memory copy time in buffers sufficiently 
large to miss in 1st/2nd level cache (which seems a reasonable 
approximation to the audio case), then that's what should be benchmarked. 
I'd be suspicious of anything that involves context switching, floating 
point ops, pipes, etc.
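
A minimal sketch of the copy-only measurement described above, in C. It is 
not the code from the gist: the function name copy_throughput and its 
parameters are made up for illustration, and the region size needed to 
miss in cache is an assumption that depends on the machine. It does 
nothing in the timed loop except memcpy, so there are no pipes, context 
switches or floating point conversions to muddy the result.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: copy block_bytes-sized blocks from one region to
   another, covering region_bytes per pass, and return bytes per second.
   Returns -1.0 on bad arguments or allocation failure. */
double copy_throughput(size_t region_bytes, size_t block_bytes, int passes)
{
    if (block_bytes == 0 || region_bytes < block_bytes || passes <= 0)
        return -1.0;

    unsigned char *src = malloc(region_bytes);
    unsigned char *dst = malloc(region_bytes);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }

    /* Touch every page first so page faults are not part of the timing. */
    memset(src, 1, region_bytes);
    memset(dst, 0, region_bytes);

    size_t blocks = region_bytes / block_bytes;
    volatile unsigned char sink = 0;

    clock_t start = clock();   /* CPU time; fine for a single-threaded copy */
    for (int p = 0; p < passes; p++) {
        for (size_t b = 0; b < blocks; b++)
            memcpy(dst + b * block_bytes, src + b * block_bytes, block_bytes);
        sink ^= dst[0];        /* read the result so the copies stay observable */
    }
    clock_t end = clock();
    (void)sink;

    free(src);
    free(dst);

    double seconds = (double)(end - start) / CLOCKS_PER_SEC;
    if (seconds <= 0.0)        /* clamp when the run is below timer resolution */
        seconds = 1.0 / CLOCKS_PER_SEC;
    return (double)blocks * (double)block_bytes * (double)passes / seconds;
}
```

Something like copy_throughput(64u << 20, 512, 10) would model 128-sample 
float32 buffers (512 bytes each) walked through a 64 MB region, which 
is assumed here to be larger than the 1st/2nd level caches. It is 
worth checking the generated code to confirm the compiler has not elided 
the copies.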

Of course, if the audio APIs will necessarily involve OS-level context 
switches, that should be evaluated too, but I'd suggest decoupling the 
context-switch benchmarks from the memory copy benchmarks.
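
One way to measure the context-switch side on its own, sketched in C under 
the assumption of a POSIX system: two processes bounce a single byte back 
and forth, so almost no data is copied and the time per round trip is 
dominated by system call and process switching cost. The name 
pingpong_seconds is hypothetical, and the figure it gives includes pipe 
overhead, so it is an upper bound on switch cost rather than a clean 
measurement of it.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: average wall-clock seconds per one-byte round trip
   between a parent and a forked child over two pipes; -1.0 on error. */
double pingpong_seconds(int round_trips)
{
    int to_child[2], to_parent[2];

    if (round_trips <= 0)
        return -1.0;
    if (pipe(to_child) != 0)
        return -1.0;
    if (pipe(to_parent) != 0) {
        close(to_child[0]);
        close(to_child[1]);
        return -1.0;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(to_child[0]);
        close(to_child[1]);
        close(to_parent[0]);
        close(to_parent[1]);
        return -1.0;
    }

    if (pid == 0) {
        /* Child: echo each byte back until the parent closes its end. */
        char c;
        close(to_child[1]);
        close(to_parent[0]);
        while (read(to_child[0], &c, 1) == 1) {
            if (write(to_parent[1], &c, 1) != 1)
                break;
        }
        _exit(0);
    }

    /* Parent: send one byte and wait for the echo, round_trips times. */
    close(to_child[0]);
    close(to_parent[1]);

    struct timespec t0, t1;
    char c = 'x';
    int ok = 1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < round_trips && ok; i++)
        ok = write(to_child[1], &c, 1) == 1 && read(to_parent[0], &c, 1) == 1;
    clock_gettime(CLOCK_MONOTONIC, &t1);

    close(to_child[1]);        /* EOF makes the child's read loop exit */
    close(to_parent[0]);
    waitpid(pid, NULL, 0);

    if (!ok)
        return -1.0;
    double seconds = (double)(t1.tv_sec - t0.tv_sec)
                   + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return seconds / round_trips;
}
```

Comparing this per-round-trip figure with the per-buffer copy time from a 
pure memory copy run would show which of the two dominates at a given 
buffer size.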

Noah

On 8/7/2013 10:01 PM, Srikumar Karaikudi Subramanian wrote:
> I did a quick test to see what's possible on my laptop (MacBook Air 1.7GHz,
> core i5).
>
> https://gist.github.com/srikumarks/6180450
>
> The C program forks off a child process and the two keep sending one
> float32 buffer of a given size back and forth.  The interesting thing that
> came up in my trial runs is that the data throughput is severely affected
> by the buffer size and not as much (relatively) by whether the buffer is
> malloced fresh and filled for every send. Using a 128 sample buffer, I got
> a throughput around 45MB/s, but with a 4096 sample buffer, I got about
> 400MB/s. Both measurements done with a fresh malloc and fill for every send.
>
> These numbers suggest that in the case of audio, the data throughput is not
> a bottleneck, but the process switching overhead is. However, even with the
> 128 case, > 200 such mono streams can be sent back and forth. This number
> is relevant when you have N number of script nodes in a chain before
> hitting the audio destination node.
>
> When considering 5.1/48KHz audio, the length of the buffer in each
> send/recv is 768 samples, and I got, again, about 150 such streams possible
> in such a chain. The data throughput in this case was about 160MB/sec.
>
> (All throughput numbers are "pessimized" values. See gist for real figures.
> I did not exit any of my other running applications to run this test.)
>
> -Kumar
>
>
> On 8 Aug, 2013, at 3:14 AM, "Robert O'Callahan" <robert@ocallahan.org
> <mailto:robert@ocallahan.org>> wrote:
>
>> On Thu, Aug 8, 2013 at 8:11 AM, Noah Mendelsohn <nrm@arcanedomain.com
>> <mailto:nrm@arcanedomain.com>> wrote:
>>
>>     Now ask questions like: how many bytes per second will be copied in
>>     aggressive usage scenarios for your API? Presumably the answer is
>>     much higher for video than for audio, and likely higher for
>>     multichannel audio (24 track mixing) than for simpler scenarios.
>>
>>
>> For this we need concrete, realistic test cases. We need people who are
>> concerned about copying overhead to identify test cases that they're
>> willing to draw conclusions from. (I.e., test cases where, if we
>> demonstrate low overhead, they won't just turn around and say "OK I'll
>> look for a better testcase" :-).)
>>
>> Rob
>> --
>> Jtehsauts  tshaei dS,o n" Wohfy  Mdaon  yhoaus  eanuttehrotraiitny  eovni
>> le atrhtohu gthot sf oirng iyvoeu rs ihnesa.r"t sS?o  Whhei csha iids
>> teoa stiheer :p atroa lsyazye,d  'mYaonu,r  "sGients  uapr,e  tfaokreg
>> iyvoeunr, 'm aotr  atnod  sgaoy ,h o'mGee.t"  uTph eann dt hwea lmka'n?
>> gBoutt  uIp  waanndt  wyeonut  thoo mken.o w *
>> *
>

Received on Thursday, 8 August 2013 03:25:35 UTC