Re: TAG feedback on Web Audio from Marcus Geelnard on 2013-08-08 (public-audio@w3.org from July to September 2013)

From: Marcus Geelnard <mage@opera.com>
Date: Thu, 08 Aug 2013 11:29:29 +0200
To: Noah Mendelsohn <nrm@arcanedomain.com>
CC: Jer Noble <jer.noble@apple.com>, "K. Gadd" <kg@luminance.org>, Srikumar Karaikudi Subramanian <srikumarks@gmail.com>, Chris Wilson <cwilso@google.com>, Alex Russell <slightlyoff@google.com>, Anne van Kesteren <annevk@annevk.nl>, Olivier Thereaux <Olivier.Thereaux@bbc.co.uk>, "robert@ocallahan.org" <robert@ocallahan.org>, "public-audio@w3.org" <public-audio@w3.org>, www-tag@w3.org
Message-ID: <520364F9.50607@opera.com>

Hi Noah,

While I think you have a good approach to benchmarking the copying 
performance, I'm afraid that we are diverging somewhat from the topic 
here...

I think that most people on this list have agreed that the time/speed 
implications of a memcpy are not really an issue. We're only discussing
potential changes to the interfaces (JS access points), and if we take a 
quick high-level view of the matter I believe these are valid assumptions:

* Most interfaces that could potentially get an additional copy 
operation are in non-critical paths (i.e. will not affect the 
latency/throughput of the audio pipeline).
* Most JS<->Audio API interaction points that could require copying will 
involve some kind of JS processing (e.g. curve generation, audio 
processing, etc), which will significantly outweigh the impact of a copy 
operation in terms of time spent.
* The single critical path (IMO) is the AudioProcessingEvent, but I 
personally think that copy operations are superfluous & unnecessary 
there (i.e. we can use ownership transfer via neutering instead).

If, on the other hand, there are any critical use cases that people can 
come up with that would make these assumptions false, it would be very 
interesting to get them onto the table. So far none have surfaced, so 
I'd stick with these assumptions and conclude that copying will not 
impose any noticeable performance degradation (again, referring to
time/speed/latency/throughput).

/Marcus




2013-08-07 22:11, Noah Mendelsohn wrote:
> I am not an expert in the details of the proposed interfaces, but I do
> have experience with research and development work relating to the
> performance implications of memory-to-memory copying of large data
> structures, and I'd like to make a suggestion based on that experience.
>
> Latency and unpredictable latencies are surely a concern for audio, as
> described below, but my main point would be to suggest that you do a
> >>quantitative analysis of the CPU overhead for memory-to-memory
> copies<<.
>
> In the case of audio or video, this would likely be done by running
> benchmarks of toy kernels that are realistic not just in terms of
> data rates, but also in terms of likely memory access patterns (a loop
> copying data between two small buffers will likely have higher
> performance, with much less time spent with the CPU waiting for memory,
> than a loop working through streaming buffers).
>
> In my experience, you can easily estimate from such benchmarks the
> data copying rate that will saturate the CPU. This will vary from
> machine model to machine model, but you can take a typical machine,
> say 2 GHz, and put (at least) one core in a loop reading and writing
> buffers in a pattern typical of how your API will be used.
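A minimal sketch of the kind of copy benchmark described above, written in Python purely for illustration. The function name `copy_rate` and the buffer sizes are hypothetical choices, not part of any proposed API, and the Python loop overhead is included in the measurement (a C loop would give tighter numbers). A `region` equal to `chunk` keeps both buffers cache-resident, while a `region` much larger than the last-level cache approximates streaming from RAM.

```python
import time


def copy_rate(chunk: int, region: int, total: int) -> float:
    """Return the achieved copy rate in bytes per second (hypothetical helper).

    Copies `total` bytes in `chunk`-sized pieces from one `region`-sized
    buffer to another, walking sequentially and wrapping at the end.
    """
    if chunk <= 0 or region < chunk or total <= 0:
        raise ValueError("need 0 < chunk <= region and total > 0")
    src = memoryview(bytearray(region))
    dst = memoryview(bytearray(region))
    offset = copied = 0
    start = time.perf_counter()
    while copied < total:
        # Slice assignment between memoryviews performs a raw byte copy.
        dst[offset:offset + chunk] = src[offset:offset + chunk]
        copied += chunk
        offset += chunk
        if offset + chunk > region:
            offset = 0  # wrap around: a small region stays in cache
    elapsed = time.perf_counter() - start
    return copied / max(elapsed, 1e-9)


if __name__ == "__main__":
    MiB = 1024 * 1024
    cached = copy_rate(chunk=64 * 1024, region=64 * 1024, total=256 * MiB)
    streaming = copy_rate(chunk=64 * 1024, region=64 * MiB, total=256 * MiB)
    print(f"cache-resident: {cached / 1e9:.2f} GB/s")
    print(f"streaming:      {streaming / 1e9:.2f} GB/s")
```

The gap between the two printed rates on a given machine is the effect described above: the streaming case spends more time waiting on RAM.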
>
> Now you know that doing that much access will leave no CPU time for 
> anything else. Copying at half that rate will leave 50% of your CPU 
> free. Yes, with multi-core systems you have to figure out how much 
> memory access from one core stalls the other cores, which is also not 
> hard to measure roughly.
>
> Now ask questions like: how many bytes per second will be copied in 
> aggressive usage scenarios for your API? Presumably the answer is much 
> higher for video than for audio, and likely higher for multichannel
> audio (24-track mixing) than for simpler scenarios.
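To make the "how many bytes per second" question concrete, here is a back-of-the-envelope calculation. The scenarios are illustrative assumptions, not figures from this thread: audio is counted as 32-bit float samples (the format Web Audio's `AudioBuffer` exposes as `Float32Array`), video as uncompressed 8-bit RGBA frames, and each byte is assumed to be copied once.

```python
def audio_bytes_per_second(channels: int, sample_rate_hz: int,
                           bytes_per_sample: int = 4) -> int:
    """Uncompressed audio data rate (4 bytes = one 32-bit float sample)."""
    return channels * sample_rate_hz * bytes_per_sample


def video_bytes_per_second(width: int, height: int, fps: int,
                           bytes_per_pixel: int = 4) -> int:
    """Uncompressed video data rate (4 bytes = one 8-bit RGBA pixel)."""
    return width * height * bytes_per_pixel * fps


# Hypothetical scenarios, chosen only to span the range discussed above.
SCENARIOS = {
    "stereo, 44.1 kHz": audio_bytes_per_second(2, 44_100),
    "24-track mix, 48 kHz": audio_bytes_per_second(24, 48_000),
    "1080p video, 30 fps": video_bytes_per_second(1920, 1080, 30),
}

if __name__ == "__main__":
    for name, rate in SCENARIOS.items():
        print(f"{name:>22}: {rate / 1e6:8.2f} MB/s")
```

Under these assumptions the 1080p video case (248,832,000 B/s) is exactly 54 times the 24-track audio case (4,608,000 B/s).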
>
> With this in hand, you can do at least very rough, quick calculations
> of how much of a typical computer's capability will be tied up doing
> memory-to-memory copies in various scenarios. Then let the chips fall
> where they may. If you can afford it, fine, do the copies. If you can't,
> this should be a useful way to find that out without spending months
> building and debugging the whole API. My guess is that with just a few
> audio tracks and today's fast machines a little copying isn't fatal,
> but I'd guess that for multiple high-quality 24-bit uncompressed audio
> streams or for video, the overhead of copying will likely prove
> significant.
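The rough calculation then reduces to a ratio: the share of one core tied up by copying is the required copy rate divided by the saturating rate found by the benchmark, which matches the observation that copying at half the saturating rate leaves 50% of the CPU free. This sketch assumes a single core and ignores cross-core memory stalls; `copies_per_byte` is a hypothetical factor for pipelines that copy the same data more than once, and the 10% default budget is an arbitrary placeholder rather than a recommended threshold.

```python
def copy_cpu_fraction(required_bytes_per_s: float,
                      saturation_bytes_per_s: float,
                      copies_per_byte: int = 1) -> float:
    """Fraction of one core spent copying (1.0 means the core is saturated).

    Assumes copy cost scales linearly with bytes copied and ignores
    contention between cores (a simplification).
    """
    if saturation_bytes_per_s <= 0:
        raise ValueError("saturation rate must be positive")
    return copies_per_byte * required_bytes_per_s / saturation_bytes_per_s


def can_afford(fraction: float, budget: float = 0.10) -> bool:
    """True if copying stays within a (hypothetical) share of the CPU."""
    return fraction <= budget
```

For example, `copy_cpu_fraction(5e8, 1e9)` returns `0.5`: copying at half the saturating rate occupies half of the core.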
>
> BTW: the work I did was not on audio, but on XML Parsing. The paper is 
> here [1], and you'll see that the nature of the analysis is as 
> described above. The benchmarks suggested above proved to predict 
> quite well the performance of the parsing system. The short version of 
> the conclusion: many of these applications are performance-limited by 
> the speed of RAM; caches don't help much for streaming applications, 
> and except where you have SMT (multiple threads per core), a given core
> will be stalled doing nothing while waiting for memory.
>
> Noah
>
> [1] http://www2006.org/programme/files/pdf/5011.pdf
>
> On 8/7/2013 12:24 PM, Jer Noble wrote:
>>
>> On Aug 6, 2013, at 6:56 PM, K. Gadd <kg@luminance.org> wrote:
>>
>>> Given Flash's continued prevalence for streaming realtime audio/video
>>> and use cases like video chat and game rendering, whatever works for
>>> the Pepper Flash plugin should probably work okay for Web Audio (though
>>> it certainly isn't necessarily *needed*).
>>
>> I believe this is a faulty premise.
>>
>> Video, especially 24 FPS video, is much more immune to jitter and
>> delays than is audio.  In the video case, frames are 30 to 40 ms apart,
>> and if a frame is delayed by a few ms, that delay is almost
>> imperceptible to the human eye.
>>
>> With 44.1 kHz audio and a small frame size (128 samples in Blink &
>> WebKit Web Audio, usually), each batch of samples is only about 3 ms
>> apart, and if those samples are delayed by even less than a
>> millisecond, there will be an audible ‘pop’ in the rendered audio.
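The timing comparison follows from one formula: the interval between batches is the batch length divided by the rate. This sketch uses the 128-sample frame and 44.1 kHz rate from the message, plus 24 and 30 fps for video; `delay_share` is a hypothetical helper that expresses a fixed delay as a fraction of each interval.

```python
def interval_ms(units_per_batch: float, units_per_second: float) -> float:
    """Time between consecutive batches, in milliseconds."""
    return 1000.0 * units_per_batch / units_per_second


def delay_share(delay_ms: float, period_ms: float) -> float:
    """A fixed delay expressed as a fraction of the batch interval."""
    return delay_ms / period_ms


if __name__ == "__main__":
    audio = interval_ms(128, 44_100)   # one batch of 128 samples
    video_24 = interval_ms(1, 24)      # one frame at 24 fps
    video_30 = interval_ms(1, 30)      # one frame at 30 fps
    print(f"audio batch:  {audio:.2f} ms")       # 2.90 ms
    print(f"24 fps frame: {video_24:.2f} ms")    # 41.67 ms
    print(f"30 fps frame: {video_30:.2f} ms")    # 33.33 ms
    # A 1 ms hiccup is about a third of an audio batch interval,
    # but only a few percent of a video frame interval.
    print(f"1 ms vs audio:  {delay_share(1, audio):.0%}")     # 34%
    print(f"1 ms vs 24 fps: {delay_share(1, video_24):.0%}")  # 2%
```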
>>
>> Additionally, I would assume that in Chrome the video is piped back to
>> the main process for rendering, but the audio is rendered locally in
>> the plugin process.
>>
>> -Jer


-- 
Marcus Geelnard
Technical Lead, Mobile Infrastructure
Opera Software

Received on Thursday, 8 August 2013 09:30:03 UTC