- From: Jorge <jorge@jorgechamorro.com>
- Date: Fri, 14 Jan 2011 00:53:10 +0100
On 13/01/2011, at 23:21, Boris Zbarsky wrote:

> On 1/13/11 4:37 PM, Glenn Maynard wrote:
>
>> I suspect there's something simpler going on here, though--as you
>> said, copying a 10 MB buffer really should be very quick.
>
> It's really not that quick, actually. First, you have to allocate a new 10MB buffer. Then you have to memcpy into it. Then you have to free it at some point. I just wrote a simple test C program that has a single 10MB array initialized and then in a loop allocates a 10MB array, memcpys into it, and then frees the 10MB allocation it just made. It takes about 5ms per loop iteration to run on my system (fairly high-end laptop that was new in July 2010). The time is split about 50-50 between the allocation and the memcpy.
>
> Just to be clear, 2.5ms to copy 10MB means that my CPU is spending about 0.25ns per byte. It's a 2.66GHz CPU, so that's about 0.66 clock cycles per byte, or about 1.5 bytes per clock cycle. That's pretty believable if we're having to stall the CPU every so often to wait for RAM.
>
> Note that a key issue here is that 10MB is larger than half my L3 cache. If I stick to arrays that are small enough that both source and destination fit in the cache, things are much faster.

Right, and there's neither the need to duplicate them, nor to occupy say 300MB/s of memory bandwidth memcpying frames @ 30 fps, nor to waste 5*30 ms of CPU time per second, gratuitously, when it can be avoided.

--
Jorge.
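[For readers who want to reproduce the measurement Boris describes, here is a minimal sketch of that kind of timing loop. It is not his actual program; the buffer size, iteration count, and use of clock_gettime() are assumptions based on his description.]

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    #define BUF_SIZE   (10 * 1024 * 1024)  /* 10MB buffer, as in the description above */
    #define ITERATIONS 100

    int main(void)
    {
        /* Source buffer, allocated and initialized once up front. */
        char *src = malloc(BUF_SIZE);
        if (!src)
            return 1;
        memset(src, 0xAB, BUF_SIZE);

        volatile char sink = 0;  /* keeps the copies from being optimized away */

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);

        for (int i = 0; i < ITERATIONS; i++) {
            /* Each iteration: allocate a fresh 10MB buffer, memcpy into it, free it. */
            char *dst = malloc(BUF_SIZE);
            if (!dst)
                return 1;
            memcpy(dst, src, BUF_SIZE);
            sink ^= dst[0];
            free(dst);
        }

        clock_gettime(CLOCK_MONOTONIC, &t1);

        double total_ms = (t1.tv_sec - t0.tv_sec) * 1e3 +
                          (t1.tv_nsec - t0.tv_nsec) / 1e6;
        printf("%.2f ms per iteration (sink=%d)\n", total_ms / ITERATIONS, sink);

        free(src);
        return 0;
    }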