Re: NXT design for memory barriers and buffer mapping. from Samuel Williams on 2017-12-06 (public-gpu@w3.org from December 2017)

From: Samuel Williams <samuel@codeotaku.com>
Date: Wed, 6 Dec 2017 19:17:13 +1300
To: Corentin Wallez <cwallez@google.com>
Cc: Dzmitry Malyshau <dmalyshau@mozilla.com>, public-gpu <public-gpu@w3.org>
Message-Id: <A95E117C-D052-4D39-9EE7-F0861C37EA6C@codeotaku.com>

> On 6/12/2017, at 12:47 PM, Corentin Wallez <cwallez@google.com> wrote:
>
> On Fri, Dec 1, 2017 at 4:00 PM, Dzmitry Malyshau <dmalyshau@mozilla.com> wrote:
> Hi Corentin,
> 
> (branching from the root for the comments about the buffer mapping document, as opposed to the memory barriers)
> 
> > The number of times data is copied before it reaches its final destination is the biggest factor in upload performance.

In at least one of our production hardware renderers, built on the Vulkan API and running on NVIDIA GPUs (whatever is available in AWS), one of the biggest issues we ran into was memory allocation speed. It dwarfed almost all other costs, including texture upload. There are ways to amortise this; reusing buffers is the recommended approach. Strangely enough, a friend of mine tried similar allocations using the CUDA API and got much better throughput.
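
To illustrate the buffer reuse idea, here is a minimal sketch in Python. It is not the renderer's actual code: `BufferPool`, its power-of-two size classes, and the use of `bytearray` as a stand-in for a device allocation are all assumptions made for the example.

```python
from collections import defaultdict


class BufferPool:
    """Hypothetical pool: hands back released buffers instead of allocating."""

    def __init__(self):
        self._free = defaultdict(list)  # size class -> list of released buffers
        self.allocations = 0            # number of real (expensive) allocations

    @staticmethod
    def _size_class(size):
        # Round up to the next power of two so similar sizes share buffers.
        return 1 << max(size - 1, 0).bit_length()

    def acquire(self, size):
        size_class = self._size_class(size)
        if self._free[size_class]:
            return self._free[size_class].pop()  # reuse: no allocation cost
        self.allocations += 1
        return bytearray(size_class)  # stand-in for a slow device allocation

    def release(self, buf):
        # The buffer length is already a size class, so it can be filed directly.
        self._free[len(buf)].append(buf)


pool = BufferPool()
for _ in range(100):
    staging = pool.acquire(3000)  # e.g. one upload per frame
    pool.release(staging)

print(pool.allocations)
```

In this sketch only the first `acquire` pays for an allocation; the other 99 iterations reuse the same buffer. A real pool would also have to wait until the GPU has finished with a buffer before releasing it.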

One other problem we had came from trying to be clever about where we allocated data. We convert textures to sRGB, and we ran the conversion algorithm directly in the same host-visible and GPU-visible memory buffer, but that turned out to be a horribly slow approach. It was simply better to decode the PNG into a CPU-local buffer, perform the sRGB conversion in place there, and then copy the result to the host-visible/GPU-visible buffer.
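
The ordering described above (convert in CPU-local memory, then do a single copy) can be sketched as follows. This is an illustration under assumptions, not the original code: it assumes 8-bit RGBA pixels holding linear values, uses the standard sRGB encoding curve, and models the mapped host-visible/GPU-visible buffer as a plain `bytearray`. The names `convert_in_place` and `upload` are hypothetical.

```python
def linear_to_srgb(c):
    # Standard sRGB encoding curve for a linear value c in [0, 1].
    if c <= 0.0031308:
        return 12.92 * c
    return 1.055 * c ** (1 / 2.4) - 0.055


# 256-entry lookup table mapping 8-bit linear values to 8-bit sRGB values.
LUT = bytes(round(linear_to_srgb(i / 255) * 255) for i in range(256))


def convert_in_place(pixels):
    """Convert the R, G and B channels of RGBA8 pixels; alpha is left untouched."""
    for channel in range(3):
        pixels[channel::4] = pixels[channel::4].translate(LUT)


def upload(decoded, staging):
    # All per-pixel reads and writes happen in ordinary CPU-local memory.
    convert_in_place(decoded)
    # The mapped buffer is only touched by one sequential bulk copy.
    staging[:len(decoded)] = decoded


decoded = bytearray([0, 128, 255, 77] * 2)  # two RGBA pixels, as if from a PNG decoder
staging = bytearray(16)                     # stand-in for the mapped staging buffer
upload(decoded, staging)
```

The point of the ordering is that the conversion reads and writes every byte, which is cheap in cached CPU memory, while the buffer shared with the GPU sees only a single write-only pass.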

So, a few “hidden” performance issues.

Kind regards,
Samuel

Received on Wednesday, 6 December 2017 06:46:33 UTC