- From: Dzmitry Malyshau <dmalyshau@mozilla.com>
- Date: Fri, 1 Dec 2017 16:00:50 -0500
- To: Corentin Wallez <cwallez@google.com>
- Cc: public-gpu <public-gpu@w3.org>
- Message-ID: <CAHnMvnLMZWpst02Kh5vbxzorXoNNZb5iGeUYSbZERtH5Uuopxw@mail.gmail.com>
Hi Corentin,

(branching from the root for the comments about the buffer mapping document, as opposed to the memory barriers)

> The number of times data is copied before it reaches its final destination is the biggest factor in upload performance.

I see this assumption rooted deeply in the (amazing) WebGL work the group has been doing on the subject. It's based on the fact that we don't control the GL driver's side, which does more copying, renaming, and essentially transferring to the GPU. If we look at the problem from the next-gen native APIs' point of view, I'd say there is a more important factor than the number of copies: avoiding CPU/GPU synchronization stalls. All in all, I'd see the following issues here:

  - stalls
  - copies
  - latency

> If WebGPU exposes a buffer mapping primitive, the application will be able to decompress directly into either shared-memory (for Chrome) or directly into GPU-visible memory, avoiding one copy.

Mapping a shared memory object sounds like a great idea. I believe this should be the only way to transfer large amounts of data from CPU to GPU. A list of all possible methods to do so, ordered from larger to smaller data sizes, could be:

  1. Mapping a shared memory object. Possibly, persistently.
  2. Updating buffer contents via command buffers, e.g. `vkCmdUpdateBuffer`, which supports up to 64k. The graphics backend can then manage the staging area and schedule uploads internally if this is not natively supported.
  3. Push constants.

> Since we assume the API tries to prevent data races on the GPU, it makes sense to also prevent data races between the CPU and the GPU for the exact same reasons.

As discussed previously, we can't (or rather don't want to) avoid all data races on the GPU (example: UAV resource access). I'd argue that the browser runtime doesn't necessarily need to enforce the rule here so much as it needs to provide the means/API for the user to implement it in a way that has no data races, with some ability to validate the access.
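To illustrate the three-tier list above, here is a minimal sketch of a size-based dispatcher. All names are my own for illustration, not from any proposal; the 64k cap matches `vkCmdUpdateBuffer`'s documented limit, and 128 bytes is Vulkan's guaranteed minimum push-constant size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical upload-path selection, following the tiers above:
 * tiny data -> push constants, small data -> inline command-buffer
 * update, anything larger -> a (possibly persistent) mapping. */
enum UploadPath {
    UPLOAD_PUSH_CONSTANTS,  /* tiny, per-draw data */
    UPLOAD_CMD_UPDATE,      /* small, recorded inline a la vkCmdUpdateBuffer */
    UPLOAD_MAPPED_STAGING   /* large, via mapped shared memory */
};

#define PUSH_CONSTANT_LIMIT 128u    /* Vulkan's guaranteed minimum */
#define CMD_UPDATE_LIMIT    65536u  /* vkCmdUpdateBuffer's maximum */

enum UploadPath choose_upload_path(size_t size) {
    if (size <= PUSH_CONSTANT_LIMIT)
        return UPLOAD_PUSH_CONSTANTS;
    if (size <= CMD_UPDATE_LIMIT)
        return UPLOAD_CMD_UPDATE;
    return UPLOAD_MAPPED_STAGING;
}
```

In a backend that lacks a native inline-update path, the middle tier would fall through to a runtime-managed staging area, as the list suggests.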
> This means that the CPU should not be able to read or write buffer memory while the GPU is using the buffer.

This discards one of the most important mapping scenarios: persistent mapping, where the user repeatedly changes parts of the mapped region and communicates to the driver which parts need to be invalidated on the GPU side. The user can then tell WebGPU when a range of the mapped region has changed. This can propagate directly through the code that copies from the shared memory into runtime-managed space or GPU memory.

> We don’t see a compelling use for MapRead | MapWrite buffers

Yes, agreed.

Thanks,
Dzmitry

On Tue, Nov 14, 2017 at 11:51 PM, Corentin Wallez <cwallez@google.com> wrote:
> Hey all,
>
> We wrote some documents to help everyone reason about NXT's proposals for
> memory barriers and resource upload/download. Unfortunately we still don't
> have a fleshed-out proposal that minimizes the number of copies on UMA.
> Instead the docs focus on explaining our current design for resource
> upload/download and for memory barriers, since they are very tied.
> Eventually we'll have these docs in Markdown in some repo, either WebGPU's
> or NXT's.
>
> - NXT "memory barriers"
> <https://docs.google.com/document/d/1k7lPmxP7M7MMQR4g210lNC5TPwmXCMLgKOQWNiuJxzA>
> <- Please read this first as buffer mapping depends on it.
> - NXT buffer mapping
> <https://docs.google.com/document/d/1HFzMMvDGHFtTgjNT0j-0SQ1fNU9R7woZ4JuNJdAXBjg>
>
> Cheers,
>
> Corentin
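The "tell the runtime which ranges changed" idea for persistent mapping, discussed above, can be sketched as a small dirty-range tracker. This is a hypothetical illustration with invented names, not any proposed WebGPU API: the user marks subranges of the mapped region as changed, and at submit time the runtime flushes the coalesced span (by copying it from shared memory to GPU-visible memory, or by invalidating caches on the GPU side).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical dirty-range tracking for a persistently mapped region.
 * For simplicity this coalesces all marks into one span; a real runtime
 * might keep a list of disjoint ranges instead. */
typedef struct {
    size_t begin; /* first dirty byte */
    size_t end;   /* one past the last dirty byte; begin == end means clean */
} DirtyRange;

static void dirty_init(DirtyRange *r) { r->begin = 0; r->end = 0; }

/* User-facing: mark [offset, offset + size) of the mapped region changed. */
static void dirty_mark(DirtyRange *r, size_t offset, size_t size) {
    if (r->begin == r->end) {   /* first write since the last flush */
        r->begin = offset;
        r->end = offset + size;
        return;
    }
    if (offset < r->begin) r->begin = offset;
    if (offset + size > r->end) r->end = offset + size;
}

/* Runtime-facing: return the span to copy/invalidate and reset to clean. */
static DirtyRange dirty_flush(DirtyRange *r) {
    DirtyRange out = *r;
    dirty_init(r);
    return out;
}
```

The same shape appears in native APIs, e.g. Vulkan's `vkFlushMappedMemoryRanges` for non-coherent mapped memory, where the application likewise names the byte ranges it touched.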
Received on Friday, 1 December 2017 21:01:14 UTC