- From: Corentin Wallez <cwallez@google.com>
- Date: Tue, 5 Dec 2017 18:47:03 -0500
- To: Dzmitry Malyshau <dmalyshau@mozilla.com>
- Cc: public-gpu <public-gpu@w3.org>
- Message-ID: <CAGdfWNPhZbah=Hfc4mPd-NB2tM4GiLiR3q1DhsowrfhXQOHvhg@mail.gmail.com>
On Fri, Dec 1, 2017 at 4:00 PM, Dzmitry Malyshau <dmalyshau@mozilla.com> wrote:

> Hi Corentin,
>
> (branching from the root for the comments about the buffer mapping
> document, as opposed to the memory barriers)
>
> > The number of times data is copied before it reaches its final
> > destination is the biggest factor in upload performance.
>
> I see this assumption being rooted deeply in the (amazing) WebGL work the
> group was doing on the subject. It's based on the fact that we don't
> control the GL driver's side, which does more copying, renaming, and
> essentially transferring to the GPU. If we look at the problem from the
> next-gen native APIs' point of view, I'd say that there is a more
> important factor than the number of copies: avoiding CPU/GPU
> synchronization stalls. All in all, I'd see the following issues here:
>
> - stalls
> - copies
> - latency

Agreed. We should have called stalls out more explicitly, but our design already ensures there are no stalls. The next concern is copies (and they're especially costly on mobile).

> > If WebGPU exposes a buffer mapping primitive, the application will be
> > able to decompress directly into either shared memory (for Chrome) or
> > directly into GPU-visible memory, avoiding one copy.
>
> Mapping a shared memory object sounds like a great idea. I believe this
> should be the only way to transfer large amounts of data from CPU to GPU.
> A list of all possible methods to do so, ordered from bigger to smaller
> data sizes, could be:
>
> 1. Mapping a shared memory object. Possibly, persistently.
> 2. Updating buffer contents via command buffers, e.g. `vkCmdUpdateBuffer`,
>    which supports up to 64k. The graphics backend can then manage the
>    staging area and schedule uploads internally, if it's not natively
>    supported.
> 3. Push constants.

+1. `vkCmdUpdateBuffer` doesn't have an equivalent in the other APIs (nothing in D3D12; `setBytes` in Metal, but it replaces the whole buffer).
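As an aside, the three upload methods in that list amount to a dispatch on data size. A minimal sketch of such a selection, assuming common native limits (Vulkan's guaranteed minimum of 128 bytes for push constants and the documented 65536-byte cap on `vkCmdUpdateBuffer`); the function name and enum are hypothetical, not anything WebGPU specifies:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical classification of the three upload paths, ordered by size.
enum class UploadPath { PushConstants, InlineCommandBufferUpdate, MappedSharedMemory };

UploadPath choose_upload_path(std::size_t bytes) {
    constexpr std::size_t kPushConstantLimit = 128;   // Vulkan guaranteed minimum (maxPushConstantsSize)
    constexpr std::size_t kInlineUpdateLimit = 65536; // vkCmdUpdateBuffer dataSize cap
    if (bytes <= kPushConstantLimit)
        return UploadPath::PushConstants;             // tiny, per-draw constants
    if (bytes <= kInlineUpdateLimit)
        return UploadPath::InlineCommandBufferUpdate; // pipelined copy recorded in the command buffer
    return UploadPath::MappedSharedMemory;            // bulk data: map shared/GPU-visible memory
}
```

A backend without a native pipelined-copy primitive could still accept the middle path and implement it with an internally managed staging area, as Dzmitry suggests.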
So if WebGPU is to have a pipelined copy like `vkCmdUpdateBuffer`, I suggest we do it post-MVP.

> > Since we assume the API tries to prevent data races on the GPU, it makes
> > sense to also prevent data races between the CPU and the GPU for the
> > exact same reasons.
>
> As discussed previously, we can't (or rather don't want to) avoid all data
> races on the GPU (example: UAV resource access). I'd argue that the
> browser runtime doesn't necessarily need to enforce the rule here as much
> as it needs to provide means/APIs for the user to implement it in a way
> that avoids data races, with some ability to validate the access.
>
> > This means that the CPU should not be able to read or write buffer
> > memory while the GPU is using the buffer.
>
> This discards one of the most important mapping scenarios: persistent
> mapping, where the user would repeatedly change parts of the mapped region
> and communicate to the driver which ranges need to be invalidated on the
> GPU side. The user can then tell WebGPU when a range of the mapped region
> has changed. This can propagate directly through the code that copies from
> the shared memory into runtime-managed space or GPU memory.

Racy UAV accesses are impossible to validate, and most uses of the API won't have them. GPU-CPU races, however, look to be much more common, which is why we're thinking we should prevent them if we can do so at a tiny performance and/or convenience cost. We think this is doable with what we presented: instead of a single ringbuffer with synchronization done manually by the application, there is a ringbuffer of buffers with safe-guards provided by the API. The range discard you're mentioning can be done with "MapWriteSync".

> > We don't see a compelling use for MapRead | MapWrite buffers
>
> Yes, agree.
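For the record, the "ringbuffer of buffers with safe-guards" idea can be sketched without any GPU API at all. This is a hypothetical illustration, not NXT's actual design: fences are modeled as a monotonically increasing completed-work serial, and the safe-guard is simply that a buffer cannot be re-acquired for CPU writes while the GPU may still be reading it.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for a mapped, GPU-visible staging buffer.
struct StagingBuffer {
    std::vector<std::uint8_t> data;      // pretend this is mapped memory
    std::uint64_t last_use_serial = 0;   // serial of the last GPU submission using it
};

class StagingRing {
public:
    explicit StagingRing(std::size_t buffer_size) {
        for (auto& b : buffers_) b.data.resize(buffer_size);
    }

    // Safe-guard: hand out a buffer for CPU writes only if the GPU work that
    // last used it has completed; otherwise refuse instead of racing.
    StagingBuffer* acquire(std::uint64_t completed_serial) {
        StagingBuffer& b = buffers_[next_];
        if (b.last_use_serial > completed_serial) return nullptr; // still in flight
        next_ = (next_ + 1) % buffers_.size();
        return &b;
    }

    // Record that the buffer was consumed by GPU work submitted under `serial`.
    void submit(StagingBuffer* b, std::uint64_t serial) { b->last_use_serial = serial; }

private:
    std::array<StagingBuffer, 3> buffers_{}; // triple-buffered ring
    std::size_t next_ = 0;
};
```

With a single application-managed ring, nothing stops the CPU from scribbling over a region the GPU is still reading; here the refusal in `acquire` is the API-provided guard, and a real implementation would block or return an error rather than a null pointer.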
> Thanks,
> Dzmitry
>
> On Tue, Nov 14, 2017 at 11:51 PM, Corentin Wallez <cwallez@google.com>
> wrote:
>
>> Hey all,
>>
>> We wrote some documents to help everyone reason about NXT's proposals for
>> memory barriers and resource upload/download. Unfortunately we still
>> don't have a fleshed-out proposal that minimizes the number of copies on
>> UMA. Instead the docs focus on explaining our current design for resource
>> upload/download and for memory barriers, since they are closely tied.
>> Eventually we'll have these docs in Markdown in some repo, either
>> WebGPU's or NXT's.
>>
>> - NXT "memory barriers"
>>   <https://docs.google.com/document/d/1k7lPmxP7M7MMQR4g210lNC5TPwmXCMLgKOQWNiuJxzA>
>>   <- Please read this first as buffer mapping depends on it.
>> - NXT buffer mapping
>>   <https://docs.google.com/document/d/1HFzMMvDGHFtTgjNT0j-0SQ1fNU9R7woZ4JuNJdAXBjg>
>>
>> Cheers,
>>
>> Corentin
Received on Tuesday, 5 December 2017 23:47:48 UTC