Re: Use cases for synchronization

Hey all,

Another example I'd like to add to the list is something that would happen
when rendering shadow maps:

   - "Blit" pass writing to buffers B1, B2, ...
   - Render pass 1 reading from B1 as a uniform buffer
   - Render pass 2 reading from B2 as a uniform buffer
   - ...

This pattern causes problems on the D3D12 and Vulkan backends if we choose
an "implicit" model: the calls made on the backing API would become the
following (a rough Vulkan sketch of this sequence follows the list):

   - "Blit" pass writing to buffers B1 B2, ...
   - Transition of B1 from copy-write to UBO (for D3D12, and the equivalent
   buffer barrier in Vulkan)
   - Render pass 1 reading from B1 as a uniform buffer
   - Transition of B2 from copy-write to UBO
   - Render pass 2 reading from B2 as a uniform buffer
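
To make the cost concrete, here is a rough sketch of what that implicit
sequence could look like when recorded on the Vulkan backend. The command
buffer, the buffer handles and the recordBlitPass / recordRenderPass*
helpers are hypothetical placeholders, not real WebGPU backend code:

    #include <vulkan/vulkan.h>

    // Hypothetical pass-recording helpers, used only for illustration.
    void recordBlitPass(VkCommandBuffer cmd, VkBuffer b1, VkBuffer b2);
    void recordRenderPass1(VkCommandBuffer cmd, VkBuffer b1);
    void recordRenderPass2(VkCommandBuffer cmd, VkBuffer b2);

    void recordImplicit(VkCommandBuffer cmd, VkBuffer b1, VkBuffer b2) {
        recordBlitPass(cmd, b1, b2);       // copy-writes into B1, B2, ...

        // Barrier #1: transition B1 from copy-write to UBO read, emitted
        // right before the first render pass that reads it.
        VkBufferMemoryBarrier barrier1 = {};
        barrier1.sType               = VK_STRUCTURE_TYPE_BUFFER_MEMORY_BARRIER;
        barrier1.srcAccessMask       = VK_ACCESS_TRANSFER_WRITE_BIT;
        barrier1.dstAccessMask       = VK_ACCESS_UNIFORM_READ_BIT;
        barrier1.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
        barrier1.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
        barrier1.buffer              = b1;
        barrier1.offset              = 0;
        barrier1.size                = VK_WHOLE_SIZE;
        vkCmdPipelineBarrier(cmd,
                             VK_PIPELINE_STAGE_TRANSFER_BIT,
                             VK_PIPELINE_STAGE_VERTEX_SHADER_BIT |
                                 VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,
                             0, 0, nullptr, 1, &barrier1, 0, nullptr);
        recordRenderPass1(cmd, b1);        // reads B1 as a uniform buffer

        // Barrier #2: the same transition for B2, emitted separately.
        VkBufferMemoryBarrier barrier2 = barrier1;
        barrier2.buffer = b2;
        vkCmdPipelineBarrier(cmd,
                             VK_PIPELINE_STAGE_TRANSFER_BIT,
                             VK_PIPELINE_STAGE_VERTEX_SHADER_BIT |
                                 VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,
                             0, 0, nullptr, 1, &barrier2, 0, nullptr);
        recordRenderPass2(cmd, b2);        // reads B2 as a uniform buffer

        // ...and so on: N buffers lead to N separate pipeline barriers.
    }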

The problem is that on some hardware (at least Intel and AMD), the D3D12 or
Vulkan driver translates each of these buffer transitions into a global
memory barrier. A single global memory barrier would be enough, but the
sequence above produces N of them. A Metal driver receiving the same
commands, on the other hand, knows that its barriers are global and incurs
that cost only once.

Now if the application gives WebGPU the following:

   - "Blit" pass writing to buffers B1, B2, ...
   - Transition B1, B2, ... to UBO
   - Render pass 1 reading from B1 as a uniform buffer
   - Render pass 2 reading from B2 as a uniform buffer
   - ...

Then all backends are able to generate only one global memory barrier.
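
For comparison, here is a similarly hypothetical sketch of what the
explicit sequence could map to on the Vulkan backend, reusing the same
placeholder helpers as above. The N per-buffer transitions collapse into a
single vkCmdPipelineBarrier call:

    #include <vulkan/vulkan.h>

    // Same hypothetical helpers as in the previous sketch.
    void recordBlitPass(VkCommandBuffer cmd, VkBuffer b1, VkBuffer b2);
    void recordRenderPass1(VkCommandBuffer cmd, VkBuffer b1);
    void recordRenderPass2(VkCommandBuffer cmd, VkBuffer b2);

    void recordExplicit(VkCommandBuffer cmd, VkBuffer b1, VkBuffer b2) {
        recordBlitPass(cmd, b1, b2);       // copy-writes into B1, B2, ...

        // One batched transition covering every buffer, recorded before any
        // render pass runs. A driver that implements buffer barriers as
        // global memory barriers only pays that cost once.
        VkBuffer buffers[2] = {b1, b2};
        VkBufferMemoryBarrier barriers[2] = {};
        for (int i = 0; i < 2; ++i) {
            barriers[i].sType               = VK_STRUCTURE_TYPE_BUFFER_MEMORY_BARRIER;
            barriers[i].srcAccessMask       = VK_ACCESS_TRANSFER_WRITE_BIT;
            barriers[i].dstAccessMask       = VK_ACCESS_UNIFORM_READ_BIT;
            barriers[i].srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
            barriers[i].dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
            barriers[i].buffer              = buffers[i];
            barriers[i].offset              = 0;
            barriers[i].size                = VK_WHOLE_SIZE;
        }
        vkCmdPipelineBarrier(cmd,
                             VK_PIPELINE_STAGE_TRANSFER_BIT,
                             VK_PIPELINE_STAGE_VERTEX_SHADER_BIT |
                                 VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,
                             0, 0, nullptr, 2, barriers, 0, nullptr);

        recordRenderPass1(cmd, b1);        // reads B1 as a uniform buffer
        recordRenderPass2(cmd, b2);        // reads B2 as a uniform buffer
    }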

Hopefully this convinces you that choosing an implicit model would leave
GPU performance on the table for the D3D12 and Vulkan backends.

Corentin

On Mon, Oct 2, 2017 at 6:02 PM, Corentin Wallez <cwallez@google.com> wrote:

> Hey all,
>
> During the memory barriers discussion we said it would be nice to look at
> specific examples, and how they map to different APIs. The Vulkan WG has
> made a small list of examples
> <https://github.com/KhronosGroup/Vulkan-Docs/wiki/Synchronization-Examples>
> to help developers do synchronization when porting to Vulkan. They show
> real-world things developers want to do, and how they map to Vulkan on a
> single queue:
>
>    - Compute-to-Compute
>       - *First dispatch writes to buffer, second reads from it*
>       - First dispatch reads from a buffer, second writes to it
>       - Same for images
>       - First and second dispatches write to non-overlapping regions of
>       the same buffer and third dispatch reads from both regions.
>       - First and second dispatches write to two buffers, third dispatch
>       reads from both buffers.
>    - Compute-to-Graphics
>       - *Dispatch writes to buffer, draw reads from it as index buffer.*
>       - Dispatch writes to buffer, draw reads from it as index buffer and
>       other dispatch reads as uniform buffer.
>       - Dispatch writes to buffer, draw reads from it as indirect buffer.
>       - *Dispatch writes to image, draw samples image.*
>       - Dispatch writes to texel buffer, draw reads from it as indirect
>       and uniform buffer
>    - Graphics-to-Compute
>       - *Draw writes to color / depth attachment, dispatch reads from
>       image.*
>    - Graphics-to-Graphics
>       - *First draw writes depth / color attachment, second draw samples
>       as input attachment (i.e. from tile memory)*
>       - First draw writes depth / color attachment, second samples in
>       fragment shader.
>       - First draw samples texture, second draw uses texture as color
>       attachment.
>
> They don't go into as much detail for multi-queue stuff, but I believe a lot
> of the examples with both graphics and compute would be interesting to map
> to multiple queues too.
>
> In bold are the use cases that I believe are important / a good sample and
> that we could focus on. The linked page already has the examples for Vulkan.
>
> Cheers,
>
> Corentin
>

Received on Wednesday, 18 October 2017 19:37:38 UTC