Re: Feasibility of low-level GPU access on the Web

For context, this is a reply to Dzmitry's Feasibility of low-level GPU
access on the Web
<http://kvark.github.io/web/gpu/2018/02/10/low-level-gpu-web.html> in
a slightly less public forum.

Dzmitry,

First of all, congrats on reaching #1 on Hacker News with this post :) It
is an interesting view into your thoughts on the subject. Below are a
bunch of comments.

*APIs as local maxima of a design space*

Vulkan isn't exactly a local maximum: see the various KHR_maintenance
extensions or, for example, loadOp/storeOp being in
VkRenderPassCreateInfo instead of VkRenderPassBeginInfo. That said, it is
pretty close to a local maximum for what it tries to achieve: providing
developers with console-like access to the hardware over a broad range of
GPUs.
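
To make that concrete, here is a minimal Vulkan sketch (subpass setup
and error handling omitted) showing that an attachment's load/store
behavior is baked in at render pass creation time rather than chosen
when the pass begins:

    #include <vulkan/vulkan.h>

    // loadOp/storeOp live in the render pass *creation* info.
    VkAttachmentDescription color = {};
    color.format = VK_FORMAT_B8G8R8A8_UNORM;
    color.samples = VK_SAMPLE_COUNT_1_BIT;
    color.loadOp = VK_ATTACHMENT_LOAD_OP_CLEAR;   // fixed at creation,
    color.storeOp = VK_ATTACHMENT_STORE_OP_STORE; // not at begin time
    color.initialLayout = VK_IMAGE_LAYOUT_UNDEFINED;
    color.finalLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR;

    VkRenderPassCreateInfo createInfo = {};
    createInfo.sType = VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO;
    createInfo.attachmentCount = 1;
    createInfo.pAttachments = &color;
    // ... subpass description omitted ...

    // At begin time only the framebuffer, render area and clear values
    // can vary; different loadOp/storeOp means a new VkRenderPass.
    VkRenderPassBeginInfo beginInfo = {};
    beginInfo.sType = VK_STRUCTURE_TYPE_RENDER_PASS_BEGIN_INFO;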

Metal itself is a local maximum (or close to it) for an easy-to-use API
that's extremely efficient on Apple GPUs and runs well on AMD/Intel/Nvidia
GPUs.

NXT's goals are different: it tries to be portable, efficient, and
unbiased over D3D12, Metal and Vulkan. In addition it tries to be
efficient to validate by design and, where possible, not to expose too
many flags to application developers. That's a lot of constraints, and
NXT is the first step of the gradient descent that would lead to a local
maximum for these goals. It has had maybe less than one engineer-year of
work overall, so of course it isn't as refined as the native APIs.

To continue the mathematical metaphor, the cost function used to evaluate
the APIs depends on what your goals are. D3D12, Metal and Vulkan are
local maxima for their respective cost functions, but none of them scores
particularly well under the WebGPU cost function, for example because
WebGPU needs to be unbiased towards the native APIs.

*Being unbiased is a key property of WebGPU*

Being unbiased is key here. The reason we are at the W3C is so that we
can have all browser vendors at the table to design WebGPU. It turns out
that this brings the owners, or major stakeholders, of all the native
APIs into the discussion of what WebGPU should look like. Obviously, if
someone suggests something too biased towards one native API, it will get
pushback from the vendors of the other native APIs. We'll all have to do
an exercise in compromising if we want to ship a WebGPU v1.

Being unbiased isn't only important for political reasons. We want the
translation from WebGPU to the native APIs to be as thin as possible to
avoid expensive emulation. Each native API has features that would be
expensive to emulate on others. Examples are D3D12's descriptor heaps,
Metal's implicit barriers, and Vulkan's memory aliasing (especially for
host-visible resources).
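
As an illustration of the second example, here is a hedged Vulkan sketch
of the explicit barrier a WebGPU-on-Vulkan translation would have to
record when a texture goes from render target to sampled, a hazard that
a Metal driver tracks implicitly (cmd and image are assumed to be a
valid command buffer being recorded and the image in question):

    #include <vulkan/vulkan.h>

    // Metal's driver inserts this hazard tracking for you; on Vulkan
    // the translation layer must record it explicitly.
    VkImageMemoryBarrier barrier = {};
    barrier.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
    barrier.srcAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
    barrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT;
    barrier.oldLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
    barrier.newLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;
    barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    barrier.image = image; // rendered to above, sampled below
    barrier.subresourceRange = {VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1};

    vkCmdPipelineBarrier(cmd,
        VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
        VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,
        0, 0, nullptr, 0, nullptr, 1, &barrier);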

The set of unbiased designs is far from the existing native APIs and, to
pile on the mathematical metaphors, it would probably be somewhere around
their barycenter. A WebVulkan would be far from the barycenter, and
that's why we don't believe it can happen.

Google recognized this "unbiased" constraint early on and presented NXT
as a first approximation of where the barycenter is. This was in January
2017, even before the W3C group was started. We now feel this design was
a step in the right direction because we have backends for all the native
APIs (the Vulkan one is being completed now) that are all surprisingly
thin, each being 2-4 kloc.

That said, we still think of NXT as just a prototype of what a WebGPU
design could be and do not want it to become WebGPU as-is. It's just an
API that's efficient-ish, usable-ish and close-ish to the barycenter of
the native APIs. So a good first step of gradient descent, but that's it.
We'll upload the IDL for our "sketch API" shortly.

*Performance measurement of memory barriers*

One of the concerns we have with implicit memory barriers is that a
WebGPU library on top of D3D12 / Vulkan doesn't have the same knowledge
of the hardware that a Metal driver does. The exact layout of hardware
caches, required metadata and other things are hidden behind the
D3D12/Vulkan interface. This is an issue if, for example, the UBO and
texture caches are the same: Metal would know that after discarding one
it doesn't need to discard the other, but WebGPU on D3D12/Vulkan wouldn't
and would discard twice. Hardware layout differences like this can even
exist between chips from the same vendor: Nvidia Kepler has split caches
as in the example, but Maxwell unified them.
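
To make the cache example concrete, here is a sketch (again assuming a
valid command buffer cmd) of the conservative Vulkan barrier such a
layer would record after a transfer write to data later read both as a
UBO and as sampled data; whether those two access bits cost one cache
invalidation or two is known only below the API:

    #include <vulkan/vulkan.h>

    // The layer must conservatively ask for both invalidations because
    // it cannot know whether the UBO cache and the texture cache are
    // the same piece of hardware (split on a Kepler-like design,
    // unified on a Maxwell-like one). The driver knows; the layer
    // doesn't.
    VkMemoryBarrier barrier = {};
    barrier.sType = VK_STRUCTURE_TYPE_MEMORY_BARRIER;
    barrier.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;
    barrier.dstAccessMask = VK_ACCESS_UNIFORM_READ_BIT |
                            VK_ACCESS_SHADER_READ_BIT;

    vkCmdPipelineBarrier(cmd,
        VK_PIPELINE_STAGE_TRANSFER_BIT,
        VK_PIPELINE_STAGE_VERTEX_SHADER_BIT |
            VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,
        0, 1, &barrier, 0, nullptr, 0, nullptr);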

So I think the tests should be done as you described, except that on
Windows the application should be written against a Metal-like API
translated to D3D12/Vulkan, and on OSX it should be an application using
a Vulkan-like interface that gets translated to Metal.

> *G*: let’s only share resources with immutable access flags then

I'm not sure what that means, but you can "transition the usage" of a
resource after moving it to a queue.
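
As a purely hypothetical NXT-style sketch (the type and method names
here are illustrative, not necessarily the real NXT API), this is
roughly what transitioning a usage after a queue transfer could look
like, with usage being a dynamic property of the resource rather than
an immutable flag fixed at sharing time:

    // Hypothetical pseudocode: the receiving queue transitions the
    // texture from its previous usage to the one it needs.
    nxt::Texture texture; // last used as a render attachment elsewhere

    nxt::CommandBufferBuilder builder =
        device.CreateCommandBufferBuilder();
    builder.TransitionTextureUsage(texture,
                                   nxt::TextureUsageBit::Sampled);
    // ... commands sampling the texture on the receiving queue ...
    nxt::CommandBuffer commands = builder.GetResult();
    queue.Submit(1, &commands);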

*Conclusion*

Yes, we are suggesting the group make something different from the
existing APIs, which would make it a 4th API. Hopefully you now agree
with our reasoning as described above, or at least understand it. We are
not discarding existing research and designs, but using them to make
something that fits the needs of the Web.
