Re: Feasibility of low-level GPU access on the Web

Hi Corentin,

Thanks for the extensive answer!

To clarify, I'm surprised at how popular the post turned out to be on HN,
and it wasn't meant to side-step any discussions within the group.
It's just a reflection on how bad I think our (the WG's) situation is,
sharing with a wider community the constraints and problems we are trying
to solve, including a bit of analysis of potential solutions.
Fortunately, no flame wars were started, and no fingers were pointed. The
HN audience expressed a wide range of opinions, without converging on any
particular one. I found it interesting how many people were saying (or
implying) that WebGPU is not needed at all, or that they only need compute
support (which is a rather weak excuse for a whole new API).

> Vulkan isn't exactly a local maximum

There is a ton to improve about Vulkan for sure. We are analyzing the list
of typical complaints at this very moment :) A lot of people are looking
into it, and it's getting better.

> This is an issue if for example the UBO and texture caches are the same:
> Metal would know that after discarding one it doesn't need to discard the
> other but WebGPU on D3D12/Vulkan wouldn't and would discard twice.

To play devil's advocate, discarding caches twice doesn't sound like the
end of the world to me :)
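To illustrate what I understand the redundancy to be, here is a minimal
sketch in plain Vulkan C (not actual WebGPU/NXT code; the helper name and
the copy-then-sample scenario are made up for illustration). A portability
layer can't know whether uniform-buffer and sampled-texture reads go
through the same cache, so it conservatively requests visibility for both
access types:

    #include <stddef.h>
    #include <vulkan/vulkan.h>

    /* Hypothetical helper: after a transfer, make the written data visible
     * to both uniform-buffer reads and sampled-image reads. A Metal driver
     * that knows both paths share one cache could invalidate it once; a
     * layer sitting on top of Vulkan has to ask for both access types. */
    static void make_copy_visible_to_shaders(VkCommandBuffer cmd)
    {
        VkMemoryBarrier barrier = {
            .sType         = VK_STRUCTURE_TYPE_MEMORY_BARRIER,
            .srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT,
            .dstAccessMask = VK_ACCESS_UNIFORM_READ_BIT |  /* "UBO cache" */
                             VK_ACCESS_SHADER_READ_BIT,    /* "texture cache" */
        };
        vkCmdPipelineBarrier(cmd,
                             VK_PIPELINE_STAGE_TRANSFER_BIT,
                             VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,
                             0,            /* dependencyFlags */
                             1, &barrier,  /* one global memory barrier */
                             0, NULL,      /* no buffer barriers */
                             0, NULL);     /* no image barriers */
    }

Whether that turns into one cache invalidation or two is up to the driver
and the hardware, which is exactly why the overhead seems tolerable to me.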

I recall that Myles was working on a Metal -> Vulkan translation layer a
while ago, with one of the goals being to see how difficult it would be.
Has it reached a form where we can run and benchmark anything yet?

>> *G*: let’s only share resources with immutable access flags then
> I'm not sure what that means but you can "transition the usage" of a
> resource after moving it to a queue.

If I recall the NXT design correctly, a resource's state follows the D3D12
philosophy of being in either a single mutable state or a combination of
read-only states, and a similar rule applies to queue ownership: either a
single queue owns and mutates the resource, or multiple queues can read
from it. Is this not the case? That's what I meant, at least.
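To make the state model I have in mind concrete, here's a tiny sketch
(hypothetical flag names, not the actual NXT API). A usage combination is
valid if it is either a single writable usage or any mix of read-only
usages, and the same predicate would govern queue sharing: one writing
owner, or many readers.

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical usage flags, in the spirit of D3D12 resource states. */
    enum {
        USAGE_TRANSFER_SRC  = 1 << 0,  /* read-only */
        USAGE_UNIFORM       = 1 << 1,  /* read-only */
        USAGE_SAMPLED       = 1 << 2,  /* read-only */
        USAGE_TRANSFER_DST  = 1 << 3,  /* writable  */
        USAGE_STORAGE       = 1 << 4,  /* writable  */
        USAGE_RENDER_TARGET = 1 << 5,  /* writable  */
    };

    #define READ_ONLY_USAGES \
        (USAGE_TRANSFER_SRC | USAGE_UNIFORM | USAGE_SAMPLED)

    /* Valid: exactly one bit set (a single mutable state), or only
     * read-only bits set (any combination of read-only states). */
    static bool is_valid_usage(uint32_t usage)
    {
        bool single_bit = usage != 0 && (usage & (usage - 1u)) == 0;
        bool read_only  = usage != 0 && (usage & ~(uint32_t)READ_ONLY_USAGES) == 0;
        return single_bit || read_only;
    }

If that matches the NXT rules, then the "immutable access flags" wording in
my post is just the read-only half of this predicate applied across queues.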

> Yes, we are suggesting the group makes something different from existing
> APIs, which would make it a 4th API. Hopefully you now agree with our
> reasoning as described above, or at least understand it. We are not
> discarding existing research and designs, but using them to make something
> fitting the needs of the Web.

I do understand your position, and in fact I don't believe your
clarifications contradict the post contents. Minus the vague "discarding
existing research" aspect, of course :)

Thank you,
Dzmitry

On Mon, Feb 12, 2018 at 2:41 PM, Corentin Wallez <cwallez@google.com> wrote:

> For context, this is a reply to Dzmitry's Feasibility of low-level GPU
> access on the Web
> <http://kvark.github.io/web/gpu/2018/02/10/low-level-gpu-web.html> in a
> slightly less public forum.
>
> Dzmitry,
>
> First of all, congrats on reaching #1 on Hacker News with this post :) It
> is an interesting view into your thoughts on the subject. Below are a
> bunch of comments.
>
> *APIs as local maxima of a design space*
>
> Vulkan isn't exactly a local maximum, see the various KHR_maintenance
> extensions and, for example, loadOp/storeOp being in the
> VkRenderPassCreateInfo instead of the VkRenderPassBeginInfo. That said, it
> is pretty close to a local maximum for what it tries to achieve: providing
> developers with console-like access to the hardware over a broad range of
> GPUs.
>
> Metal itself is a local maximum (or close to it) for an easy-to-use API
> that's extremely efficient on Apple GPUs and runs well on AMD/Intel/Nvidia
> GPUs.
>
> NXT's goals are different, as it tries to be portable, efficient, and
> unbiased over D3D12, Metal and Vulkan. In addition, it tries to be
> efficient to validate by design and, where possible, not expose too many
> flags to application developers. That's a lot of constraints, and NXT is
> the first step of the gradient descent that would lead to a local maximum
> for these goals. It has had maybe less than 1 engineer-year of work
> overall, so of course it isn't as refined as the native APIs.
>
> To keep with the mathematical metaphor, the cost function used to
> evaluate the APIs depends on what your goals are. D3D12, Metal and Vulkan
> are local maxima for their respective cost functions, but none of them are
> particularly good when evaluated with the WebGPU cost function, for
> example because WebGPU needs to be unbiased towards native APIs.
>
> *Being unbiased is a key property of WebGPU*
>
> Being unbiased is key here. The reason we are at the W3C is so that we
> can have all browser vendors at the table to design WebGPU. It turns out
> that this makes the owners, or major stakeholders, of all native APIs
> discuss what WebGPU should look like. Obviously, if someone suggests
> something too biased towards one native API, it will get pushback from
> the vendors of the other native APIs. We'll all have to do an exercise in
> compromising if we want to ship a WebGPU v1.
>
> Being unbiased isn't only important for political reasons. We want the
> translation from WebGPU to the native APIs to be as thin as possible to
> avoid expensive emulation. Each native API has features that would be
> expensive to emulate on others. Examples are D3D12's descriptor heaps,
> Metal's implicit barriers and Vulkan's memory aliasing (especially for
> host-visible resources).
>
> The set of unbiased designs is far from the existing native APIs and, to
> keep with the mathematical metaphor, it would probably be somewhere around
> their barycenter. A WebVulkan would be far from the barycenter, and that's
> why we don't believe it can happen.
>
> Google recognized this "unbiased" constraint early on and presented NXT
> as a first approximation of where the barycenter is. This was in January
> 2017, even before the W3C group was started. We now feel this design was a
> step in the right direction because we have backends for all native APIs
> (Vulkan's is being completed now) that are all surprisingly thin, each
> being 2-4 kloc.
>
> That said, we still think of NXT as just a prototype of what a WebGPU
> design could be, and do not want it to become WebGPU as is. It's just an
> API that is efficient-ish, usable-ish and close-ish to the barycenter of
> the native APIs. So it's a good first step of gradient descent, but that's
> it. We'll upload the IDL for our "sketch API" shortly.
>
> *Performance measurement of memory barriers*
>
> One of the concerns we have with implicit memory barriers is that a WebGPU
> library on top of D3D12 / Vulkan doesn't have the same knowledge of the
> hardware that a Metal driver does. The exact layout of hardware caches,
> required metadata and other things are hidden behind the D3D12/Vulkan
> interface. This is an issue if for example the UBO and texture caches are
> the same: Metal would know that after discarding one it doesn't need to
> discard the other but WebGPU on D3D12/Vulkan wouldn't and would discard
> twice. Hardware layout differences like this can even happen on hardware by
> the same vendor: Nvidia Kepler has split caches like in the example, but
> Maxwell unified them.
>
> So I think the tests should be done as you said, but on Windows the
> application should be written with a Metal-like API translated to
> D3D12/Vulkan, and on OSX it should be an application using a Vulkan-like
> interface that gets translated to Metal.
>
> > *G*: let’s only share resources with immutable access flags then
> I'm not sure what that means but you can "transition the usage" of a
> resource after moving it to a queue.
>
> *Conclusion*
>
> Yes, we are suggesting the group makes something different from existing
> APIs, which would make it a 4th API. Hopefully you now agree with our
> reasoning as described above, or at least understand it. We are not
> discarding existing research and designs, but using them to make something
> fitting the needs of the Web.
>

Received on Tuesday, 13 February 2018 18:48:10 UTC