Minutes for the 2017-07-12 meeting from Corentin Wallez on 2017-07-13 (public-gpu@w3.org from July 2017)

From: Corentin Wallez <cwallez@google.com>
Date: Thu, 13 Jul 2017 14:03:59 -0400
To: public-gpu@w3.org
Message-ID: <CAGdfWNP3dVMZLyNexEsr=Bun1nmhJkw5X7BNdv4zPjLBTCt13A@mail.gmail.com>

GPU Web 2017-07-12

Chair: Corentin and Dean

Scribe: Dean (with some help)

Location: Google Hangout

Minutes from last meeting
<https://docs.google.com/document/d/1iqrWz9-Oo7mCfZCamzDhZE27p6wFTeLI29WirDLcEHs/edit#heading=h.hp3f2zbslxr9>

Tentative agenda

Administrative stuff (if any)

Individual design and prototype status

Ongoing investigations

Queues

Things we didn’t get to yet

Pipeline state

https://github.com/jdashg/vulkan-portability/blob/master/pipeline-state.md

Render passes

Agenda for next meeting

Attendance

Chris Marrin (Apple)

Dean Jackson (Apple)

Jason Aftosmis (Apple)

Julien Chaintron (Apple)

Adrian Lindberg (Apple)

Myles C. Maxfield (Apple)

Warren Moore (Apple)

Austin Eng (Google)

Corentin Wallez (Google)

Kai Ninomiya (Google)

Ken Russell (Google)

Daniel Johnston (Intel)

Ben Constable (Microsoft)

Chas Boyd (Microsoft)

Rafael Cintron (Microsoft)

Dzmitry Malyshau (Mozilla)

Jeff Gilbert (Mozilla)

Kirill Dmitrenko (Yandex)

Doug Twilleager (ZSpace)

Elviss Strazdiņš

Joshua Groves

Tyler Larson

Administrative items

We’ll do our F2F on Friday the 22nd of September, in Chicago at the
Google offices.

Corentin will ask on the mailing list for a list of attendees.

DJ: Checking with the W3C again, about TPAC F2F

DJ: Talked with the WASM chair, who is keen on having a chat; it could be
just a few of us going.

Individual design and prototype status

Google: more backend work, e.g. render targets. Most things are working.
We ran into an alignment issue with D3D12 buffer-to-texture copies. We’ll
record this issue and talk about it when we get to copies.

JG: I looked at Pipeline states, in particular comparing D3D and Vulkan.

DM: Good progress on the graphics abstraction layer, with Vulkan up to
par with D3D and Metal. We have a textured quad rendering on screen. Ran
into alignment constraints. Have reached out
<https://gitlab.khronos.org/vulkan/vulkan/issues/920> to the Khronos Group
about specifying the alignment constraint for texture-to-buffer copies,
which would allow our implementation to say it doesn’t support this feature.

BC: So Vulkan has a value for the optimal alignment?

DM: Yes, it is an “optimal alignment” for the slices and rows of images.
But it also supports packed buffers.

BC: My suspicion is that on a GPU that requires an alignment, … My
conjecture is that there is a restriction on certain hardware that requires
this alignment, and you might have to do another copy if you can’t support
it directly. I suspect we’ll run into cases like this a lot. Previous APIs
abstracted away the hardware operations to be consistent, but newer APIs
have stripped that away, and will expose situations like this. I’m trying
to nail down the limitations with the D3D team. There is an error mode I’d
like to avoid: if we ignore alignment constraints, then the backend for
D3D12 will require intermediate copies, which will slow it down. Meanwhile
if Vulkan has a preferred alignment, it will probably do extra work if you
don’t give it the right alignment. These might introduce performance
penalties. My open question is whether or not WebASM will require aligned
allocations in order for us to get good performance.

BC: Avoiding the copies here is one of the reasons why the new APIs are
more efficient.

CW: I agree with the analysis. Can be addressed in two places: 1. The
WebGPU API, or 2. We emulate stuff. I see some open source drivers using
compute shaders for the buffer copies, or emulation. So I think we should
just add the D3D constraints and guarantee that no expensive copies take
place.

JG: +1 from me.

DM: I’m not sure Vulkan does drop back to compute shaders.

CW: Check out blorp
<https://github.com/mesa3d/mesa/tree/master/src/intel/blorp> from the Intel
Mesa driver, which does have a lot of shader code.

DM: So no DMA transfer. They have to use a graphics path for the copy.

CW: I think we should embed the constraints of D3D into WebGPU. Let’s
create an issue and discuss it more.

BC: D3D12 was designed to transparently expose the hardware via the API.
We don’t think of it as an API limitation, but a hardware issue. Doing extra
copies in some cases, or requiring the API to be aligned. 3rd party
libraries seem to understand this and use intermediates themselves if
necessary.

CW: We agree with this. So does Mozilla. What about Metal?

DM: Metal doesn’t have this limitation.

MM: Metal accepts anything and translates if necessary.

CW: OK. Let’s discuss on github. We generally agree about exposing the
hardware constraints.

MM: Wait. If 2 of the 3 APIs don’t have the constraints….

CW: Either the driver or the implementation will have to do extra
copies… so we should expose the limitations so the application can be smart.

CW: Vulkan doesn’t have the limitation, but it does suggest an optimal
alignment.

MM: So ArrayBuffer will have a constructor to handle alignment.

CW: It’s unclear. Maybe ArrayBuffers won’t change, but the
implementation will have to do that under the hood.

KR: This is about GPU to GPU copies?

DM: It’s about GPU visible memory.

KR: We don’t need new array buffer constructors, because the memory they
will show if they map buffer memory directly will be aligned by the driver
itself.

DM: To clarify we are talking about the row pitch alignment, and image
slice alignment. The application can create its buffer to match this
constraint.

KR: The API would be exposing some kind of query for this alignment.

CW: The API will guarantee at most some limits. Something queryable.

Kirill: This might also apply to HTMLImage objects.

CW: Either the image has already been decoded in GPU memory, which will
be a texture to texture copy, and so it isn’t an issue. If it hasn’t been
uploaded yet, then the implementation will have to do the correct thing. So
it isn’t a problem.

Kirill: What about ImageBitmap, which might be a raw buffer that’s
already decoded. The browser doesn’t know if it is designed to go to WebGPU.

Kirill: ImageBitmap at least exists in Firefox and Chrome.

CW: Discussion of DOM facing features can come later.
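
As a rough illustration of the row pitch and slice alignment discussed above,
the arithmetic can be sketched as follows. This is only a sketch under stated
assumptions: the 256-byte row pitch and 512-byte placement alignments are the
values D3D12 documents for buffer/texture copies, each slice is assumed to be
placed as a separate copy, and the helper names are hypothetical rather than
part of any proposed WebGPU API.

```python
# Illustrative only: helper names are hypothetical, not a WebGPU API.
# The alignment values are the ones D3D12 documents for buffer <-> texture copies.
ROW_PITCH_ALIGNMENT = 256   # D3D12_TEXTURE_DATA_PITCH_ALIGNMENT
PLACEMENT_ALIGNMENT = 512   # D3D12_TEXTURE_DATA_PLACEMENT_ALIGNMENT


def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def padded_layout(width: int, height: int, bytes_per_texel: int):
    """Return (row_pitch, slice_stride) in bytes for one 2D image slice.

    slice_stride is padded so that a following slice would also start at
    an aligned offset (assumption: one copy per slice).
    """
    tight_row = width * bytes_per_texel
    row_pitch = align_up(tight_row, ROW_PITCH_ALIGNMENT)
    slice_stride = align_up(row_pitch * height, PLACEMENT_ALIGNMENT)
    return row_pitch, slice_stride


def repack_rows(tight: bytes, width: int, height: int, bytes_per_texel: int) -> bytes:
    """Copy tightly packed rows into a buffer with an aligned row pitch.

    This is the kind of intermediate copy an implementation would need if
    applications were allowed to supply unaligned data.
    """
    tight_row = width * bytes_per_texel
    row_pitch, slice_stride = padded_layout(width, height, bytes_per_texel)
    out = bytearray(slice_stride)  # padding bytes stay zero
    for y in range(height):
        src = tight[y * tight_row:(y + 1) * tight_row]
        out[y * row_pitch:y * row_pitch + tight_row] = src
    return bytes(out)
```

For a 100x3 image with 4 bytes per texel, each 400-byte row would be padded to
512 bytes. An application that queries the alignment and lays out its buffer
this way up front avoids the extra repacking pass.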

Queues

CW: Last meeting we had consensus to expose async compute if available,
and that queues should be requested at device creation time.

DM: We might want to know how many queues are available before setup.

CW: We should raise this in the github issue. #22

CW: We have consensus on most things there.
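
One way to picture "queues requested at device creation time", together with
DM's point about knowing the available counts beforehand, is the minimal sketch
below. Every name in it (AdapterInfo, create_device, the queue kinds) is
hypothetical and chosen for illustration; nothing here was agreed in the
meeting or comes from an existing API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AdapterInfo:
    """What a hypothetical adapter reports before any device exists."""
    max_queues: dict  # queue kind -> number available, e.g. {"graphics": 1}


@dataclass
class Device:
    queues: dict = field(default_factory=dict)  # queue kind -> list of queue names


def clamp_request(adapter: AdapterInfo, wanted: dict) -> dict:
    """Reduce a wish list to what the adapter can actually provide."""
    return {kind: min(count, adapter.max_queues.get(kind, 0))
            for kind, count in wanted.items()}


def create_device(adapter: AdapterInfo, requested: dict) -> Device:
    """Create a device with all of its queues requested up front."""
    for kind, count in requested.items():
        available = adapter.max_queues.get(kind, 0)
        if count > available:
            raise ValueError(
                f"requested {count} {kind} queue(s), only {available} available")
    device = Device()
    for kind, count in requested.items():
        device.queues[kind] = [f"{kind}-{i}" for i in range(count)]
    return device
```

An application wanting async compute would query the adapter first, clamp its
request, and fall back to the graphics queue when no compute queue is reported.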

Pipeline state

https://github.com/jdashg/vulkan-portability/blob/master/pipeline-state.md

JG: Approach: inlined all the pipeline state of all three APIs to remove
all nesting of structures, so that things are easier to see. Vulkan and
D3D12 are pretty similar in what they contain and how they contain it.
Metal has smaller descriptors that are less verbose, and with less “stuff”
in it. No detail in what’s missing / in a different place in Metal.

JG: Some things are always dynamic and set in command lists in D3D12
(viewport, scissor), and in Vulkan they are either pipeline state or set as
“dynamic” and then specified in the command buffer.

CW: These dynamic state things in Vulkan are for very old mobile GPUs,
propose we set everything as dynamic.

JG: What I was going to propose too. Then the doc goes “state by state”,
things are very similar, with Metal being less constrained in general.

JG: Viewport section: D3D12 is always using dynamic states, propose to
make it always dynamic. Think about making things static in pipeline state
after MVP and see with Vulkan WG.

JG: Render target formats: In D3D12 and Metal they are straightforward:
an array of color formats, and a depth-stencil format. In Vulkan the pipeline
gets a VkRenderPass and can only be used with “compatible” renderpasses.
Renderpass compatibility includes the attachment formats. Basically the
renderpass in the pipeline creation info is to pass the format of the
attachments.

DM: So saying that we can set the format directly with a “dummy” pass to
give the render target formats.

CW: I believe that renderpass compat also includes input attachments

JG: If you want to add a pull request to update things, please do! This
is meant to be collaborative.

RC: Where is the doc?

JG: Linked in the doc.

And here:
https://github.com/jdashg/vulkan-portability/blob/master/pipeline-state.md

CW: On our side, we’ve added depth stencil state and found an
incompatibility with D3D, which doesn’t have per-face stencil state for some
state. Our WiP talks about this - we’ll either add it to your document or
raise an issue.

CW: Should we raise an issue per state? Or add to the document?

JG: I think a github issue. But a PR also works.

JG: Depends if you want to add to the investigation or actually start a
discussion.

CW: We’ll probably do a PR on your doc then.

CW: Our proposal is to take the intersection, and describe that.
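
To make the render target format and "intersection" points above more
concrete, here is a minimal sketch of a pipeline descriptor that carries its
attachment formats directly (as in D3D12 and Metal) and is checked against a
pass by format only. All type, field, and format names are hypothetical. The
single read/write stencil masks shared by both faces are one example of taking
the intersection, modeled on how D3D12's depth-stencil descriptor is laid out;
the minutes do not say which state CW had in mind.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class StencilFace:
    """Per-face stencil operations (available in all three APIs)."""
    compare: str = "always"
    fail_op: str = "keep"
    depth_fail_op: str = "keep"
    pass_op: str = "keep"


@dataclass(frozen=True)
class DepthStencilState:
    front: StencilFace = StencilFace()
    back: StencilFace = StencilFace()
    # Shared by both faces: the intersection of what the backends offer.
    read_mask: int = 0xFF
    write_mask: int = 0xFF


@dataclass(frozen=True)
class PipelineDescriptor:
    color_formats: Tuple[str, ...]
    depth_stencil_format: Optional[str]
    depth_stencil: DepthStencilState = DepthStencilState()
    # Viewport and scissor are deliberately absent: always dynamic.


@dataclass(frozen=True)
class PassAttachments:
    color_formats: Tuple[str, ...]
    depth_stencil_format: Optional[str]


def is_compatible(pipeline: PipelineDescriptor, attachments: PassAttachments) -> bool:
    """A pipeline can be used in a pass when the attachment formats match."""
    return (pipeline.color_formats == attachments.color_formats
            and pipeline.depth_stencil_format == attachments.depth_stencil_format)
```

On a Vulkan backend, a descriptor like this could be turned into the "dummy"
render pass DM mentions, created only to communicate the formats at pipeline
creation time.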

Render passes

CW: Quick summary from github. All three APIs do something slightly
different. D3D just allows the render targets to be set to image queues. In
Metal you specify them when you create a render command encoder. For each
attachment you also specify a load or store flag. Vulkan is similar to
Metal (render sub passes) - you can ask tiled GPUs to keep tile data in
memory between passes. It’s the only API with this concept.

CW: My feeling is that render passes are amazing and should be in
WebGPU. What do people think?

DM: I agree, as noted in issue #23.

JG: My theory is that D3D doesn’t have Render Passes because it doesn’t
always run on tiled GPUs. Is this incorrect?

BC: You are correct. No hardware need for RenderPasses.

MM: What about Windows 10 on ARM?

BC: I’m not sure what I can say without checking what is public.
I understand that passes have advantages on tilers, but for the MVP do we
want to add complexity for this?

JG: I anticipate that if we put renderpasses in WebGPU, then on backends
without them you’d almost ignore them, but it is an architecture you have to
fit into to work everywhere. There’d be no overhead in the translation, and a
bit of overhead for the application.

CW: Vulkan render passes also encode some information on memory
barriers. It does seem that D3D will target GPUs that will have some form
of tiling. So I suspect D3D will go in this direction in some form.

CW: So, RenderPasses are useful for most systems, are future proof, and
easy enough to simulate.

RC: So how do you expect to break up command lists on D3D12? Just one
giant command list?

MM: We can’t inherit state between render passes/encoders.

CW: Believe setting or clearing state at the render pass boundary will
be cheap compared to the GPU cost of flushing the output merger etc.

CW: In our current prototype, we do one giant command list for the
command buffer. Multiple passes are in the same graphics command list.

RC: Makes sense for subpasses. I guess it is an implementation detail.

RC: Inheritance can be left out of V1.

CW: Inheritance is orthogonal. Switching render targets is a different
question.

RC: Just means you have to keep some state around. It’s a hidden cost.

CW: This was discussed in one of the github issues. I don’t have a
strong opinion.

DM: I argued in favor of inheritance.

MM: The issue is symmetrical: on some backends, if there is inheritance,
we’ll have to keep state around and apply it if necessary. Symmetrically, on
D3D12 we’d have to clear state in place.

BC: Is it faster to clear or set? Is there a way to measure?

CW: I think the inheritance in Vulkan is a footnote in the API. We
should defer until we have a prototype.

CW: Are we going to do immediate RenderPasses, like Metal, or sub passes
like Vulkan?

MM: We haven’t had anyone argue against?

CW: We still need to choose between the Metal and Vulkan styles. The
Vulkan approach is more complex.

DJ: Think we should stick with the simple approach for the prototype.

MM: Describe “simple”

DJ: The Metal-style things.

MM: I think Dean means that the dependency data-structure has memory
barriers for synchronization.

CW: And to keep data in tile-memory.

MM: How do we feel about undefined behaviour? What happens if the
programmer gets this data structure wrong?

CW: We don’t have time to discuss that now. Let’s defer it to next week.
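
The Metal-style option discussed above (attachments with load/store flags
given when a pass begins, no state inherited between passes, and several
passes recorded into one command list on a backend without passes) could look
roughly like the sketch below. The types and command names are hypothetical
placeholders, not real D3D12 or Metal calls; on D3D12 the first two commands
would correspond loosely to OMSetRenderTargets and ClearRenderTargetView.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Attachment:
    target: str
    load_op: str    # "load" or "clear"
    store_op: str   # "store" or "discard"
    clear_value: Tuple[float, float, float, float] = (0.0, 0.0, 0.0, 0.0)


@dataclass(frozen=True)
class RenderPass:
    attachments: Tuple[Attachment, ...]
    draws: Tuple[str, ...]


def flatten_passes(passes: Tuple[RenderPass, ...]) -> List[tuple]:
    """Translate passes into one flat command list for a pass-less backend."""
    commands: List[tuple] = []
    for render_pass in passes:
        targets = tuple(a.target for a in render_pass.attachments)
        commands.append(("set_render_targets", targets))
        for attachment in render_pass.attachments:
            if attachment.load_op == "clear":
                commands.append(("clear", attachment.target, attachment.clear_value))
        # No inheritance: every pass starts from default state.
        commands.append(("reset_state",))
        for draw in render_pass.draws:
            commands.append(("draw", draw))
        # Store ops need no command here; on a tiler they would decide
        # whether tile memory is written back.
    return commands
```

The per-pass reset is where the cost RC and MM debate would show up: with
inheritance the backend would instead have to remember and re-apply state
across the boundary.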

CW: Please check out my slides on Vulkan render passes. They are in the
GitHub issue on the topic. They describe the benefit for tiled GPUs. We’ll
talk about it next week.

Agenda for next meeting

Keep talking about renderpasses / rendertarget

Get back on discussions related to pipeline state.

Received on Thursday, 13 July 2017 18:04:51 UTC