- From: Corentin Wallez <cwallez@google.com>
- Date: Thu, 29 Jun 2017 13:34:54 -0400
- To: public-gpu@w3.org
- Message-ID: <CAGdfWNO+-ZwiE_QjOrcQqUyUNFLtromuPxa0u8d8W4DKphOwSg@mail.gmail.com>
GPU Web 2017-06-28
Chair: Corentin and Dean
Scribe: Dean (with some help)
Location: Google Hangout
Minutes from last meeting
<https://docs.google.com/document/d/1_vFF4VaLphkb9Zs5qCoYNvfgwnj8aq_l7E5U1LaXnXY/edit#>
Tentative agenda
-
Administrative stuff (if any)
-
Individual design and prototype status
-
Ongoing investigations
-
Encoders / command buffer constraints
-
Queues
-
Things we didn’t get to yet
-
Pipeline state
-
Render passes
-
Agenda for next meeting
Attendance
-
Chris Marrin (Apple)
-
Dean Jackson (Apple)
-
Myles C. Maxfield (Apple)
-
Theresa O'Connor (Apple)
-
Warren Moore (Apple)
-
Gareth Morgan (Axum Graphics)
-
Austin Eng (Google)
-
Corentin Wallez (Google)
-
Kai Ninomiya (Google)
-
Ken Russell (Google)
-
Aleksandar Stojilijkovic (Intel)
-
Ben Constable (Microsoft)
-
Chas Boyd (Microsoft)
-
Rafael Cintron (Microsoft)
-
Dzmitry Malyshau (Mozilla)
-
Jeff Gilbert (Mozilla)
-
Alex Kluge (Vizit Solutions)
-
Kirill Dmitrenko (Yandex)
-
Doug Twilleager (ZSpace)
-
Elviss Strazdiņš
-
Joshua Groves
Administrative items
Face to Face Meeting
-
CW: It seems that the best option is to host the F2F in Chicago. Would
people prefer it to be before or after the Khronos F2F?
-
CW: Khronos is 17-21 Sept.
-
JG: I would prefer the same week, either day before or day after WebGL.
-
CW: Ken, can you choose the day for WebGL?
-
KR: I will ask.
-
CW: OK, before or after?
-
DM: I’m not sure I can come to Chicago.
-
CW: Let’s pencil in Chicago, Friday 22nd September.
-
RC: I’d prefer it to be exactly before or after WebGL.
-
CW: We’ll ask the WebGL meeting to be on Thursday 21st.
-
KR: Yes. Hold that as a preliminary date.
TPAC
-
DJ: Contacted TPAC, but CGs are supposed to get only two-hour slots.
However, we could have a joint meeting. Will update when we have an answer
from TPAC on whether we can have a full-day meeting.
Legal discussion on contribution agreement
-
DJ: Apple lawyers talking to Google lawyers
-
RC: Microsoft lawyers involved
-
CW: Others, let us know if you want to add your lawyers to the mix.
Individual design and prototype status
-
AE: (Google) D3D12 backend can look at texture models and stuff.
-
CW: end2end test working and you can test pixel values
-
CW: Just finished the investigation on RenderPasses. Up on Github.
-
DM: (Mozilla) No new milestones on our end.
-
Elviss: I’ve implemented a Metal backend for my game engine. This means
I’ll be able to give more input when discussing design issues. I’m working
on D3D12 next.
-
DJ: (Apple) no update for now, but got meetings with the Metal team to
understand its design better as well as try to get them to participate.
Encoders
-
CW: Last time we checked, we were waiting on feedback from Apple about
the cost of swapping between encoder types.
-
MM: Cost of ending a compute encoder is fairly high, unless it is
followed by a compute encoder. It’s fairly expensive to swap types, or to
end a RenderEncoder and start another RenderEncoder.
-
DJ: Basically Metal can coalesce compute and compute but that’s an
implementation detail. Otherwise starting/ending encoders is expensive.
-
MM: About queues in Metal vs. Vulkan / D3D12. The API level above the
hardware in Metal does not expose whether or not the queues are going to
the same hardware unit or different hardware units. Metal actually adds
synchronisation primitives if they are going to separate units.
-
CW: Let’s go back to queues later.
-
DM: Does Apple confirm that multiple queues may be sent in parallel to
different hardware units?
-
MM: Drivers are free to do that. But it is not explicit.
-
CW: So how do people feel about “now I’m going to do compute” signals?
-
MM: We have an idea about this, involving RenderPasses (from Vulkan). It
seems that they are a good match to a RenderEncoder in Metal.
-
CW: So you’d be able to interleave Compute between two RenderPasses?
This doesn’t work in Vulkan. Because …..
-
KN: Are you talking about compute between two renderpasses?
-
JG: The idea hinges on the fact that you have to put graphics stuff
inside renderpasses, but you don’t have to do that for compute. Is this
true?
-
CW: Vulkan can *only* do graphics inside RenderPasses. D3D can do
everything whenever it wants.
-
CW: Were you suggesting putting compute inside RenderPass?
-
MM: No, just that there is another type of Pass, or that the start of
the next RenderPass explicitly closes the virtual compute pass.
-
KN: Outside of Vulkan style renderpasses you can switch freely between
copy and compute. We could expose “begin end compute” outside of
renderpasses so the
-
CW: So effectively there would be begin/end for compute/graphics/copy,
and you’re required to be explicit?
-
DJ/MM: Yes.
-
CW: Anything we would need to add to beginCompute or beginBlit?
-
MM: Not in Metal.
-
CM: Sounds good. Does this sound ok for D3D12?
-
BC: From my perspective with D3D12, the synch is explicit everywhere. As
long as the queue types match, you use fences to synch, and it just goes to
the driver. So it sounds like these proposals would be easy to emulate, and
might just be no-ops in D3D12.
-
BC: Vk / MTL have a lot of stuff that looks aimed at mobile, so that devs
know when tiles are flushed etc. I think it makes sense to have them
explicit in WebGPU.
-
CW: NVIDIA and AMD now have some tiled GPUs. I expect this means D3D12
will eventually add support for this. Please make sure that our design maps
well to whatever Microsoft comes up with.
-
BC: Sure
-
CW: So it sounds like we have consensus on Encoders/Passes.
-
DJ: How do we document that we have consensus?
-
CW: Definitely put it in the Github.
-
DJ: No need for a document yet.
-
DM: So the conclusion is Metal-style encoders or something like them?
Can you do bindings outside the begin/end and have them persist over the
passes/blocks?
-
CW: I’m not sure what the reason for this is.
-
DM: ….
-
CW: For D3D12 you have to set the modes for compute and graphics. ???
-
CW: Not having inheritance seems like an ok thing, since both Metal and
D3D12 behave this way.
-
DJ: What is the benefit of this in Vulkan content today?
-
CW: RenderPasses are expensive enough that I don’t think it matters.
-
DM: We shouldn’t make it more expensive though. We can prototype it and
report back.
-
CW: We should ask about inheritance usage on the Khronos Gitlab.
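The begin/end model the group converged on above (explicit compute, graphics and copy passes, with bindings not inherited across passes) can be illustrated with a small toy model. This is only a sketch: `CommandEncoder`, `PassType`, `begin_pass`, `set_binding`, `record` and `end_pass` are hypothetical names invented for illustration, not part of any proposed WebGPU API, and the no-inheritance rule reflects the tentative position in the discussion rather than a final decision.

```python
from enum import Enum


class PassType(Enum):
    RENDER = "render"
    COMPUTE = "compute"
    COPY = "copy"


class CommandEncoder:
    """Toy recorder (hypothetical): every command sits inside an explicit pass."""

    def __init__(self):
        self.commands = []        # recorded (pass_type, command) pairs
        self.current_pass = None  # no pass is open initially
        self.bindings = {}        # bindings are scoped to the open pass

    def begin_pass(self, pass_type):
        if self.current_pass is not None:
            raise RuntimeError("end the current pass before beginning another")
        self.current_pass = pass_type
        self.bindings = {}  # nothing is inherited from the previous pass

    def set_binding(self, slot, resource):
        if self.current_pass is None:
            raise RuntimeError("bindings can only be set inside a pass")
        self.bindings[slot] = resource

    def record(self, required_type, command):
        # A command is only valid inside an open pass of the matching type.
        if self.current_pass is not required_type:
            raise RuntimeError(
                f"{command} requires an open {required_type.value} pass"
            )
        self.commands.append((self.current_pass, command))

    def end_pass(self):
        if self.current_pass is None:
            raise RuntimeError("no pass is open")
        self.current_pass = None
        self.bindings = {}


# Compute work followed by graphics work, each in its own explicit pass.
encoder = CommandEncoder()
encoder.begin_pass(PassType.COMPUTE)
encoder.set_binding(0, "particle_buffer")
encoder.record(PassType.COMPUTE, "dispatch")
encoder.end_pass()
encoder.begin_pass(PassType.RENDER)
encoder.record(PassType.RENDER, "draw")
encoder.end_pass()
```

On a backend such as D3D12, where synchronization is explicit everywhere, the begin/end calls in a model like this might reduce to no-ops, as BC suggests; on Metal they would plausibly correspond to creating and ending encoders.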
Queues
-
CW: Do we expose one universal Q or one of each type, or a family of Qs?
-
DM: For Vk, I put data on gitlab.
-
DM: One of the major manufacturers has a graphics only queue, but it
isn’t exposed by their Vulkan driver yet.
-
CW: My understanding from the Vk discussion is that it is unlikely we
will want to expose a graphics-only queue to the Web.
-
CB: I think that the future will bring a reduction in the differences
between compute and graphics. So I’m not sure what people will be able to
do with the knowledge that they have a Q optimised just for graphics. I
think we could add it later if necessary, as some kind of flag.
-
CW: What about Qs that people use for async compute?
-
CB: Your average web developer only needs universal queues, can later
add a flag where we say “this queue will only do compute”. Transfer is
available everywhere.
-
MM: Is it common to have compute-only queues?
-
CW: Yes. Look at the Vulkan hardware DB, a few vendors have compute only
queues.
-
DM: You’re arguing for Qs that can do everything?
-
CB: Yes. I think it is a good place to start. And I think the future
will move in the direction of universal queues.
-
CW: Vulkan always promises at least one Q that can do everything.
-
CB: So it is a flag that helps the driver optimise?
-
CW: No. In Vulkan you ask the device for what Q types it supports, and
how many you have. When you create the device you specify how many you
want.
-
JG: Queues are allocated at device creation time in Vulkan whereas they
are a la carte on other APIs.
-
CB: I would push for the high-level abstraction.
-
JG: We’re trying to have a low-level capable platform and not an easy
thing.
-
DM: How would we map the higher level to Vulkan?
-
JG: It requires you to allocate queues at device creation time, or to
virtualize queues.
-
CW: Metal has to virtualize anyway.
-
DJ: What if we have only one queue? What would someone lose?
-
CW: Compute queues have been used to gain 10% to 15% by optimizing all
bottlenecks on the GPU at the same time.
-
MM: One way we could get to this is have the Q creation take a
dictionary, currently empty, but we could add the ability to define new Q
types later.
-
CB: Most HW doesn’t support async graphics unless you have multiple
GPUs. But most HW supports compute queues.
-
CW: Exposing hardware limitations is ok, but I think it is essential to
expose a similar level of functionality to all Web users.
-
CB: Is that that if the app has to check all platforms on all GPUs, then
it causes problems (CW: sorry this is badly transcribed)
-
DJ: Metal team mentioned that having something that works consistently
across all HW is a design goal.
-
DM:
-
CB: We can have application declare how many queues they will use and
emulate things on API / HW that doesn’t have the queues present.
-
CW: I agree. We can say that we always expose one generic Q and up to N
compute Qs.
-
DM: I don’t think it is possible to require even a single async compute
Q.
-
CB: The user can’t really tell async vs. sync.
-
CW: If the application has the right semaphores between Qs, then GPU
sync devolves into serialization anyway. I don’t think there is too much
work to virtualise async compute into the generic Q type.
-
JG: I worry that we are papering over a lot of the complexity and
polyfilling over stuff. A lot of the reason for the new APIs is that you
know the low-level stuff and tune to that.
-
DJ: True, but that’s what Metal does: go lower-level than OpenGL but
still expose enough low-level functionality to be useful. We now have 3+
years of experience suggesting that we made the right decision.
-
CB: What we’re losing by having queues declared is SLI support, but few
developers are using SLI.
-
AK: (missed)
-
CB: Don’t know of HW that has multiple HW graphics queues. Graphics is a
superset of compute which is a superset of transfer (in D3D, not Vulkan)
-
JG: I feel like we either need to rule out the existence of compute-only
Qs, or support them.
-
CB: I’m fine with ruling them out.
-
CW: We could allow some flags when you create a Q in WebGPU. The
implementation works out which Q to put it on, which could be the single Q
for devices that only have a universal Q. So rather than query, we give
hints up front.
-
JG: I’m fine with that. It’s not functionally different from the Vulkan
API.
-
JG: We haven’t decided where we are creating queues.
-
CW: I think the Vulkan restriction means we’ll have to create Qs at
device creation time.
-
CW: Proposal: when you create a device, you ask for Qs with flags
explaining what types of operations you’ll execute on them.
-
CB: Allocating the processing resources you plan to use.
-
MM: Returning multiple queues will be problematic because fences don’t
work across them.
-
CW: Can have only one backing universal queue in Metal and virtualize on
top of it.
-
MM: Could cause deadlocks but sounds fine for now.
-
CB: Need to be able to guarantee a minimum number of queues of each type
you can get.
-
BC: With textures 2kx2k is fine but 16kx16k depends on the hardware.
-
MM: How about 1 universal and 0 of others.
-
CW: What about applications that want to take async compute, but not
rework their application?
-
JG: I don’t think we should lie about async compute.
-
CB: I am fine with a minimum of 0.
-
CW: I am ok with that at this point in time.
-
DM: I think virtualization will be tricky. If we emulate multiple queue
types then there could be synchronization problems and deadlocks…. (missed)
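CW's proposal above (ask for queues with flags at device creation, let the implementation decide which hardware queue backs each one, and guarantee only one universal queue with a minimum of zero for the other types) could look roughly like the sketch below. Everything here is an assumption made for illustration: `QueueRequest`, `Queue`, `create_device_queues` and the `compute_only` flag are hypothetical names, and the fallback policy is one possible reading of the discussion. It does not attempt to address the fence, deadlock and "lying about async compute" concerns raised by MM, DM and JG.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueueRequest:
    # Hint given up front: only compute work will be submitted to this queue.
    compute_only: bool = False


@dataclass(frozen=True)
class Queue:
    request: QueueRequest
    backing: str  # which hardware queue the implementation picked


def create_device_queues(requests, hardware_compute_queues):
    """Resolve each requested queue to a hardware queue at device creation.

    `hardware_compute_queues` is the number of dedicated compute queues the
    (hypothetical) backend exposes; one universal queue is assumed to exist.
    """
    queues = []
    free_compute = hardware_compute_queues
    for request in requests:
        if request.compute_only and free_compute > 0:
            # A dedicated compute queue is available: use it.
            index = hardware_compute_queues - free_compute
            queues.append(Queue(request, f"compute{index}"))
            free_compute -= 1
        else:
            # Otherwise virtualize on top of the single universal queue.
            queues.append(Queue(request, "universal"))
    return queues


# One general-purpose queue plus one queue hinted as compute-only.
requests = [QueueRequest(), QueueRequest(compute_only=True)]
with_async = create_device_queues(requests, hardware_compute_queues=1)
without_async = create_device_queues(requests, hardware_compute_queues=0)
```

In this model the application code is identical on both kinds of hardware; only the `backing` field differs, which is the sense in which hints replace capability queries.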
Agenda for next meeting
-
Queues
-
Render targets
-
Pipeline state if ready?
Received on Thursday, 29 June 2017 17:35:50 UTC