- From: Corentin Wallez <cwallez@google.com>
- Date: Thu, 29 Jun 2017 13:34:54 -0400
- To: public-gpu@w3.org
- Message-ID: <CAGdfWNO+-ZwiE_QjOrcQqUyUNFLtromuPxa0u8d8W4DKphOwSg@mail.gmail.com>
GPU Web 2017-06-28

Chair: Corentin and Dean
Scribe: Dean (with some help)
Location: Google Hangout

Minutes from last meeting <https://docs.google.com/document/d/1_vFF4VaLphkb9Zs5qCoYNvfgwnj8aq_l7E5U1LaXnXY/edit#>

Tentative agenda
- Administrative stuff (if any)
- Individual design and prototype status
- Ongoing investigations
- Encoders / command buffer constraints
- Queues
- Things we didn’t get to yet
- Pipeline state
- Render passes
- Agenda for next meeting

Attendance
- Chris Marrin (Apple)
- Dean Jackson (Apple)
- Myles C. Maxfield (Apple)
- Theresa O'Connor (Apple)
- Warren Moore (Apple)
- Gareth Morgan (Axum Graphics)
- Austin Eng (Google)
- Corentin Wallez (Google)
- Kai Ninomiya (Google)
- Ken Russell (Google)
- Aleksandar Stojilijkovic (Intel)
- Ben Constable (Microsoft)
- Chas Boyd (Microsoft)
- Rafael Cintron (Microsoft)
- Dzmitry Malyshau (Mozilla)
- Jeff Gilbert (Mozilla)
- Alex Kluge (Vizit Solutions)
- Kirill Dmitrenko (Yandex)
- Doug Twilleager (ZSpace)
- Elviss Strazdiņš
- Joshua Groves

Administrative items

Face to Face Meeting
- CW: It seems that the best option is to host the F2F in Chicago. Would people prefer it to be before or after the Khronos F2F?
- CW: Khronos is 17-21 Sept.
- JG: I would prefer the same week, either day before or day after WebGL.
- CW: Ken, can you choose the day for WebGL?
- KR: I will ask.
- CW: OK, before or after?
- DM: I’m not sure I can come to Chicago.
- CW: Let’s pencil in Chicago, Friday 22nd September.
- RC: I’d prefer it to be exactly before or after WebGL.
- CW: We’ll ask the WebGL meeting to be on Thursday 21st.
- KR: Yes. Hold that as a preliminary date.

TPAC
- DJ: Contacted TPAC, but CGs are supposed to get only two-hour slots. However, we could have a joint meeting. Will update when we have an answer from TPAC on whether we can have a full-day meeting.
Legal discussion on contribution agreement
- DJ: Apple lawyers talking to Google lawyers.
- RC: Microsoft lawyers involved.
- CW: Others, let us know if you want to add your lawyers to the mix.

Individual design and prototype status
- AE: (Google) D3D12 backend can look at texture models and stuff.
- CW: End-to-end tests are working and you can test pixel values.
- CW: Just finished the investigation on RenderPasses. Up on GitHub.
- DM: (Mozilla) No new milestones on our end.
- Elviss: I’ve implemented a Metal backend for my game engine. This means I’ll be able to give more input when discussing design issues. I’m working on D3D12 next.
- DJ: (Apple) No update for now, but got meetings with the Metal team to understand its design better, as well as to try to get them to participate.

Encoders
- CW: Last time we checked, we were waiting on feedback from Apple about the cost of swapping between encoder types.
- MM: Cost of ending a compute encoder is fairly high, unless it is followed by a compute encoder. It’s fairly expensive to swap types, or to end a RenderEncoder and start another RenderEncoder.
- DJ: Basically Metal can coalesce compute and compute but that’s an implementation detail. Otherwise starting/ending encoders is expensive.
- MM: About queues in Metal vs. Vulkan / D3D12. The API level above the hardware in Metal does not expose whether or not the queues are going to the same hardware unit or different hardware units. Metal actually adds synchronisation primitives if they are going to separate units.
- CW: Let’s go back to queues later.
- DM: Does Apple confirm that multiple queues may be sent in parallel to different hardware units?
- MM: Drivers are free to do that. But it is not explicit.
- CW: So how do people feel about “now I’m going to do compute” signals?
- MM: We have an idea about this, involving RenderPasses (from Vulkan). It seems that they are a good match to a RenderEncoder in Metal.
- CW: So you’d be able to interleave Compute between two RenderPasses? This doesn’t work in Vulkan. Because …..
- KN: Are you talking about compute between two renderpasses?
- JG: The idea hinges on the fact that you have to put graphics stuff inside renderpasses, but you don’t have to do that for compute. Is this true?
- CW: Vulkan can *only* do graphics inside RenderPasses. D3D can do everything whenever it wants.
- CW: Were you suggesting putting compute inside RenderPass?
- MM: No, just that there is another type of Pass, or that the start of the next RenderPass explicitly closes the virtual compute pass.
- KN: Outside of Vulkan-style renderpasses you can switch freely between copy and compute. We could expose “begin end compute” outside of renderpasses so the
- CW: So effectively there would be begin/end for compute/graphics/copy, and you’re required to be explicit?
- DJ/MM: Yes.
- CW: Anything we would need to add to beginCompute or beginBlit?
- MM: Not in Metal.
- CM: Sounds good. Does this sound ok for D3D12?
- BC: From my perspective with D3D12, the sync is explicit everywhere. As long as the queue types match, you use fences to sync, and it just goes to the driver. So it sounds like these proposals would be easy to emulate, and might just be no-ops in D3D12.
- BC: Vk / MTL have a lot of stuff that looks to be for mobile, so that the devs know when tiles are flushed etc. I think that it makes sense to have them explicit in WebGPU.
- CW: NVIDIA and AMD now have some tiled GPUs. I expect this means D3D12 will eventually add support for this. Please make sure that our design maps well to whatever Microsoft comes up with.
- BC: Sure.
- CW: So it sounds like we have consensus on Encoders/Passes.
- DJ: How do we document that we have consensus?
- CW: Definitely put it on GitHub.
- DJ: No need for a document yet.
- DM: So the conclusion is Metal-style encoders or something like it? Can you do bindings outside the begin/end and have them persist over the passes/blocks?
- CW: I’m not sure what the reason for this is.
- DM: ….
- CW: For D3D12 you have to set the modes for compute and graphics. ???
- CW: Not having inheritance seems like an ok thing, since both Metal and D3D12 behave this way.
- DJ: What is the benefit of this in Vulkan content today?
- CW: RenderPasses are expensive enough that I don’t think it matters.
- DM: We shouldn’t make it more expensive though. We can prototype it and report back.
- CW: We should ask about inheritance usage on the Khronos GitLab.

Queues
- CW: Do we expose one universal Q or one of each type, or a family of Qs?
- DM: For Vk, I put data on GitLab.
- DM: One of the major manufacturers has a graphics-only queue, but it isn’t exposed by their Vulkan driver yet.
- CW: My understanding from the Vk discussion is that it is unlikely we will want to expose a graphics-only queue to the Web.
- CB: I think that the future will bring a reduction in the differences between compute and graphics. So I’m not sure what people will be able to do with the knowledge that they have a Q optimised just for graphics. I think we could add it later if necessary, as some kind of flag.
- CW: What about Qs that people use for async compute?
- CB: Your average web developer only needs universal queues; we can later add a flag where we say “this queue will only do compute”. Transfer is available everywhere.
- MM: Is it common to have compute-only queues?
- CW: Yes. Look at the Vulkan hardware DB, a few vendors have compute-only queues.
- DM: You’re arguing for Qs that can do everything?
- CB: Yes. I think it is a good place to start. And I think the future will move in the direction of universal queues.
- CW: Vulkan always promises at least one Q that can do everything.
- CB: So it is a flag that helps the driver optimise?
- CW: No. In Vulkan you ask the device for what Q types it supports, and how many you have. When you create the device you specify how many you want.
- JG: Queues are allocated at device creation time in Vulkan whereas they are a la carte on other APIs.
- CB: I would push for the high-level abstraction.
- JG: We’re trying to have a low-level capable platform and not an easy thing.
- DM: How to map the higher level to Vulkan?
- JG: Requires you allocate queues at device creation time, or virtualize queues.
- CW: Metal has to virtualize anyway.
- DJ: What if we have only one queue? What would someone lose?
- CW: Compute queues have been used to gain 10% to 15% by optimizing all bottlenecks on the GPU at the same time.
- MM: One way we could get to this is have the Q creation take a dictionary, currently empty, but we could add the ability to define new Q types later.
- CB: Most HW doesn’t support async graphics unless you have multiple GPUs. But most HW supports compute queues.
- CW: Exposing hardware limitations is ok, but I think it is essential to expose a similar level of functionality to all Web users.
- CB: Is it that if the app has to check all platforms on all GPUs, then it causes problems? (CW: sorry this is badly transcribed)
- DJ: Metal team mentioned that having something that works consistently across all HW is a design goal.
- DM:
- CB: We can have applications declare how many queues they will use and emulate things on API / HW that doesn’t have the queues present.
- CW: I agree. We can say that we always expose one generic Q and up to N compute Qs.
- DM: I don’t think it is possible to require even a single async compute Q.
- CB: The user can’t really tell async vs. sync.
- CW: If the application has the right semaphores between Qs, then GPU sync devolves into serialization anyway. I don’t think there is too much work to virtualise async compute into the generic Q type.
- JG: Worry that we are papering a lot over the complexity and polyfilling over stuff. A lot of the reason for the new APIs is that you know the low-level stuff and tune to that.
- DJ: True, but that’s what Metal does: go lower-level than OpenGL but still expose enough low-level functionality to be useful. We now have 3+ years of experience suggesting that we made the right decision.
- CB: What we’re losing by having queues declared is SLI support, but few developers are using SLI.
- AK: (missed)
- CB: Don’t know of HW that has multiple HW graphics queues. Graphics is a superset of compute which is a superset of transfer (in D3D, not Vulkan).
- JG: I feel like we either need to rule out the existence of compute-only Qs, or support them.
- CB: I’m fine with ruling them out.
- CW: We could allow some flags when you create a Q in WebGPU. The implementation works out which Q to put it on, which could be the single Q for devices that only have a universal Q. So rather than query, we give hints up front.
- JG: I’m fine with that. It’s not functionally different from the Vulkan API.
- JG: We haven’t decided where we are creating queues.
- CW: I think the Vulkan restriction means we’ll have to create Qs at device creation time.
- CW: Proposal: when you create a device, you ask for Qs with flags explaining what types of operations you’ll execute on them.
- CB: Allocating the processing resources you plan to use.
- MM: Returning multiple queues will be problematic because fences don’t work across them.
- CW: Can have only one backing universal queue in Metal and virtualize on top of it.
- MM: Could cause deadlocks but sounds fine for now.
- CB: Need to be able to guarantee a minimum number of queues of each type you can get.
- BC: With textures 2kx2k is fine but 16kx16k depends on the hardware.
- MM: How about 1 universal and 0 of others.
- CW: What about applications that want to take async compute, but not rework their application?
- JG: I don’t think we should lie about async compute.
- CB: I am fine with a minimum of 0.
- CW: I am ok with that at this point in time.
- DM: I think virtualization will be tricky: if we emulate multiple queue types then there could be synchronization problems and deadlocks…. (missed)

Agenda for next meeting
- Queues
- Render targets
- Pipeline state if ready?
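Editorial illustration of the queue proposal recorded under Queues above: the application asks for queues with capability flags at device creation, and the implementation falls back to a single universal queue when no dedicated queue is free. This is only a minimal sketch under stated assumptions. Every name in it (`QueueCapability`, `QueueRequest`, `HardwareQueue`, `LogicalQueue`, `createDeviceQueues`) is hypothetical, was not proposed in the meeting, and does not belong to any real API. The placement policy (prefer the least-capable free queue) is also an assumption, and the sketch does not model the cross-queue synchronization or deadlock concerns raised by MM and DM.

```typescript
// Hypothetical types for illustration only; not part of any proposal or shipped API.
type QueueCapability = "graphics" | "compute" | "transfer";

interface QueueRequest {
  // Operations the application intends to submit (a hint given up front, not a query).
  capabilities: QueueCapability[];
}

interface HardwareQueue {
  id: number;
  capabilities: ReadonlySet<QueueCapability>;
}

interface LogicalQueue {
  request: QueueRequest;
  backing: HardwareQueue;
  // True when the request had to share the universal queue instead of getting its own.
  virtualized: boolean;
}

const ALL_CAPABILITIES: QueueCapability[] = ["graphics", "compute", "transfer"];

function satisfies(queue: HardwareQueue, wanted: QueueCapability[]): boolean {
  return wanted.every((c) => queue.capabilities.has(c));
}

// Assign each requested queue to a hardware queue at "device creation" time.
function createDeviceQueues(
  hardware: HardwareQueue[],
  requests: QueueRequest[],
): LogicalQueue[] {
  // Assumption taken from the discussion: at least one queue can do everything.
  const universal = hardware.find((q) => satisfies(q, ALL_CAPABILITIES));
  if (universal === undefined) {
    throw new Error("expected at least one universal queue");
  }
  const taken = new Set<number>();
  return requests.map((request) => {
    // Assumed policy: prefer the least-capable unused queue that still fits,
    // so compute-only requests land on dedicated compute queues when present.
    const free = hardware
      .filter((q) => !taken.has(q.id) && satisfies(q, request.capabilities))
      .sort((a, b) => a.capabilities.size - b.capabilities.size);
    const dedicated = free.length > 0 ? free[0] : undefined;
    const backing = dedicated !== undefined ? dedicated : universal;
    taken.add(backing.id);
    return { request, backing, virtualized: dedicated === undefined };
  });
}
```

With a single universal hardware queue, a second compute-only request comes back with `virtualized` set to true and shares the universal queue. With an extra compute-capable hardware queue, the same request gets that queue instead.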
Received on Thursday, 29 June 2017 17:35:50 UTC