- From: Corentin Wallez <cwallez@google.com>
- Date: Thu, 31 Aug 2017 15:55:53 -0400
- To: public-gpu <public-gpu@w3.org>
- Message-ID: <CAGdfWNMKh5qsnA5uuWoQAkGDZ0VtidOVHv1=s2XWLy_+ziuAbw@mail.gmail.com>
GPU Web 2017-08-30

Chair: Corentin
Scribe: Dean, Kai, Corentin
Location: Google Hangout

Minutes from last meeting <https://docs.google.com/document/d/1hIqo26_hFy-ZMSoVEAib3EJ5Azir4zGSJBUwG6asYfI/>

TL;DR
- Apple is prototyping a shading language in the WebKit repo to inform discussions
- Clarifications on SPIR-V logical addressing mode and pointer variable extension.
- Clarifications on Metal memory barriers and fences:
  - MTLFence are essentially at encoder granularity
  - No implicit barriers in render encoders. Only between encoders
  - Implicit barriers happen in compute encoders
- Apple showed how implicit memory barriers could work by splitting each “encoder” in its own command buffer, and interleaving them with command buffers containing barriers.
- Non-trivial interaction with multi-queue
- Discussion of how developers are happy with Metal without explicit control
- Need to figure out which level of abstraction to expose to the Web while still retaining performance
- Higher-level API vs. low-level API with helper libraries for implicit barriers
- WASM designed without CPU barriers leaving design open to add them later

Tentative agenda
- Administrative stuff (if any)
- Individual design and prototype status
- Memory barriers
- Consensus on consensus
- Agenda for next meeting

Attendance
- Dean Jackson (Apple)
- JF Bastien (Apple)
- Myles C. Maxfield (Apple)
- Theresa O'Connor (Apple)
- Warren Moore (Apple)
- Corentin Wallez (Google)
- John Kessenich (Google)
- Kai Ninomiya (Google)
- Chas Boyd (Microsoft)
- Rafael Cintron (Microsoft)
- Dzmitry Malyshau (Mozilla)
- Jeff Gilbert (Mozilla)
- Alex Kluge (Vizit Solutions)
- Doug Twilleager (ZSpace)
- Elviss Strazdiņš

Administrative items
- CW: Please register for the F2F
- DJ: For TPAC Monday-Friday is a meeting, Wed is for 1h sessions to give demos of stuff. Could give a WebGPU demo.
  Will talk with the chair of WASM to be sure we can meet at some point Thurs-Friday
- JF: Please sync with Brad Nelson
- CW: Still no news from lawyers.
- DJ: Will talk to them.

Individual design and prototype status
- Apple:
  - MM: Started implementing a POC of a shading language in the WebKit repo. It is human readable, statically typed, has generics, is designed from the ground up to have the same security guarantees as Java or ML, and is C-like.
  - CW: Do you think you’ll have anything to show next week?
  - MM: We don’t have much now. But we’ll see what we have.
- Google:
  - CW: We have done some refactoring in NXT, but that’s all.
- Microsoft:
  - No updates, may be doing some NXT D3D12 backend contributions
- Mozilla:
  - Keep moving to make a prototype in Servo, no result yet, takes time because explicit API.

Logical addressing mode in SPIR-V
- MM: Did someone suggest using SPIR-V in Logical Addressing Mode?
- CW: That was my suggestion to get discussion started. My original thinking was that SPIR-V without any extension might be enough, but we haven’t done any detailed investigation.
- MM: Actually, I was talking about what *your* implementation will use. Would SPIR-V with LAM be enough in your implementation?
- CW: Yes, for Vulkan then that is what we need to use.
- MM: If we required extensions to SPIR-V that don’t have (m)any implementations IRL, would this be a failure?
- CW: Yes, because there are devices that won’t have the extensions, and we want to run on as many as possible (say 80% of Vulkan devices). Check out vulkan.gpuinfo.org to see what the stats of Vulkan devices are.
- JK: Vulkan specifies a lowest common denominator, there was also the idea of a Vulkan “profile” that specifies minimum requirements.
- DJ: Currently cannot compile OpenCL to Vulkan without SPIR-V extensions.
- CW: OpenCL 1.2 can be mostly translated to SPIR-V + variable pointer extension, not sure how many drivers will expose it.
- JK: Seems like most HW could support the extension, most Android devices will eventually support it, no timeline though.
- JK: Still logical addressing as in pointers are abstract types that you can’t cast to bits.
- JK: OpenCL accepts SPIR-V without logical addressing mode.
- MM: If we want more expressive shading language, variable pointer is our best bet?
- JK: That extension is an incremental improvement, a relaxation of some of the rules. Quite a bit more can be done even with abstract pointer types.
- JG: Still need to target existing hardware. Afraid we are getting ahead of ourselves, would like to see something that cannot be done in logical addressing.

Memory barriers
- CW: Last conversation was about explicit vs. implicit memory barriers, that Metal doesn’t have explicit, and that a developer may get better performance by being explicit.
- MM: New information about Metal:
  - Metal fences in a render encoder (signal wait synchronization), the signaling can be pushed to the end, and the wait to the beginning. These facilities cannot be used to provide synchronization in the render encoder.
- CW: I talked with someone from Metal at SIGGRAPH. Because there is a tiling GPU, there is per-pixel defined ordering. Because you can’t expect an operation within an encoder to be able to read from something in a previous operation.
- MM: So any synchronization that WebGPU does (explicitly or implicitly) must occur between successive encoders.
- CW: It’s not very restrictive to ask for this in Vulkan too, because you can push them to the start of the pass. In D3D12 you can put barriers wherever because there isn’t a notion of passes or subpasses.
- MM: So synchronization has to occur at specified checkpoints, makes the problem of making implicit memory barriers.
- CW: Two things - compute encoders might want a barrier to the same compute encoder - I’m not sure what Metal requires.
- MM: Metal fences can be used with render encoders and compute encoders.
  Both types of encoders have the same semantics. The difference is that Metal won’t issue barriers between draw commands in a render encoder, but it will in a compute encoder.
- CW: So we can’t take into account that sync has to happen at the boundary for a compute encoder?
- MM: True
- CW: The other point was that implicit memory barriers means you have to serialize in order to issue barriers. Problem 1: sync point for processing. Problem 2: might introduce latency for VR workflows.
- MM: Fixups happen for rendering at the encoder level. So the driver is free to split encoders and leave holes, then go back and fill these holes.
- CW: Don’t think other APIs let us back-patch like this.
- MM: Can with command buffers, because they can be created out of order. Fill the command buffers, look at the synchronization.
- CW: Vulkan barriers require both “from” and “to” but “from” is not known until queue submit time
- MM: Been talking to teams that use Metal and interact with Metal developers - if a render needs to happen earlier it gets put in a separate encoder that gets sent to the queue earlier
- CB: Is it the application or the driver that does this?
- MM: Both do this technique: applications do it if they want to have work submitted earlier and the driver does it for HW sync.
- CW: In Metal, each part can be encoded separately and then submitted in the right order; in others, barriers can be added when needed. DM, WDYT?
- DM: Would be interesting to experiment with. Not sure we can create extra command buffers than what the application gives us. Not sure how this will allow transitions between queues of the same family that transition the same resource to different layouts. There isn’t a single place where we can track these transitions.
- MM: Haven’t really decided on the story on multiple queues of the same family.
- CW: Let’s not go off into talking about queues yet. What other issues are there with barriers?
  Currently we need to defer creating barriers so that we can compute the “from” part of the barrier.
- MM: [KN: I missed this. Asking about restrictions on barriers in other APIs?]
- CW: Can put barriers wherever you want but might cause UB.
- MM: If something works for render passes, it should also work for compute?
- CB: Sounds sane, there is no distinction on D3D12. Do you allocate threads or do you allocate pixels and hope differences will show up.
- CB: Will it be acceptable for web developers to work out these issues of synchronization? The first thing most game developers do is a layer to track synchronization so they don’t have to care about it anymore.
- JG: Would like it to be a JS library.
- CB: Could we converge to simplify work for this layer by having a common thing?
- MM: Just showed a model where the programmer doesn’t have to specify anything.
- JG: But at what cost?
- MM: Metal is successful so acceptable cost.
- CB: Most game developers work at the abstraction level of Metal.
- JG: Don’t expect average Webdev to learn WebGPU same as for WebGL developers.
- DJ: Metal made the choice to shield developers. Still works for AAA games. Don’t know why Vulkan chose this way.
- CB: There is great performance to be had, but only a few developers can take advantage of it. They are the same people that will target each native API directly. Webdevs would tolerate much higher level of abstraction.
- JG: Did Vulkan / D3D12 make a mistake there?
- JF: For WebAssembly we decided to not have barriers like C++. You can lose some performance, but it is only for experts. WASM has zero barriers but it is good enough to have performance, and we can add barriers later in the design we chose, if experts prove that they need barriers for performance. [Not to say this applies to GPU, but the point is that] I think we can have no barriers first then see if it works. Then experts can rant about lack of barriers. Leave the design open so we can add barriers later, if needed.
  Metal has shown that you don’t really need them, at least as a first step. Maybe Vulkan is right and you need it, but you can add them later.
- JG: One of the problems here with people who target Metal is: only one platform supports multiple low-level APIs (Windows with Vulkan/D3D12). There’s no direct comparison between Metal and an API with explicit barriers. Since there’s no choice of API on Mac we don’t know that Metal’s choice was “better”.
- CW: One thing also is that native APIs have either barriers and multiple queues or neither of them. Is it even possible to have multiple queues without barriers?
- MM: Metal has different queues, just not sync between different queues
- CB: Barriers don’t really have anything to do with GPU synchronization (fences, etc). Barriers usually affect texture layout, cache locality, etc. Nothing to do with multiple queues. AFAIK Vulkan uses similar nomenclature to distinguish these concepts.
- DM: Connection between queues and barriers is from automatic resource transitions because multiple queues can be independently executing and operating on the same resource. Tracking gets more complicated.
- MM: So let’s not have multiple queues! (not joke)
- MM: If there are multiple queues, then Metal has to virtualize the queues onto the same hardware, which leads to deadlocks
- CW: Something about Vulkan spec wanting forward progress.
- DM: I want to experiment with a standalone implementation of what Apple has described. Basically translating from Metal style to Vulkan style - can directly compare performance with manual optimization.
- JG: Should expose raw platforms instead of thinking “this isn’t needed in most cases” “that ...”.
- DM: Can the simplification be done in user space?
- CB: Less chance for consistent behavior between browsers.
- MM: Argument isn’t that automatic barriers can always be done as well as hand-made barriers.
- JG: Preference design-wise is to have hand barriers then make a userspace layer for automatic barriers.
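
The implicit-barrier model discussed above (work split per encoder, with barriers interleaved between encoders once the “from” state is known at submit time) could be sketched roughly as follows. This is only an illustration under stated assumptions: `Encoder`, `Barrier`, `insert_barriers` and the usage strings are hypothetical names that belong to no real API, and the sketch only reacts to a change of usage; it ignores hazards between two encoders that use a resource in the same way.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Barrier:
    # Hypothetical barrier record: the "from" usage is only known at submit time.
    resource: str
    from_usage: str
    to_usage: str


@dataclass
class Encoder:
    # Hypothetical encoder: maps each resource name to the usage it needs.
    name: str
    usages: dict = field(default_factory=dict)


def insert_barriers(encoders, initial_usages=None):
    """Walk encoders in submission order and interleave barrier lists.

    Barriers are only placed between encoders, never inside one.
    """
    current = dict(initial_usages or {})
    stream = []
    for enc in encoders:
        barriers = []
        for resource, needed in sorted(enc.usages.items()):
            previous = current.get(resource)
            # Emit a barrier only when a known previous usage differs.
            if previous is not None and previous != needed:
                barriers.append(Barrier(resource, previous, needed))
            current[resource] = needed
        if barriers:
            stream.append(barriers)
        stream.append(enc)
    return stream


shadow = Encoder("shadow_pass", {"shadow_map": "render_target"})
main = Encoder("main_pass", {"shadow_map": "sampled", "color": "render_target"})
stream = insert_barriers([shadow, main])
```

Here `stream` holds the shadow pass, then a single barrier moving `shadow_map` from `render_target` to `sampled`, then the main pass. The application never names a barrier; the cost is that the tracking has to see the encoders in their final submission order.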

Consensus on consensus

Agenda for next meeting
- CW: After shading languages, what should we do?
- JG: Don’t think we’re going to finish shading languages
- CW: Still don’t want to talk about all shading languages all the time to respect Chas’s, David’s, John’s, etc.’s time. Let’s alternate with other stuff.
- JG: Maybe every other week?
- CW: OK. So, more memory barriers? Consensus on consensus? Something else?
- MM: How about not memory barriers.

Agenda for next week:
- More shading languages.

Agenda for in two weeks:
- Consensus on consensus
- Open questions on the pipeline objects.
Received on Thursday, 31 August 2017 19:56:38 UTC