- From: Corentin Wallez <cwallez@google.com>
- Date: Fri, 15 Sep 2017 15:25:13 -0400
- To: public-gpu <public-gpu@w3.org>
- Message-ID: <CAGdfWNMk43LTFF4J1PjfM-ZvvBPL6_p7TFCVrqSM8pnkXWybwg@mail.gmail.com>
GPU Web 2017-09-13

Chair: Corentin
Scribe: Ken
Location: Google Hangout

Minutes from last meeting <https://docs.google.com/document/d/1T_bbzKC22BAq3ax_cnm_kcL-KJSG4nM5sH4C9egIUfk/>

TL;DR
- Open questions that do not affect the structure of the API can be deferred post-MVP.
- Pipeline objects
  - Rasterizer sample count and depth bounds deferred post-MVP.
  - Pipeline caching should be deferred post-MVP. This will let us better understand the usage patterns and make the right decisions when designing the API.
  - Primitive restart: Ben confirmed that 0xFFFFFFFF should not enable the sentinel value on int16 indices on D3D12. Consensus to put the index format in the pipeline state unless a better solution is found.
  - Explicit boolean for depth testing: concern that implicit enabling might interact badly with future extensions. Discussion deferred.
- Consensus on consensus
  - Consensus that compute should be an MVP feature.
  - Supporting multiple queues in the MVP is an open question (a single queue is simpler, but there is concern that multiple queues have structural implications).
  - State inheritance between render passes / subpasses was marked as having consensus, but things aren't 100% clear.
  - Lots of consensus confirmed.

Tentative agenda
- Administrative stuff (if any)
- Individual design and prototype status
- Consensus on consensus
- Open questions on pipeline objects
- Agenda for next meeting

Attendance
- Apple
  - Dean Jackson
  - Myles C. Maxfield
  - Theresa O'Connor
  - Warren Moore
- Google
  - Corentin Wallez
  - John Kessenich
  - Kai Ninomiya
  - Ken Russell
  - Zhenyao Mo
- Microsoft
  - Ben Constable
  - Chas Boyd
  - Frank Olivier
  - Rafael Cintron
- Mozilla
  - Dzmitry Malyshau
  - Jeff Gilbert
- ZSpace
  - Doug Twilleager
  - Joshua Groves

Administrative items
- CW: At the end of the meeting let's talk about what
- DJ: do we have a doc for agenda items?
- CW: yes, will link it. LINK
- DJ has been pinging lawyers about the license discussion; no response yet; the last email said they were talking with Google and Microsoft.
- CW hasn't asked in a while.
- RC also has no news.

Individual design and prototype status
- Apple
  - MM: nothing interesting. Continuing development on the shading language prototype.
  - BC: any highlights you'd like to share?
  - MM: think yes, but want to save it for a shading language discussion.
- Google
  - CW: Not much to report.
- Microsoft
  - BC: haven't worked on much code. Working through discussions and design issues. Making good progress, we think.
- Mozilla
  - DM: making progress on the OpenGL backend for their graphics abstraction.
  - Prototyping inside Servo: can render triangles with graphics pipelines at a decent framerate, given that we're reading back the buffer.
  - Servo prototype: https://github.com/kvark/webgpu-servo

Pipeline objects
- CW: had an email thread: there were some comments that added questions on the pull request.
- CW: had consensus on:
  - CW: Not exposing the depth bounds feature of D3D12 and Vulkan because it doesn't exist on Metal (?)
  - DM: did we really have consensus on this? Would like this to be exposed under a feature gate post-MVP.
  - MM: agree to include it post-MVP.
  - CW: would make sense for it to be exposed as an extension. Can we agree not to put this in the MVP?
  - DM: OK.
  - CW: cannot have a rasterizer sample count different from the render target's sample count.
  - DM: why not require specifying this during pipeline creation?
  - CW: hard to do this from the render pass.
  - DM: the render pass only says your images have this number of samples. It doesn't say how the rasterizer will work on them. Ultimately we want the rasterizer to support a different frequency than the number of samples. If we add this capability now then we don't need to change the API later.
- CW: what's the point of having a rasterizer sample count different from the texture's sample count?
- DM: imagine you're rendering into non-MSAA textures, rendering with sample count = 16. Your shader can see the sample mask and …
- BC: recurrent theme here: it should be a stated goal that the MVP API is allowed to change later – necessary scaffolding, but we don't want to restrict building features later (two have come up in the last five minutes). Also don't want to forbid changing the API. You have to build and test the thing before you can design the API surface. Want us to not fear changing the MVP, but rather get to the MVP quickly – unless there is very definitive proof that something will change the structure. Some features are like that and require changing many API points.
- DM: coming from the standpoint that two of the three APIs require specifying this, but agree with deferring it post-MVP.
- MM: didn't understand what you're (BC) saying. Should we expect to make breaking API changes post-MVP?
- BC: mental model of the MVP: the MVP is not version 1.0, but version 0.8 or 0.9. It's something we build to figure out what we want in 1.0. It's a beachhead, not winning the war. Want the MVP to not be something we have to support forever. Expect that we might need to change the pipeline state object in some way. Think users should expect to have to change their code if they code to the MVP.
- DJ: agree with Ben.
- CW: for features where it's not a structural issue, but one member in a structure or one function added later, it's easy to add it later without problems.
- CW: do people agree we should defer depth bounds and rasterizer sample masks to post-MVP?
- DM, BC: yes.
- CW: Vulkan and D3D12 have different ways to cache things.
  - Either have some sort of pipeline derivation – "this pipeline looks like this other one" –
  - or let the browser do everything.
- JG: you mean, the browser caches things implicitly?
- RC: preference would be the latter. Maybe in the MVP the browser redoes everything from scratch; later, it caches smartly for you. If that's infeasible, give developers the knobs they need to do it themselves.
- DJ + CW: agree.
- BC: building up a feature for AAA games for this, it took a few iterations to get it right. Think we'll need the MVP and a few workloads running to get this right. Think we should defer it. This is one of the reasons to get the MVP running faster.
- JG: agree. This is a solvable thing but kicking it down the road is fine, and preferred. State that users shouldn't expect that shaders will be cached by the MVP.
- MM: Question for Microsoft: MSDN has some information about the cached PSO. Does that work like derivative pipelines, or does it require an exactly equal pipeline?
- BC: that's one of the many corners of the API I can't answer directly. Can get an answer quickly.
- CW: pretty sure it requires the same pipeline. Saw it somewhere in the documentation: "The rest of the data in the PSO still needs to be valid and match the cached PSO or an error is returned." <https://msdn.microsoft.com/en-us/library/windows/desktop/dn914407(v=vs.85).aspx>
- CW: sounds like there is agreement that caching will be prototyped and done after the MVP. (A rough sketch of what implicit caching could mean follows below.)
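For illustration only, here is a minimal sketch of what browser-side implicit pipeline caching could look like: the browser memoizes compilation on the full descriptor, since D3D12's cached PSO requires the rest of the state to match exactly. All names (RenderPipelineDescriptor, PipelineCache, the compile callback) are hypothetical, not an agreed API.

    // Hypothetical sketch: the browser (not the app) memoizes pipeline compilation.
    interface RenderPipelineDescriptor {
      vertexShader: string;
      fragmentShader: string;
      // ...other fixed-function state
    }

    class PipelineCache {
      private cache = new Map<string, object>();

      getOrCreate(
        desc: RenderPipelineDescriptor,
        compile: (d: RenderPipelineDescriptor) => object,
      ): object {
        // Key on the whole descriptor (simplified; assumes stable property order).
        // Derivative-style partial matches are not assumed.
        const key = JSON.stringify(desc);
        let pipeline = this.cache.get(key);
        if (!pipeline) {
          pipeline = compile(desc); // the expensive driver compile happens once
          this.cache.set(key, pipeline);
        }
        return pipeline;
      }
    }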
- CW: primitive restart
  - Metal always enables primitive restart; there is no way to disable it. Otherwise you need to parse the index buffer and validate against the primitive restart index. Seems we have to enable primitive restart on all APIs.
  - Ben explained on the mailing list that drivers should be taking the index buffer format into account for this.
  - Encode the index buffer format in the pipeline state? How do people feel about this?
- DM: would like to defer the decision until we get more information from Ben. DM provided a test case to Ben, using an MSFT-issued, WHQL-certified driver, and the behavior is that the 32-bit cut index works for a 16-bit buffer. If this works then we don't need to provide the index buffer type.
- MM: what about a 32-bit index buffer with a sentinel value of 0xFFFF?
- JG: it's just a value. In WebGL 2.0 we force-enabled primitive restart because it can't be disabled in D3D11.
- BC: will investigate Dzmitry's test case. The spec says it doesn't work to do this. At the time you provide this you're also providing lots of details about your index buffer. Will follow up on why it's not behaving as specified.
- KR: Myles mentioned something about the stencil buffer / stencil mask. In WebGL we are making a rule that only the stencil bits used in the FBO are what's used. This is a behavior change that's important for portability. The primitive restart index may also be being masked to the number of bits in the indices.
- BC: Fairly accurate data: the test checks that 0xFFFFFFFF should not work for int16 index buffers. The driver Dzmitry tested on should not have passed this test.
- BC: Having the pipeline state hold the type of the index buffer seemed like an easy solution. Why do people think it isn't a great solution?
- DM: concerned there will be clients who defer the index buffer binding and the type of the index buffer until later. Not sure everyone will know it at pipeline creation time. But if MSFT requires it in D3D12, then let's require it.
- BC: the spec does say this. Will also investigate your test case. Also have to know the vertex buffer format up front, and that's even more complex than the index buffer format.
- JG: this is not something that we, as the WebGPU driver, can infer.
- BC: no. In the pipeline state, here are your shaders and root descriptor, and here are your vertices.
- JG: index buffer width.
- CW: that's known when you bind the index buffer.
- JG: and you can bind multiple different ones with the same pipeline state object?
- CW: yes.
- CW: agreement to put the index buffer format in the pipeline state? (A sketch of what this could look like follows below.)
- JG: yes, but it's a new restriction. If you ever had a situation where you mostly use U16s and then upgrade to U32s later, you have to create a new pipeline state object. Maybe not so bad.
- CW: with caching it should be free.
- BC: in this case, if you'd written your engine in D3D or Vulkan, when you upgrade from 16-bit to 32-bit you'd also change this value and create a new PSO.
- JG: unless you're not using primitive restart. Or skip using that index.
- BC: ...yes...could do that. But this is a small thing to cover all 3 APIs. This seems like a small mental cost to pay to get fast performance on all 3 APIs.
- JG: just trying to make sure we understand even the little things we're leaving behind.
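As a concrete illustration of the index-format-in-pipeline-state idea above: if the format is declared at pipeline creation, the implementation always knows which restart sentinel applies. All names here are hypothetical sketches, not an agreed API.

    // Hypothetical sketch: index format fixed at pipeline creation time.
    type IndexFormat = "uint16" | "uint32";

    interface RenderPipelineDescriptor {
      indexFormat: IndexFormat;
      // ...shaders and other fixed-function state
    }

    // Primitive restart is always on (Metal behavior); the sentinel is the
    // all-ones value for the declared format. A 0xFFFFFFFF cut index is not
    // expected to work with a uint16 index buffer on D3D12, which is why the
    // format has to be known up front.
    function restartSentinel(format: IndexFormat): number {
      return format === "uint16" ? 0xffff : 0xffffffff;
    }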
- CW: wacky idea of putting vertex buffers inside descriptor sets/bind groups.
  - Seems nobody likes that, so let's not do that.
  - Agreed.
- CW: should we explicitly enable independent blending and depth testing?
  - The two are different.
  - Independent blending: could figure out that you're not doing it.
  - Depth testing: if the comparison function is always true, it's the same as not enabling depth testing.
- DM: mentioned we could use a nullable IDL property to handle this. But it turns out that's not possible: you can't have a nullable dictionary inside another dictionary. This also implies we should do what Apple proposed and derive whether the feature is enabled from the values provided by the user, both for independent blends and depth testing.
- BC: to clarify: you want to infer that it's turned off because they used ALWAYS with no writing?
- JG: yes. Then if you're always passing you can disable the depth test.
- BC: concerned the semantics are different from what we're telling the driver.
- JG: agree...but can't think of a way it'll be wrong.
- BC: concern from our resident spec expert that as other values get added to depth testing, the difference between ALWAYS and turning depth testing on/off will become more apparent, and then you're stuck. Having a separate bool enable seems more future-proof. They're different in the hardware.
- JG: counter-proposal: if we extend the depth test options, we could add back the separate disabling of the depth test.
- BC: when implementing this in NXT, felt like the fact that this didn't map to how the hardware talks about it made it hard to build and hard to test properly. Felt like fighting it. Not a scientific measure.
- BC: however, someone doing the Metal backend would have to do the reverse. It's sort of a 2-out-of-3 vote based on the structure of the low-level APIs.
- CW: seems like a more difficult topic than expected. Let's flesh this out more on the mailing list. (The two options are sketched below.)
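For reference, a minimal sketch of the two approaches being compared, with hypothetical field names (not an agreed API): an explicit enable flag versus inferring "depth test off" from a no-op configuration.

    // Hypothetical names only; sketch of the two options discussed above.

    // Option A: explicit enable, matching how D3D12 and Vulkan expose it.
    interface DepthStateExplicit {
      depthTestEnabled: boolean;
      depthWriteEnabled: boolean;
      depthCompare: "never" | "less" | "equal" | "always"; // abbreviated list
    }

    // Option B: no enable flag (Metal-style); the implementation infers that
    // the depth test is effectively off when the state is a no-op.
    interface DepthStateInferred {
      depthWriteEnabled: boolean;
      depthCompare: "never" | "less" | "equal" | "always";
    }

    function depthTestIsNoop(state: DepthStateInferred): boolean {
      // ALWAYS with no writes behaves like a disabled depth test today, but
      // the equivalence may not hold if more depth options are added later.
      return state.depthCompare === "always" && !state.depthWriteEnabled;
    }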
Consensus on consensus

Link to Myles' document: <https://github.com/gpuweb/gpuweb/wiki/Roadmap>
Mailing list thread: <https://lists.w3.org/Archives/Public/public-gpu/2017Aug/0000.html>

- CW: high-level object model. There should be at least one type of queue that can do everything. Everyone agrees?
- DM: what if the hardware doesn't support compute?
- CW: then it doesn't support GPUWeb. Vulkan requires one queue to support everything. D3D12 supports both.
- MM: we skipped whether we're going to support compute.
  - Think we need compute.
- CW: agree.
- DM: think we need async compute.
- CW: we should at least have compute.
- BC: agree we should at least have compute.
- CW: this is no longer an open question.
- CW: the MVP will only allow one instance of Queue, ever, per Device (instantiation of WebGPU), to simplify the MVP.
- JG: don't love it.
- CW: it would be asynchronous w.r.t. the CPU, but since there's only one Queue there's no asynchronous work.
- JG: you could create command buffers in parallel but not submit them in parallel.
- CW: yes.
- JG: don't love it; it's an important part of the final design.
- BC: it's one of the reasons people move to these low-level APIs. Understand JavaScript makes concurrency hard. There were situations where we moved to D3D12 specifically to have multiple queues. Think these are the problems people will want to solve. This is one of the designs that will completely change everything. Understand this makes things difficult because it reopens the barrier discussion. But think it needs to be in the MVP.
- CW: agree with BC's statement. If we want to keep this as an open question then we should talk about it as a second phase of the memory barrier discussion. Synchronization between queues is a variant of memory barriers.
- MM: agree it's an open question.
- CW: Queues need to be created at device creation time.
- JG: Vulkan requires this. Or you need to create things in the background.
- CW: agreed.
- CW / JG: Render passes.
  - Metal's render encoders and Vulkan's render passes will be encoded by the same concept – "RenderPass".
- JG: seems correct. Similarity between Metal's render encoders and Vulkan's subpasses.
- DM: Let's not associate subpasses with Metal encoders. May encode multiple subpasses into a single encoder.
- CW: esp. if we use framebuffer fetch for optimization.
- MM: The reason they were marked as the same object is that this is where you attach textures for rendering.
- JG: they're not exactly the same but are similar.
- CW: not really consensus but more shared knowledge that these API objects are the same.
- CW: work should be done in a render pass between Begin/End; you cannot do compute at the same time. Q: should we say that you can't do compute at the same time?
  - Because of Metal and Vulkan, you cannot mix graphics and compute work.
  - Technically, Metal 2 on iPhone X can do it.
- MM: we want to target more than a single phone.
- CW: so should we explicitly Begin/End compute passes? Believe we should. It shows the developer that things are separate.
- JG: no strong disagreement. Sort of makes sense but hesitate to commit to it yet.
- DM: makes less sense than graphics passes, because compute passes do not share as much as subpasses do in graphics. That's why the case for Begin/End on the compute encoder is weaker than that for the graphics encoder. But fine with Begin/End on compute.
- CW: tentative consensus.
- CW: Open question: begin / end blit?
- CW: consecutive render passes don't inherit state. Same between render and compute passes. Render buffer state between passes is separate.
  - Is there state inherited between subpasses?
- DM: sure. You can bind descriptor sets between subpasses in Vulkan.
- CW: in Metal you cannot inherit state between render encoders.
- DM: but Metal's the only one which doesn't inherit state. And Metal's the only one that doesn't have subpasses.
- MM: wish people would go back and read the notes, because it's clear we had consensus on this issue.
- CW: AI: add a link to the notes and talk about this more.
- CW: you can only change attachments at a render pass boundary. Think this is obvious because of Vulkan's and Metal's constraints.
- DM: you mean binding a framebuffer. Yes.
- CW: open question about whether render passes have synchronization. Should wait until the memory barrier discussion.
- CW: consensus about using Vulkan's 3-layered hierarchy of sets of descriptors, and a small number of descriptor sets to bind things. So you bind a descriptor to a descriptor set, and support lots of descriptors, but only a small number of descriptor sets. (A rough sketch follows at the end of this section.)
  - Can keep open questions about D3D12's descriptor heaps; more of the questions are about Vulkan's descriptors.
- RC: so do we have consensus?
- CW: we don't have consensus on descriptor set allocation optimizations.
- RC: do we have consensus that there's a straightforward and performant mapping?
- CW: it will be performant and easy to map if it looks like Vulkan. The only questions are around allocation of descriptor sets and pooling of descriptors.
- CW: lots of consensus about pipeline states. A document in the repo covers this.
- DM: render pass information is not included as consensus.
- MM: will update the document. It was old.
- CW: Got through most of it. Can leave the rest for Chicago.
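A rough sketch of the Vulkan-style binding hierarchy discussed above, with hypothetical names (not an agreed API): many descriptors live inside a set, and only a small, fixed number of whole sets are bound at draw time.

    // Hypothetical sketch of the two-level binding model.
    type Descriptor =
      | { kind: "uniformBuffer"; buffer: object; offset: number; size: number }
      | { kind: "sampledTexture"; texture: object }
      | { kind: "sampler"; sampler: object };

    interface DescriptorSetLayout {
      bindings: { binding: number; kind: Descriptor["kind"] }[];
    }

    interface DescriptorSet {
      layout: DescriptorSetLayout;
      descriptors: Map<number, Descriptor>; // a set can hold many descriptors
    }

    // Only a small number of set slots exist (4 is an illustrative choice),
    // so switching resources means swapping whole sets, not individual bindings.
    function setDescriptorSet(slot: number, set: DescriptorSet): void {
      if (slot < 0 || slot >= 4) throw new RangeError("small, fixed number of set slots");
      // ...record the binding into the current command encoder (omitted)
    }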
Agenda for next meeting
- Chicago F2F agenda
  - Discuss shading languages on the afternoon of Friday the 22nd.
  - But the following meeting will also be about shading languages.
- CW: for Chicago: please add topics to the agenda document.

Received on Friday, 15 September 2017 19:35:39 UTC