- From: Corentin Wallez <cwallez@google.com>
- Date: Fri, 21 Jul 2017 16:45:55 -0400
- To: public-gpu@w3.org
- Message-ID: <CAGdfWNPtZDJ2wYku_x65dHPdvZdhK5+t+LeFK_jX5RCx2c2Fwg@mail.gmail.com>
GPU Web 2017-07-19

Chair: Corentin & Dean
Scribe: Dean with help from Ken
Location: Google Hangout

Minutes from last meeting <https://docs.google.com/document/d/1FupUhxJL7TfzFSgofShxqYvAlW_zHlZphKB8k7Prwzg/edit>

Tentative agenda
- Administrative stuff (if any)
- Individual design and prototype status
- Renderpasses / rendertargets
- Pipeline state details
- Agenda for next meeting

Attendance
- Dean Jackson (Apple)
- Myles C. Maxfield (Apple)
- Theresa O'Connor (Apple)
- Warren Moore (Apple)
- Austin Eng (Google)
- Corentin Wallez (Google)
- Kai Ninomiya (Google)
- Ken Russell (Google)
- Ricardo Cabello (Google)
- Rafael Cintron (Microsoft)
- Dzmitry Malyshau (Mozilla)
- Jeff Gilbert (Mozilla)
- Alex Kluge (Vizit Solutions)
- Kirill Dmitrenko (Yandex)
- Doug Twilleager (ZSpace)
- Elviss Strazdiņš
- Joshua Groves
- Tyler Larson

Administrative items
- CW: I sent email about the September Chicago F2F meeting. Please reply to me if you're coming, either in person or by hangouts.
- CW: I've also put up an agenda document that we will fill in before the meeting.
- DJ: At TPAC we will meet with the WebAssembly group. I'll coordinate a time and let internal-gpu know.

Individual design and prototype status
- CW: Google has spent time on texture-to-buffer copies and the D3D12 constraints there. We think the buffer row pitch has to be added to WebGPU, but in NXT we found a way to avoid adding the buffer offset alignment by sometimes splitting copies in two.
- DM: Mozilla has looked at the Metal and D3D12 backends, and got Vulkan descriptor pools mapped to Metal Indirect Argument Buffers. It seems to work well, but isn't strongly tested.
- DJ: IABs only work on High Sierra and above, so we might need to fall back to the previous binding model before that.
- DM: In our prototype a compile-time flag allows choosing between IABs and "old Metal".
- MM: One difference is that IABs can be written by the GPU on a subset of the hardware. If you are talking about GPU filling, then that's another approach. Vulkan does not have this, but D3D12 does.
- CW: I think this is very advanced and we shouldn't look at it for now in WebGPU.
- MM: If we're talking about CPU binding, then using IABs isn't always necessary.
- CW: We've been able to implement the binding model we presented in NXT using Metal's older binding model.
- MM: Yeah, either way is fine.

Pipeline state details
- DM and JG have worked on the issue in GitHub: https://github.com/gpuweb/gpuweb/issues/26
- CW: This issue highlights the differences between all three APIs.
- DM: It seems pretty clear what the overlap is. Vulkan will need its dynamic state capabilities.
- DM: We also need to remove some features that D3D12 doesn't support, like TRIANGLE_FAN, separate per-face stencil reference and masks, ...
- DM: Instance rate and sample mask are the difficult ones. Vulkan doesn't support an instance rate of more than 1. The sample mask is not present in Metal. (A Vulkan sketch of this state appears below.)
- DM: For the MVP we could support an instance rate of one.
- CW: What is the sample mask?
- DM: When the sample coverage is computed by the rasterizer, it then uses the mask to limit the samples you render.
- MM: Metal doesn't have that concept. We shouldn't support it.
- DM: Could have the device capabilities expose "bool isSampleMaskSupported".
- DM: Didn't look at tessellation and ...
- MM: Don't think tessellation is necessary for the MVP.
- CW: Especially since it differs between (D3D12, Vulkan) and Metal.
- DJ: Did you suggest we remove the sample mask or make the device advertise whether it supports it?
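For context, a minimal sketch of where the sample mask under discussion lives in Vulkan, where it is fixed-function multisample state baked into the pipeline; the rest of the pipeline setup is omitted, and the 4x MSAA choice is only an example. Metal exposes no equivalent pipeline field, which is what motivates the "isSampleMaskSupported" capability idea above.

    #include <vulkan/vulkan.h>

    // Illustrative only: the sample mask is part of Vulkan's (and D3D12's)
    // fixed-function multisample pipeline state.
    VkPipelineMultisampleStateCreateInfo MakeMultisampleState(const VkSampleMask* mask) {
        VkPipelineMultisampleStateCreateInfo ms = {};
        ms.sType                = VK_STRUCTURE_TYPE_PIPELINE_MULTISAMPLE_STATE_CREATE_INFO;
        ms.rasterizationSamples = VK_SAMPLE_COUNT_4_BIT;  // 4x MSAA, chosen for the example
        ms.sampleShadingEnable  = VK_FALSE;               // VK_TRUE would request per-sample shading
        ms.pSampleMask          = mask;                   // e.g. a mask of 0x5 keeps only samples 0 and 2
        return ms;
    }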
- DM: Wants to find some samples using it that he can share.
- DJ: Do they inject them into the Metal shader?
- Would be great if we had someone from Unity here…
- MM: Why should this be an API construct and not something the shader authors put in?
- CW: Probably supported by fixed function on some hardware.
- DM: It reduces the amount of data written back to VRAM.
- CW: Let's tag the sample mask as something we need more data on and get back to it later.
- RC: My understanding is that it only applies to MSAA workflows. I don't think we can work around it in the pixel shader.
- MM: Depends on whether the pixel shader has access to the right builtins. Know GLSL has it.
- DM: There's a scenario where you want the shader to run at sample frequency: the pixel shader can be run per-pixel or per-sample. Setting the mask would allow the hardware to skip some fragment invocations.
- MM: Which backends support per-sample pixel shading?
- DM: All of them, will double check. Vulkan supports very configurable shading. D3D12 does per-sample shading if the shader uses one of the relevant builtins.
- CW: Feel we're ratholing a bit. Either get more data and info on how people deal with the lack of a sample mask on Metal, or just exclude it from the MVP and add it later. Suggest postponing it.
- DJ: Can ask the Metal team what the reason for leaving it out is.
- DM: I think other than this, we have a pretty good picture of the states for the MVP.
- CW: In Vulkan, the exact primitive type has to be set on the pipeline state, whereas in other APIs it is just triangles vs. lines vs. points.
- DM: Yes, I don't think we have an alternate choice.

Render targets / Render passes
- CW: Have people had a chance to look at the documentation on Vulkan render passes?
- RC: I have looked at the GitHub issue: https://github.com/gpuweb/gpuweb/issues/23
- MM: Did read the relevant chapter in Graham Sellers' Vulkan book.
- CW: Think we need something at least like Vulkan's render passes.
  - Two additional things in Vulkan:
    - More explicit dependencies between rendering operations.
    - Input attachments: you say you're going to sample a texture at the same location as the pixel you're rendering.
  - This allows keeping data in tile memory, which is hugely important for mobile.
- CW: I am a fan of the concept of render passes. I'm not sold on everything that Vulkan does, but there might be good reasons for their design.
- RC: As long as we can emulate them on APIs that don't have them, I'm ok with it.
- CW: The emulation would be to use it as a texture (it's free).
  - Instead of making something an input attachment, you'd make it a render target.
  - The input attachment operation in the fragment shader would become a texel fetch.
- RC: So you'd do the pass/for-loop yourself in the implementation?
- CW: With input attachments you have one rendering pass with an output attachment.
  - Then transition it to an input attachment (G-buffer, lighting, …).
  - In D3D that's a sampled texture (SRV); use TexelFetch or similar.
  - It's a different function call in SPIR-V that takes an offset from the current pixel position (and current hardware only supports a (0, 0) offset).
- DM: The downside is that it complicates the API for users and for specification writers. But it is difficult to postpone past the MVP because it affects things like the definition of pipelines.
- OpenGL has a tiled memory extension too.
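To make the render-pass structure above concrete, a minimal Vulkan sketch of the G-buffer/lighting case CW describes, assuming two subpasses over the same attachments; the attachment descriptions, the VkSubpassDependency between the subpasses, and the render pass creation itself are omitted.

    #include <vulkan/vulkan.h>

    // Illustrative only: subpass 0 writes a color attachment, subpass 1 reads the
    // same attachment back as an input attachment, so tiled GPUs can keep the data
    // in tile memory instead of round-tripping through VRAM.
    void DescribeSubpasses() {
        // Attachment 0: the G-buffer color target; attachment 1: the final lit image.
        VkAttachmentReference gbufferAsColor = {0, VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL};
        VkAttachmentReference gbufferAsInput = {0, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL};
        VkAttachmentReference lightingOut    = {1, VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL};

        VkSubpassDescription gbufferPass = {};
        gbufferPass.pipelineBindPoint    = VK_PIPELINE_BIND_POINT_GRAPHICS;
        gbufferPass.colorAttachmentCount = 1;
        gbufferPass.pColorAttachments    = &gbufferAsColor;

        VkSubpassDescription lightingPass = {};
        lightingPass.pipelineBindPoint    = VK_PIPELINE_BIND_POINT_GRAPHICS;
        lightingPass.inputAttachmentCount = 1;
        lightingPass.pInputAttachments    = &gbufferAsInput;  // read only at the current pixel position
        lightingPass.colorAttachmentCount = 1;
        lightingPass.pColorAttachments    = &lightingOut;

        (void)gbufferPass; (void)lightingPass;
    }

On a backend without render passes, the emulation described above would bind attachment 0 as an ordinary sampled texture (an SRV on D3D) and replace the input-attachment read with a texel fetch at the current fragment coordinate.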
  - EXT_shader_pixel_local_storage: <https://www.khronos.org/registry/OpenGL/extensions/EXT/EXT_shader_pixel_local_storage.txt>
- KD: If we can design it as an opt-in feature, so that it can first not be used and render passes are only brought in later to optimize things, that would be great.
- CW: If you only have one render subpass, it's equivalent to having a single pass, or to Metal's approach. It's just a bit more verbose.
- KD: So basically it would be possible to start with one render pass with one subpass and then split things?
- CW: Usually you would start with a monolithic one and split it up into smaller ones with dependencies.
  - Seems we need more thought.
- DM: Have you emulated this in NXT yet?
- CW: Yes. We transform it into a bunch of Metal-style render encoders.
- KN: Don't have input attachments yet. Should be easy.
- CW: Have RenderPass objects, but they're only a placeholder for later.
- MM: Design sounds fine to us.
- CW: Great.

Dependency Tracking and Undefined Behaviour
- MM: The reason I raised this in the last meeting is that these APIs expect you to express dependencies, and if you get them wrong, your rendering might be broken.
- MM: One of the reasons the Web is good is because it works the same everywhere (ideally). If you get synchronization wrong, it might work on some backends and devices and not others. So if you ship something with undefined behaviour, your customer can say "wait, this looks wrong".
- MM: We need to make it very difficult to create a WebGPU program that has undefined behaviour.
- CW: Generally agree. D3D, if you use the debug layers, forces you to do the right barriers.
  - Each resource is either in an "or" of read states, or in a single write state.
  - The implementation implicitly does the right memory barriers behind the scenes.
  - Think that using this kind of resource usage tracking, we can get rid of most undefined behaviour due to memory barriers.
  - This is what we've been doing in NXT.
  - A graphics/compute interop sample had usage transitions, and it "just worked" on D3D. Memory tracking is just 50 lines of code in the D3D backend.
  - Big fan of usage transitions like this.
  - (D3D doesn't have a spec, so D3D's debug mode is what shows the correct usage.)
- DJ: We could have a WebGPU debug mode that tells the content that it's done something wrong.
- CW: We're saying that we already have this sort of (D3D debug mode)-like tracking in NXT, on all the time, and it's working fine.
- MM: What you're saying is basically what we were about to propose.
  - The best way to eliminate this undefined behaviour is to have this sort of state tracking in the browser.
- DJ: Whether it does it via native state tracking or not is fine.
- CW: Agree. NXT is based around the assumption that doing this state tracking is fast and not too constraining for the application. The NXT design doc has a section on this (TODO(cwallez): add link).
- MM: One other point: if we are going to do the state tracking, then if the user tries to use a resource that isn't in the correct state, the browser transitions it.
- CW: Either we do the transitions implicitly, when the user uses a resource in a different way, or we do them explicitly. NXT asks the user to do it explicitly: "Command buffer, transition this buffer to this usage." (A sketch of this style appears below.)
- MM: If you're going to do all the tracking, why not just issue the correct barriers?
- JG: One of the common complaints is that sometimes the user doesn't know which state transition they got wrong. There is also value in forcing the user to say "this is the state tracking I'm doing."
  - So we say "we only do the transitions you ask us to do", so you know where memory barriers are done.
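A hypothetical sketch of the explicit-transition style just described. The names here (TextureUsage, TransitionUsage, RecordFrame) are invented for illustration and are not NXT's or WebGPU's actual interface; the point is that the application declares each usage change, validation can reject mismatched usage instead of letting it become undefined behaviour, and the backend decides which native barrier each declared transition turns into.

    #include <cstdint>

    // Hypothetical API, invented for illustration.
    enum class TextureUsage : uint32_t { None, RenderTarget, Sampled, TransferSrc, TransferDst };

    struct Texture {
        TextureUsage currentUsage = TextureUsage::None;
    };

    class CommandBuffer {
    public:
        // The application declares "from now on this texture is used as `usage`".
        // The implementation records the declaration; at submit time it validates
        // that every command matches the declared usage and emits one native
        // barrier (Vulkan image barrier / D3D12 resource transition) per declaration.
        void TransitionUsage(Texture& tex, TextureUsage usage) {
            tex.currentUsage = usage;  // plus: record a barrier in the command stream
        }
        void BeginRenderPass(Texture& colorTarget) { (void)colorTarget; /* validate RenderTarget usage */ }
        void EndRenderPass() {}
        void SampleTexture(const Texture& tex) { (void)tex; /* validate Sampled usage */ }
    };

    // Usage: render into a G-buffer, then explicitly transition it before sampling.
    void RecordFrame(CommandBuffer& cb, Texture& gbuffer) {
        cb.TransitionUsage(gbuffer, TextureUsage::RenderTarget);
        cb.BeginRenderPass(gbuffer);
        cb.EndRenderPass();
        cb.TransitionUsage(gbuffer, TextureUsage::Sampled);  // the one explicit barrier point
        cb.SampleTexture(gbuffer);
    }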
- MM: It's easier for authors if we do the right thing.
- JG: It's harder for authors to get performant code.
- CW: Explicit usage transitions allow you to bulk them together. That results in only one D3D memory barrier operation, so only one GPU "WaitForIdle" instead of ten of them. (A D3D12 sketch of this batching appears at the end of this section.)
- DJ: We should get feedback from the Metal driver team on the impact of having implicit barriers.
- DM: Not optimistic about automatic tracking if you have multiple queues, submit to different queues, and there's synchronization with semaphores. Doing automatic tracking on the CPU side becomes hard.
- DM: Metal has fewer synchronization features, so the CPU tracking is easier in Metal.
- JG: Among the complaints about OpenGL is that it is hard to know which memory barriers happen because they are implicit. Their being explicit is an advantage of the explicit APIs.
- MM: This is the point that Dean just made: two of the APIs expose these transitions, but Metal is successful (JG: in its own goals) without doing so.
- CW: Metal has to run on fewer platforms: either you run on a dGPU or on a mobile GPU designed by Apple. The point JG is trying to make is that it's not because Metal was able to get this working for a limited number of GPUs that we will be able to do the same on D3D or Vulkan.
- DJ: You're confident that D3D and Vulkan backends will be able to run performantly even though they're doing this state tracking themselves?
- CW: Suggesting that all the usage tracking is done in NXT, not relying on the debug mode of the API. The CPU cost of doing implicit tracking equals the CPU cost of validating that things are done in the right order. Might as well be explicit because it has a performance advantage (not on Metal, though).
- DJ: Think it will simplify the API to do it for the author.
- MM: Metal runs on Macs, and Macs use off-the-shelf GPUs, and Metal also runs on phones. Those GPUs are close enough to the GPUs of other APIs that Metal would have a big advantage on memory barriers.
- MM: It's theoretically possible to get an old tower Mac Pro and dual-boot it into macOS and Windows.
- CW: Can MM ask the Metal team how they do the barriers? Do they issue them at the last moment, when they see the resource being used in a different way? Or do they parse the command buffer and try to coalesce the memory barriers?
- DJ/MM: Think it's the latter; we'll ask them. But even if they tell us the answer we might not be able to repeat it, and it might be just an implementation detail (different on different drivers).
- KD: No matter which implicit synchronization the API does, we'll need to specify it clearly in the spec so applications can predict where memory barriers will be inserted.
- CW: If we have implicit memory barriers, I disagree that the spec should say where they're inserted, because it's an implementation detail. The most efficient placement might depend on the backend.
- DJ: It's more important that things work consistently than that the spec says where barriers happen. On the Web, reproducibility is more important than performance.
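To illustrate CW's batching point from earlier in this section, a D3D12 sketch in which several transitions are submitted through a single ResourceBarrier call rather than one call per resource; it assumes, purely for the example, that every listed resource moves from render-target to pixel-shader-resource state.

    #include <d3d12.h>
    #include <vector>

    // Illustrative only: batch several resource transitions into one barrier submission.
    void BatchTransitions(ID3D12GraphicsCommandList* cmdList,
                          const std::vector<ID3D12Resource*>& gbufferTargets) {
        std::vector<D3D12_RESOURCE_BARRIER> barriers;
        barriers.reserve(gbufferTargets.size());
        for (ID3D12Resource* res : gbufferTargets) {
            D3D12_RESOURCE_BARRIER b = {};
            b.Type = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
            b.Transition.pResource   = res;
            b.Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
            b.Transition.StateBefore = D3D12_RESOURCE_STATE_RENDER_TARGET;
            b.Transition.StateAfter  = D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE;
            barriers.push_back(b);
        }
        // One call and one synchronization point, instead of one per resource.
        cmdList->ResourceBarrier(static_cast<UINT>(barriers.size()), barriers.data());
    }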
- KD: If you have interactive content, performance is part of the result.
- DJ: Agree, but interoperability is more important than performance.
- JG: True in a long-term sense, but in the short term interop is only guaranteed by using something like WebGL.
- DJ: We were lucky with WebGL because the behavior could be defined well, and because of the large amount of work on interoperability tests.
- KD: My experience is that WebGL sometimes doesn't work in some browser or another.
- JG: Think WebGL will behave more consistently than WebGPU for a few years.
- TO: How is that relevant?
- JG: It depends on what we're going to do with this API.

Agenda for next meeting
- Memory barriers.
- More on render passes.
- Dean to chair next meeting.
- In three weeks, talk about shaders (second week of August, the 9th).
Received on Friday, 21 July 2017 20:46:40 UTC