- From: Corentin Wallez <cwallez@google.com>
- Date: Tue, 26 Sep 2017 09:54:30 -0400
- To: public-gpu <public-gpu@w3.org>
- Message-ID: <CAGdfWNOxJ6NEg5Amc1GFwjTwr_zDX6bR-fy-LQPbHXrNX=4=-g@mail.gmail.com>
GPU Web 2017-09-22 Chicago F2F Minutes from last meeting <https://docs.google.com/document/d/1seCUVBkzkRPEj0sfcDBymGjwSPndTGPhsQJPkUQscNY> TL;DR of the TL;DR - Better fleshed out target for the MVP, some decisions are still pending investigation - More discussions of implicit vs. simplified vs. explicit memory barriers. Action items to make investigations on example use-cases. - Metal is able to automatically run some encoders asynchronously. Discussion on doing this vs. having application explicitly handle queues. - Fil + Myles showed their prototype language WSL that encodes the constraints of SPIRV logical addressing mode. - Widespread agreement that it is an improvement over current languages. - Discussion on which language the API consumes (SPIRV vs. WSL) - Agreement that we should have a blessed language, but which one? - WebGPU devices will be created from thin air and there is something like “WebGPUSwapchainCanvasRenderingContext”. Also how WebVR would work inside the frame callback. - DOM elements uploads are from image bitmaps or video source, that’s it. TL;DR - Mozilla has a WebGPU prototype <https://github.com/kvark/webgpu-servo> running on D3D12 and Vulkan - MVP Features discussion - MVP should contain structural elements of the API and prototype ergonomics. - MVP doesn’t have to run on all target hardware of v1 - Open question: shading language for the MVP? WASM + JS API or just JS? - Out: tessellation, transform feedback, predicated rendering, sparse resources, pipeline caching, aliasing - Tentatively out: DrawIndirect / DispatchIndirect (need investigation), “glGenerateMipmaps”, queries - Tentatively in: multisampling - In: compute, fragment, vertex, render passes, MRT, upload download, copies, instancing, binding model, command buffers, pipelines, a dummy extension - If we decide on multiple queues: async compute, queue synchronization - If we decide on explicit memory barriers, they are in - Memory barriers - Initial opinions - Apple: simplified API with implementation doing more work. Vulkan version would do more work but a design can make speed good enough - Google: declarative and failsafe API with explicit transitions, should map nicely to all APIs, ensure apps don’t fail to render on Android - Microsoft: D3D team’s experience is that barriers are required to make bindless work because driver doesn’t know which resources will be accessed. - Discussion that Vulkan and Metal aren’t really bindless. - Mozilla: would like to give developer full control and power of underlying APIs - Metal has “explicit barriers” at encoder boundary. Can run some encoders async. - Discussion on whether to allow for barriers inside a subpass and what the usecase for it is in Vulkan. - Discussions about Vulkan render passes, how it requires memory dependencies between subpasses to be specified or it would cause pipeline recompiles. - Agreement that we can’t validate shaders with data races (via UAVs). - Industry building task graph abstractions. Metal provides it as a linear sequence optimized by the driver. Vulkan provides subgraphs at a time with render passes. - Frostbite’s FrameGraph: https://www.ea.com/frostbite/news/framegraph-extensible-rendering-architecture-in-frostbite - Need to gather use cases requiring memory barriers and see how they’d be implemented in each API. - Multiple queues - Metal can push different encoders to different hardware queues automatically. 
That the app can create multiple MTLQueues is just because it didn’t make sense in ObjC to limit creation of only one MTLQueue. - Discussion about automatic async compute on D3D12 / Vulkan. - D3D exposed multiple queues to stop analyzing the command stream in the driver. Maybe not needed for WebGPU. - Explicit queues make app point out parallelizable commands. - Order of submits in a Vulkan app is a good order for encoders in Metal. Will need validation of the correctness of the order in all cases. - Shading languages - Fil presented his and Myles’s work on making a language with constraints of SPIRV logical addressing mode built in. Called WSL here. - Familiar C syntax, generics, operator overloading used to implement vector types for example. - Shader terminates early in case of error. - Special pointer types encoding SPIRV constraints - T^ cannot be cast, cannot be assigned after declaration (content can though), and function returning them can have only one return. - Some slice types for arrays, went into less details for them than for T^ - Goal is to have bisimulation with SPIRV to show they are equivalent. - Follow-up discussion: - Agreement that language improves greatly on GLSL / HLSL - Concern about creating a new language, and asking people to move over - Concern about generic instantiation bloat when people want to keep one code path. - Apple suggest API consumes WSL directly. - Concern WSL doesn’t reduce the number of checks or speed of validation. - Discussion of advantages of WSL over SPIRV and rebuttals: - View source vs. shipping shaderc WASMed only where needed - Security built in vs. logical addressing mode is just that - Flexibility for WebGPU vs. SPIRV execution environment - All APIs require same amount of translation vs. ??? - Request for name change: WSL is Windows Subsystem for Linux - Myles did a demo of WSL <https://cdn.rawgit.com/webkit/webkit/master/Tools/WebGPUShadingLanguageRI/index.html> - AIs on showing equivalence of SPIRV and WSL, and showing validation of SPIRV logical and buffer accesses instrumentation. - Need to talk to Khronos to see what implications of using SPIRV are. - Concern that WSL was quick to dismiss prior art and battle tested toolchains. - Roundtable - Apple: Think WSL meets requirements from the group. Think SPIRV could be ok provided there are more investigations. - Google: WSL is a great investigation, need better defined requirements for shading languages, SPIRV gets us far quickly and our intuition is that it is the right choice. WebGL experience is that native parsers are huge source of bugs. - Microsoft: View source is important, suggest HLSL is the language of choice as there has been a focus on standardizing it. It has the largest amount of content. Think at a low-level it would be better to accept SPIRV than DXIL. - Mozilla: We should push the platform, making a new language would slow us down. - Developer PoV: a textual representation is important for education etc. so blessed high level language is important. - Suggestion to use HLSL as de-facto high-level language and SPIRV as intermediate level. People would want a better spec for HLSL though. - DOM interactions - Agreement that a WebGPU device (root object) is created from outside of a canvas. - Consensus there is a WebGPU device constructor with no arguments - Agreement that there is a canvas rendering context that gives you a “WebGPU swapchain” that hands out texture for rendering. - Consensus that WebVR is a supported use case. 
Will need a way to update buffers synchronously without blocking inside the WebVR frame callback. - In the WebVR frame callback the application will ask the WebVR swapchain for the next textures. - WebGPU might require rendering in a texture array and not side-by-side like is currently allowed in WebGL. - Only one entry-point to upload a 2D DOM element; it takes an ImageBitmap. - Another entry-point to create a texture “video source” from a video element. Tentative Agenda - Morning (9AM - 1PM): - Status updates - MVP features - Memory barriers - Multiple queues - 1PM - 2PM: Lunch - Afternoon (2PM - 6:30PM): - Shading languages - DOM interactions - Others (for extra time)? - Swapchain/presentation - Re: descriptor heaps - Re: index format in pipeline state? Attendance - Apple - Dean Jackson - Filip Pizlo (by phone for shading language) - Myles Maxfield - Google - Brandon Jones - Corentin Wallez - Kai Ninomiya - Ken Russell - Shannon Woods - Zhenyao Mo - Intel - Bryan Bernhart - Yunchao He - Microsoft - Chas Boyd (by phone) - Rafael Cintron - Mozilla - Dzmitry Malyshau - Jeff Gilbert Administrative stuff - DJ: Lawyers for Apple / Google / Microsoft trying to figure out a software license to contribute code to the group. Kind of a new thing for W3C. Mostly on agreement and working on final wording. Will likely look like Apache license. - JG: Mozilla would like a copy of the license. - DJ: the companies want to use this for other projects as well, like LLVM (and maybe ANGLE). - DJ: W3C has an all-groups meeting called TPAC. We don’t have a slot to meet there (Bay Area this time), but could get a couple of hours to meet if we like. Don’t think it’s worth it. Could have a session inside the WebAssembly group where we talk about what we want for an API to call from WebAssembly. (Currently WASM can only call JavaScript.) Graphics will probably be the first external thing called from WebAssembly. - CW: depending on DOM interactions discussed this afternoon there might be more stuff too, like mapping buffers inside the WASM memory space. - DJ: will coordinate with chairs. - CW: also signed up for making a demo for W3C attendees. Will re-show demo from Vancouver F2F showing compute + graphics together. Attendees will be folks who are not GPU experts. Status updates - Apple: - Haven’t done anything recently to existing impl in WebKit - Would like to move it closer to what we’ve already decided in the group, and make it clear that it being “WebGPU” isn’t the decision by this WG - Myles and Filip have been doing an experiment to design a secure shading language. Partial implementation in JavaScript done. - Google - Implementing index format in pipeline state that we talked about last meeting - Works - Writing tests, verifying primitive restart on all backends - Intel - No update - Microsoft - Haven’t been writing any code - Have been talking about shading languages and memory barriers - Mozilla - Made big progress on D3D12 backend - Working on GL backend - Figuring out rough spots of descriptor heaps, resource heaps and pipeline barriers - Think desc + resource heaps can look a lot like Vulkan and be efficient on D3D12 and Metal - WebGPU prototype <https://github.com/kvark/webgpu-servo> running on D3D12 and Vulkan! 
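A minimal sketch of the DOM-interaction consensus summarized in the TL;DR above, before the detailed minutes start. Every name here is a hypothetical placeholder invented for illustration, not agreed API shape:

```js
// Hypothetical names throughout; this only restates the consensus above:
// a device created "from thin air", a canvas swapchain context, and
// ImageBitmap as the single entry point for 2D DOM uploads.
const device = new WebGPUDevice();                        // constructor with no arguments

const canvas = document.querySelector('canvas');
const swapchain = canvas.getContext('webgpu-swapchain');  // hands out textures to render into
swapchain.configure({ device });

const image = document.querySelector('img');
const bitmap = await createImageBitmap(image);            // 2D DOM content enters as an ImageBitmap
const texture = device.createTextureFromImageBitmap(bitmap);

function frame() {
  const backbuffer = swapchain.getNextTexture();          // same pattern a WebVR frame callback would use
  // ... encode and submit commands sampling `texture` into `backbuffer` ...
  requestAnimationFrame(frame);
}
requestAnimationFrame(frame);
```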
MVP Features - CW: There are things on the mailing list which we’ve ruled out of the MVP - Want it to be enticing, but also not hard to get right - DJ: to be clear, this wouldn’t be version 1.0, and could still make breaking changes (hesitantly) - Enough to convince ourselves and the community that it’s the right direction - CW: also get ideas in concrete form to see what works/doesn’t - Want most things in there that will cause structural issues / changes - DJ: it’s important because we don’t have many facts or much experience writing code or content with the ideas we’ve come up with - Metal 1.0 : the way Apple went about it was to start with a small feature set and add it over time, as we got feedback from developers and hardware changed - DM: less interested in getting developers interested, but rather focus on things that will affect the architecture - DJ: more interested in ergonomics and development. If we’re writing content for WebGPU, if we think it’s too difficult / easy, we can adjust - CB: how do we define the feature set? List of things that have to be operational for someone to be interested? What are the features of the common subset of the APIs we’re targeting for initial version? - Expected HW configs - Whether we’re trying to support “Big Compute” - DJ: Ben Constable suggested trying to get to the point where we can draw a triangle on the screen - CW: we do have different prototypes which do this. But as a group we don’t have enough consensus to develop an API which can draw a triangle. - If you just want to render a textured triangle there’s a lot of structural stuff you can ignore. Like how you get back data from the GPU. But this can affect a bunch of parts of the API. - So let’s focus on structural stuff as well as stuff that’s cool for demos. - Don’t need all the pipeline state like all the blend functions. - But if you don’t specify how to give textures to the shader you have a problem. - MM: makes a lot of sense. Rather than working toward one program, let’s work toward a set of programs. - CW: maybe let’s decide what’s not going to be there? Small list in the email to GPUWeb: - Sparse resources - Transform feedback - Tessellation - What’s left: compute and graphics workloads - MM: do we need a blit encoder at the beginning? - CB: copy engine - CW: probably need that, if only from upload buffers to textures - DJ: don’t include: - bundles / secondary command buffers - stream-out / transform feedback - predicated rendering - tessellation - sparse resources - Roadmap is at https://github.com/gpuweb/gpuweb/wiki/Roadmap - CW: workloads: - Rendering - Vertex/Fragment shaders - Multiple attachments (multiple render targets) (?) - Render passes - CB: definition of MVP is that it’s viable in the marketplace - CB: think people will want G-Buffers for deferred shading - CB: around compute: do we have to support asynchronous tasks? - CW: depending on result of today’s later conversation, may need concepts of memory barriers and multiple queues in the MVP. Whatever the decision is (include or don’t), those will or will not be in the MVP - CB: think we should have async compute. Barriers are a separate process we can determine later. - CW: they’re tied into memory synchronization, which ties into queue sync. 
- JG: the idea of fences is different than memory barriers - CW: the idea of both, and whether they’re implicit/explicit, is going to be predicated on the result of this discussion - JG: grouping these disparate topics into “synchronization” is too big a chunk - CW: upload and download - JG: memory model - RC: instancing (group: yes especially since it should be easy) - KR: we don’t have to have *everything* from WebGL 2.0. Even with instancing, there are a bunch of variants (base vertex, etc.) - MM: what about GPU-driven rendering (DrawIndirect?) - The three APIs handle this slightly differently - Could be hard / change things structural - CW / JG: should investigate and see how hard it is - Probably don’t need it for the MVP, but if it might affect the overall API structure, should consider including - MM: have investigated it a bit but not enough to talk about it - CW: binding model, pipelines, command buffers (goes without saying) - DM: resource heaps and how they work - CW: we had an NXT roadmap where we went through all of these items - CW: copies / blits - DJ: mipmapping? - Unclear; Vulkan doesn’t have it. Do it yourself - DJ: Metal does have this in the copy encoder - CW: pipeline buffer update? - In the command buffer, say “update buffer with this data”. Inline buffer updates where the data is an immediate in the command buffer - DM: it is convenient and there’s a way to do it in all the APIs - MM: is this for performance? - CW: nothing you can’t do with a staging buffer, and Metal doesn’t have it - MM: let’s leave it out then - MM: don’t need two ways to do this. Can add it later - CW: multisampling? - JG: yes. We have to handle resolve properly. Don’t trust absence of it - KR: could be a can of worms. We just gave developers multisampled renderbuffers and now they want EXT_multisampled_render_to_texture. - JG: isn’t this transitions too? - CB: could we keep the resolve an opaque operation at this level of the API? A high-level call on the resource? - MM: another way to do it would be to attach another texture to your framebuffer and have it auto-resolve - JG: it’s presented more flexibly in Vulkan at least - CB: and in D3D too - MM: probably need at least facilities for it - JG: there are two levels. - MM: should figure out which level for the MVP - CW: think it should be part of the MVP; we need to figure out this story. - MM: who doesn’t want this to be in the MVP? - KR: could be complicated - CB: once we define resources and copying, it’ll be easier to understand how it works - DM: think that adding multisamples should be easy - CW: it’s different in Vulkan and is done on renderpasses if you want to be friendly to tilers. - JG: don’t want to do it magically - RC: is auto-resolve magical? - JG: yes - JG: we should talk about it for the MVP. If implementing is onerous we can re-discuss it - CW: let’s say it’s tentatively in the MVP, pending analysis - CW: memory barriers? - JG: figuring it out should be in the MVP - CW: queries? - timestamp, occlusion - DM: do we have an investigation of them? - CW: not yet. Metal has very few types of queries. Have occlusion queries, but are a totally different concept. - CW: should investigate and - KR: Can we just say they aren’t in the MVP? In WebGL queries are 1 frame behind and people don’t like them and don’t use them. - DM: Can emulate them with pixel shaders and UAVs (and readback) - KN: they’re a little weird in Metal but shouldn’t be a structural change. Should be a separate part of the API. - CW: tentatively out. 
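As context for leaving inline command-buffer updates out of the MVP (the "update buffer with this data" discussion above), a rough sketch of the staging-buffer path CW and MM prefer. All names are hypothetical WebGPU-style placeholders, and `device`, `queue`, and `data` are assumed to exist:

```js
// Staging-buffer upload: everything an "inline buffer update" command would do,
// expressed as a copy recorded in the command buffer.
const staging = device.createBuffer({ size: data.byteLength, usage: ['transfer-src', 'map-write'] });
const vertexBuffer = device.createBuffer({ size: data.byteLength, usage: ['transfer-dst', 'vertex'] });

staging.setSubData(0, data);                         // CPU-visible write (or a map/unmap pair)

const encoder = device.createCommandEncoder();
encoder.copyBufferToBuffer(staging, 0, vertexBuffer, 0, data.byteLength);
queue.submit([encoder.finish()]);                    // the copy replaces the inline-update command
```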
- JG: shading languages and how you feed them (e.g. vertex attribute marshaling) - CW: pipeline state etc. we all agree should be in the MVP - JG: we’ve already punted on pipeline caching - JG: resource aliasing? - MM: what was the result on heaps? - CW: pending investigation - RC: would say no on aliasing for MVP - CW: that’s my gut feeling too - MM: two ways to use this word. One buffer -> two points in the shader. Or, a texture and buffer pointing at same memory. - CW / JG: we’re talking about the second one. - Meta stuff about the MVP? - CW: Should promise to break it and not enable it by default. - DJ: Helps with security. In Safari TP you can enable / disable features at runtime. - DJ: Hardware we are targeting is essentially anything which runs Metal - For Google most of Android devices which ship Vulkan - For Apple, any Metal 1.0 device (nearly all iPhones ATM) - Some smaller subset of Mac hardware excluded that doesn’t have Metal. - https://developer.apple.com/metal/Metal-Feature-Set-Tables.pdf - For Microsoft: all D3D12 devices - If you have the 12 API, it can be used, so we should target it - CB: lowest-end DX12 capable system will be as tough to target as the lowest-end Android phone. So not clear that D3D12 will be defining the floor - CW: suggest that for MVP, not guaranteeing that it’ll run on all of these systems - DJ: on Android Vulkan’s been supported for two releases, but first release was a bit shaky. - CW: Android still only requires GLES 2.0. However, of the Android devices which ship Vulkan, we’d like to support most of them. - MM: does every device which has Vulkan also have to pass the Vulkan conformance tests? - CW: have to pass the CTS except for some disabled tests. But some of these devices shipped on CTS versions that were incomplete, so there are bugs. Some have iffy Vulkan devices. We’d have to do workarounds. - JG: we’ll try to run on Vulkan machines, but we won’t hold back MVP (or change the API) for broken devices. - CW: Vulkan does have a lot of optional features, including logic ops (!). Might need to ask to remove certain features from the MVP and recast as extensions. But extension story should be a post-MVP thing. - KR: point out that Vulkan’s extension support was structural -- the optional void* extension pointer at the end of each struct - JG: agree, should have one no-op extension to understand how they’ll work - CW: MVP’s content is important. Do we care about the API shape? - JG: yes - CW: ok, then a dummy extension should be there - KN: not clear how easy it will be to agree on an API shape - MM: shading language for MVP should probably be shading language we progress forward with - Should also decide whether we’re going to make a JavaScript API, WebAssembly API, both, or neither. :D - DJ: definitely have to have a JavaScript API. Question is whether we have a C one or not. No way to call C from WebAssembly. - MM: we should have a discussion and there’s one right answer. - MM: have been keeping the roadmap document up-to-date to the best of my ability Memory barriers - CW: How do we get to a resolution - BJ: Trial by combat - DJ: Is there any way we can split it up into a smaller discussion? 
- We’ve already discussed the philosophy previously - Apple’s perspective: we think a much simplified API with the implementation doing more work is the better solution - The other solution says that it’s good to give the developer all this control - What this would mean to a Metal implementation: a bunch of stuff would be no-ops because it’s handled by the implementation. And the no-ops wouldn’t slow things down. - Thesis for a Vulkan implementation: a Vulkan implementation would have to do more work, because the Metal driver’s doing it; but think that we can agree upon an optimal design that will give “good enough” performance. - DM: clarification: we want both source and destination of transitions to be specified. That’s the way Vulkan does it. - CW: we think memory barriers need to be explicit for many reasons, so we should expose them to the app developer. But they should be declarative and failsafe. - I have a resource, treat it as an assembled image, or as a vertex buffer. (This allows barriers to be grouped.) - In other words, specify the destination state. - It’s a D3D12 transition barrier with only the destination stated. - DM: ah, so it’s a bit higher level. - Impl should do whatever is needed to make that happen. - Avoids developer needing to understand what the memory model means. - If the developer does it wrong, they’ll get a validation error. - It’s a simplified model, and will map well to all backends. Easier to validate, easier to learn. - If we validate strongly memory barriers, then WebGPU will work seamlessly across desktop and mobile. - If we *don’t* do this, developers will make things that work on desktop but *not* on mobile. - RC: spoke with members of the D3D team. - D3D11: very explicit. Bind everything ahead of time. - API had all the information it needed to do the barriers for you. - Shaders: indices into arrays had to be constant. - In the new bindless world, you can’t know that. In the shader, your array indices are not constant. You can calculate the index, index into the table, read from here, write there. No way with that model for the runtime to figure out what you’re reading from and writing to. - Had to add memory barriers. - For this reason, we think that version 1 of the API should have memory barriers, to set ourselves up for the new bindless future. - If we have dynamic indices in the shader, don’t think we can figure out what’s read and what’s write. - CW: what does this mean in practice? - RC: this means we need to make barriers explicit. - CW: are the barriers validated for correctness? - RC: think it will be very difficult to do. D3D lead said, if you can figure out a way to validate them, then the API should auto-do it for you. - JG: there are different degrees of validation. Can ensure something’s safe without ensuring it’ll be completely portable. - RC: if they’re auto added for you, is it twice the cost? 3x? - CB: barrier model we have in DX12 is slightly higher level than the one that’s in Vulkan. But lower level than what’s in Metal. Trying to understand how this ties into goals of resource binding model. - DX11: max 128 textures bound to single pipeline, statically indexed at compile time. - New APIs: can compute that index inside the shader, and it can go up to ~1 million textures. - CW: don’t think that’s the case in Vulkan. Vulkan still caters to fixed-function hardware. Bindless isn’t mandatory. Easy to change part of the bindings. Don’t think you can access millions of descriptors like D3D. - CW: Metal is D3D11 style. 
Has bindings, not bindless. Has dynamic indexing, but it’s a texture table. - RC: the D3D team’s conclusion was based on having bindless. If we don’t have this then maybe it is possible to auto-insert barriers. - With advent of multiple queues the runtime can’t insert them for you. - CB: example: CopyEncoder, then texturing from it. Have to signal that the copy is done. - DJ: Do you mean only for async copy / compute case? - CB: yes, if it’s in a separate queue then it’s asynchronous. - DJ: thought we’d agreed there’s only one queue? (CW: no) - DJ: can be potentially asynchronous in Metal as well. Metal has explicit barriers. It’s about the encoder, and inside compute. - CW: probably inside blit as well. Just a bit simpler. - MM: not sure about that. - RC: so for compute, do you believe in explicit barriers? - DJ: yes, we believe in explicit barriers for some cases. - CB: within a particular encoding stream. - MM: render a triangle, then a second triangle, in the same encoder. The second triangle has to appear on top of the first. - JG: other commands and dependencies can happen at different times. (?) - In Vulkan you can have things run in parallel - MM: write to texture in fragment shader, then read from that texture, it’s not defined. Would need to end the encoder. - CW: in Metal barriers are inserted between the encoders. A Vulkan subpass corresponds to one Metal RenderEncoder. - DM: it’s not 1:1 - CW: if you need to do a barrier between Vulkan subpasses, then you’d split the Metal encoder. You don’t have barriers inside subpasses. - DJ: that’s what I meant about explicit barriers in Metal. - MM: what we’re really talking about isn’t whether there should be barriers, but what should the programmer describe when they need synchronization. - CW: think we can agree that we don’t want any form of barrier inside subpasses, because that’s impossible to implement. - JG: subpass self-dependency - CW: limited what you can do in there. Vertex UAV writes -> Fragment UAV reads. - JG: thought this is what could help the tiler - CW: in Vulkan, can have dependencies between subpasses. - CW: in Vulkan, can only push data from vertex to fragment inside a subpass - Don’t know any use case for it. Would be ready to not have that. - JG: Vulkan spec section 6.6.1 about dependencies - CW: think this too hard and niche, and we shouldn’t put it in. (On a tiler GPU, putting barriers between vertex and fragment processing without flushing the tile caches) - RC: so you need to close the pass and open a new one? - CW: yes. Because Metal and Vulkan are catering to tiled GPUs, have to be explicit about when rendering to a certain attachment set is started and ended. If you want to read from the attachment, it’s required that you can’t do so from the same subpass. - CW: it’s more like UAVs where you write to it from the vertex shader and read from it in the fragment shader. Can’t think of a use case for this. - CB: don’t think this works anywhere. - CW: might work on tilers? - CB: read-after-write hazard. Plenty of things people do after a render and they have to switch layouts. Very common hazard. - RC: so, it’s a hazard to write to a UAV in a draw call and read it in a different one. - CW: Vulkan subpass self-dependency. - MM: one thing we can all agree on: shouldn’t be able to write to a UAV from a vertex shader and read from it in the fragment shader, in the same draw call. - CW: to close this topic: we don’t put memory barriers inside subpasses. If you need this for your use case you do a different subpass. 
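A sketch of what the agreement above could look like at the API level: the write-then-read hazard is expressed by splitting the work into two subpasses with the dependency declared up front, never by a barrier inside a subpass. Hypothetical WebGPU-style names, loosely mirroring Vulkan's render pass description:

```js
// Hypothetical render pass description; names are placeholders, not agreed API.
const renderPass = device.createRenderPass({
  attachments: [
    { format: 'rgba8unorm', loadOp: 'clear', storeOp: 'store' },  // 0: written by subpass 0
    { format: 'rgba8unorm', loadOp: 'clear', storeOp: 'store' },  // 1: final color, written by subpass 1
  ],
  subpasses: [
    { colorAttachments: [0] },                          // subpass 0 writes attachment 0
    { inputAttachments: [0], colorAttachments: [1] },   // subpass 1 reads attachment 0, writes 1
  ],
  dependencies: [
    // "shader writes done in subpass 0 must be visible to subpass 1"
    { srcSubpass: 0, dstSubpass: 1, srcAccess: 'color-write', dstAccess: 'input-read' },
  ],
});
```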
- RC: we do agree on some kinds of barriers! - JG: I am curious why this wound up in the Vulkan spec… - MM: until that question’s answered, we should proceed - RC: rendering to a texture and reading from it requires a new subpass? - CW: yes - CW: this would be implemented as OMSetRenderTarget in D3D12 - MM: next topic: during the boundary between one renderpass and the next, what should the programmer say? - CW: in Vulkan, when you have renderpasses with multiple subpasses, for each attachment, you have to say how it is used when. - CW: redundantly-ish, have to say what are the memory barriers between subpasses. - The membars between subpasses can be a superset of transitions - Can say: want buffer writes done in that subpass to be visible in this subpass - In addition to transitions of textures - CW: if we support renderpasses with multiple subpasses – which I think we want because it’s very handy in both Metal and Vulkan on tilers – then when we create the renderpass describing the rendering algorithm, we need to say “I want the shader writes done here to be visible over here”. - CW: otherwise on Vulkan we have to guess, and take the worst-case guess, leading to a pipeline recompile. So we really need them described. - CW: between subpasses, need to encode which memory barriers it might require. - MM: and if the app gets it wrong? - CW: we should find a way to validate that. - MM: the validation compares “expected” vs. “real”? - CW: in renderpass: app says, at this point, i want to be able to have resources that go from “shader writes” to “being sampled”. Each resource, it says it does that. - MM: if you have information about what the app’s doing: then you can retroactively insert barriers immediately? - CW: no. Renderpasses encode memory barrier information. Pipelines are compiled against renderpasses, and can only be used in compatible renderpasses. - DJ: can’t change the renderpass after it’s been “closed” because that would cause recompilation of the pipeline? - CW: yes. - MM: in order to make that work you’d need to wait until the end of the pass - DJ: if you were going to do it automatically: you’d need to record the commands, submit everything, and then … - CW: wait until everything’s done. Build renderpasses. Then recompile pipeline. Then encode command buffer. - JG: why do we have to recompile pipeline? - RC: is the recompile needed on D3D12? - CW: no. would not need to recompile pipelines. It’s not as bad as on Vulkan. - CW: memory barriers between renderpasses change. Renderpasses are compatible if they are the same in everything but the initial layout of resources (framebuffer swizzling, etc.), and load/store operations for different things (in Metal) - CW: if you’re on a tiler and you have to flush the tiler, you want to take advantage of that for register allocation on your tiler. - JG: how would you have different initial and final image layouts? - CW: high-level point: if we have multiple subpass renderpasses, there’s some implicit memory barrier that’ll have to be inserted. - DM: don’t understand how this can be done automatically on Vulkan. user can not communicate to driver what to do. - JG: would have to infer dependency graph from what was submitted - DM: several different ways to do this. Not clear. - JG: in metal you encode things in an order. Things happen in that order. Things that aren’t dependent can happen in arbitrary order. - MM: in Vulkan things aren’t ordered? - KR: it’s a render graph. Some parts can run in parallel. 
- KN: you can insert them in whatever order - CW: it’s like you provide your rendering graph to the driver, and the driver optimizes / schedules it. - KR: engines are representing their frames as graphs internally already. Frostbite: https://www.ea.com/frostbite/news/framegraph-extensible-rendering-architecture-in-frostbite - CW: metal only provides one attachment at a time. - KN: in Metal you have to give things in the right order. In Vulkan you can submit in any order but have to provide the dependencies. - CW: Vulkan’s way of it lets you do register allocation of the tile cache. - JG: don’t see the distinction. All the APIs have dependency graphs. - CW: you want to understand exactly where your data ends up in the tile cache. Metal doesn’t have information about the pipeline when submitting. - CB: suggestion: - given that there’s some diversity of the use of the term “barriers”, might be interesting to look at the top 3 or 4 use cases, see how they’d be implemented in each API, and see what abstractions would work - look at things like RAW, WAW hazards - Merging the APIs without that use case context will be a long tail operation - JG: concerned we might miss use cases - MM: related question: have decided there’s at least one case where the app is wrong. Where if you write to a UAV in vert shader and read from it in fragment shader, that’s undefined. What happens then? How does the browser know that this scenario occurred? - CW: don’t think we can validate that - MM: then we have unportable apps - CW: don’t think we can shield against concurrency bugs when we have read-write buffers - MM: would annotate every buffer. all buffers attached to vert / frag shader. - CB: in DX11, we can validate this, fail and unbind the previous bind to the pipeline - B/C we have indexing in the pixel shader in D3D12, can’t validate. Have a debug layer. Instruments the shader at runtime. Warns the user that that’s an illegal operation. - MM: think this sort of analysis needs to be done for every draw call by the browser - CB: what we’re looking at is a model where we don’t support arbitrary indexing. So we can do the D3D11 validation model. - CW: app allocates a big buffer. Read-write “stuff” in vert shader in one part. Read-write “stuff” using frag shader in another part. - CB: APIs don’t allow this today. Problem you run into is that segments of that buffer have been cached with different granularities in different ways. - JG: swizzling patterns for tile subsections - CB: these are properties of the resource description - KR: that big a limitation to say you need two different buffers for this? - JG: would be different from Vulkan - CW: would be fine from our point of view; slightly limiting - MM: think we should eliminate undefined behavior - DM: you already have this with just a single UAV. fragment execution is unordered. - MM: what if you have only a single thread? - CW: works then. - KR: or if you use atomic ops - CW: we simply can’t verify shaders with data races. - Discussion about serial submission vs. parallel submission - KR: is it the same as topological sort used by compilers to linearize graphs? - DM: don’t think so. better to submit the graph. if we establish the order, then we limit the amount of reordering and rescheduling the driver can do. - JG: it’s sort of about identifying hazards. - CB: I’m a big fan of task graphs. Covers all 3 APIs. Devs used to graphical abstraction can author this almost with a markup language. - CW: aren’t renderpasses that task graph? 
- CB: yes, kind of as a tree. Or sequence of sub-graphs. - JG: it’s an incomplete graph. - CB: a lot of engine companies are looking at a task graph model. The top level of their engine is already a task graph model and they’re looking for a more direct mapping. So a task graph in the API would not preclude using it in AAA content. Or Unity. (ooh, burn) - RC: ? - CB: so if we express things at a graph then we don’t need barriers and we can use the graph to express dependencies - RC: so all the dynamic UAV stuff would have to be inserted into the graph? - CB: not sure we can say that there wouldn’t need to be some kind of “UAV barrier” inside the shader - CW: do we want to minimize undefined behavior? - JG: we have different concepts of that - MM: we as a group shouldn’t pursue eliminating undefined behavior as the only goal of this group - But, it is valuable to limit undefined behavior - It’s not the only goal, or the most important goal. - CW: we should minimize undefined behavior at the API level, while staying at our perf target of 80-90% of native. - KR: but not, say, a factor of two hit. - Discussion about Vulkan’s requirements that: - Renderpasses: get (attachment descriptors, subpass descriptors, subpass dependencies) - Pipeline descriptors get Renderpasses and subpasses - Then BeginRenderPass gets the renderpass and textures - There’s needed compatibility between renderpasses and pipelines - How to make progress on these - Would like to get some use cases and understand how they’d be implemented in Vulkan (and other APIs) - MM: use cases are good. They won’t be comprehensive. - JG: think we made a bunch of progress here Multiple queues - CW: ties in to this topic and will be just as contentious - In the roadmap, we have consensus on queues such that: - There should be one queue type that can do everything on all APIs - Some implementations may support multiple queue types - It’s not clear whether we can have more than one queue per type - Not sure whether we should force all impls to have multiple queue types - MM: Metal doesn’t have synchronization between multiple queues - We agree that we need synchronization between multiple queues - JG: in Metal you can get callbacks when queues are done - MM/all: but that’s round-tripping to the CPU - MM: regardless of once per frame or a few times per frame, you have to round-trip - If you’re going to have multiple queues, you’ll probably require synchronization without round-tripping - Metal doesn’t need this because they only have one queue - If the implicit dependency graph is that the things can run in parallel, and the GPU has facilities, they can run in parallel - CW/JG: discussion about multiple queues and fences - CW: you’re not intended to use multiple queues in Metal, because the synchronization is through the CPU. In Metal, if the driver discovers you can take advantage of parallel hardware queues, it’ll parallelize it. - Async compute happens automatically-ish. - JG: understood. - MM: in metal there’s no reason to use multiple queues. The fact that you can make multiple queues is just a natural thing. But they’re not designed to be used. - CW: there’s device submit. Queue submit is queue.device.submit. - RC: how do you specify the dependency graph? - MM: it’s implicit. As described during the last meeting. - Ex: blur something just drawn. One RenderEncoder which draws the thing. Second ComputeEncoder which lists that the texture you drew into is a readable input. Dependency graph is implicit. 
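A sketch of the implicit-dependency pattern MM describes in the blur example above (hypothetical WebGPU-flavored names, not Metal syntax; `commandBuffer`, `sceneTexture`, and `blurredTexture` are assumed to exist). Attaching the rendered texture as an input to the compute pass is the only thing that tells the implementation to order the two passes:

```js
// Pass 1: draw the scene into `sceneTexture`.
const render = commandBuffer.beginRenderEncoder({ colorAttachments: [sceneTexture] });
render.draw(/* ... */);
render.endEncoding();

// Pass 2: blur it. Listing `sceneTexture` as a shader input both binds it and
// supplies the dependency information; no explicit barrier is recorded.
const compute = commandBuffer.beginComputeEncoder();
compute.setTexture(0, sceneTexture);       // read: implies the render pass must finish first
compute.setTexture(1, blurredTexture);     // write
compute.dispatch(/* ... */);
compute.endEncoding();

queue.submit([commandBuffer]);             // independent encoders may still run in parallel
```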
- JG: do you think we should have multiple queue instances in this API? - CB: basically asking whether the app should say what can run in parallel, or the API should determine what can run in parallel via specification of dependencies - CW: if explicit, then API has to include queue synchronization facilities (on the GPU – no round-trip to the CPU). - DM: Metal backend could say that it only has one queue available so that it doesn’t have to implement synchronization. - KR: can we support async compute in Vulkan without making everything explicit? Like Metal? - CW: you have to declare which queue type things can run on - JG: two types of objects in Vulkan, shared and exclusive. Shared can be used across multiple queues. Exclusive have to be transitions. Can transition sub-parts of objects to run on different queues. - DM: “concurrent” and “exclusive”. - DJ: async in Vulkan: different queue type / instance. - CW: one instance “graphics/compute/blit/present”. another “compute/blit”. Do main rendering on first one. Async compute goes on second one. - CB: motivation in DX12 was: get the drivers out of the business out of analyzing command streams and determining what was parallelizable. But at this level of abstraction that doesn’t seem like that much of an issue. - JG: would be nice to retain the benefits. - CB: if we put this all in a single ref implementation we can all optimize it ourselves. Can provide optimization of Metal behavior in a way we’re all comfortable with. - JG: can a ref impl be good enough that we’re satisfied with doing it automatically? - CB: not sure there’s much value to be added with letting the app do it. It’s just that we’ve seen arbitrary cost in some drivers. But if it’s our ref impl then we can do it. - MM: no one way to do it right? - CB: Metal team seems to have figured it out. - CW: intuition: if we make queues explicit, think apps are more likely to take advantage of them. - RC: so in Metal you have to tell the encoders what your inputs are, so it can figure out that the compute stuff can go in parallel? - CW: they’re provided when you say SetFragmentBuffer - MM: when you create the encoder you don’t say “I’m going to use these resources”. But at the time you list them you’re using them for what you want. - When you describe you’re going to use this texture for this purpose it does 2 things. Attaches texture to shader. And says that synchronization is needed. - RC: and if you say i’m just going to run this, then can run in parallel? - MM: yes, if you have a compute thing with no buffers and textures attached, then the compute thing could run entirely in parallel. - Rendering algorithm with two textures as input, both filled via compute. Those compute things could run in parallel. - DJ: given you have to express the deps up front, why do you need a separate queue? - JG: section 6.2, sync guarantees. Submission on a single queue is implicit - MM: no. first thing finishes before the second thing *finishes*. - CW: also, can’t put compute in render passes in Vulkan. - MM: dean’s question is why this is required. - CW: dumping sub-parts of the graph which are graphics-only. - MM: that’s a bad design. why? - CW: tile cache might be using compute shared memory - JG: this might be a concession about using a single queue without working about command buffer sync - CW: maybe a concession to console developers and they want full control over the hardware. maybe they have a task graph but want explicit control. 
- KR: it might be worth trying to do this automatically - CW: sync between queues has a cost. If you have a tiny compute shader used to generate a DrawIndirect, and it has sync and what not, it's not worth putting it async. We can't know the cost upfront. - MM: there's a cost to marking a compute shader 'expensive', and submitting it to a separate queue - CW: seems easier to have the app tell you to run the thing in parallel. Doesn't necessarily mean that we expose the concept of queue, but the graph needs to be specified up front. - MM: that seems easy to agree to. "This computation could possibly be asynchronous". - KN: not necessarily one compute op. Maybe multiple, and have to be ordered w.r.t each other, but async w.r.t everything else. - MM: the app submits to different queues, and you have your async compute. At end, want to join them and show frame. In Metal you can't do that. - CW: in Metal, you'd have compute and render happening in parallel. RenderEncoder A, ComputeEncoder B. RenderEncoder C, renders to final render target, and implicit dep on both. Submit both in any order, and Metal figures out A and B can run in parallel, and have to join for C. - JG: if you have an active pipeline then you could make the pass-back to the CPU to establish this - MM: if you have things that are sharing the same buffers, then in Vulkan one goes to one queue and one to another. In Metal, could easily get into a place where you deadlock because the ordering is wrong. - CW: yes, RenderEncoder A using a buffer, RenderEncoder B using the same, and they won't run in parallel because they might race. - MM: opposite. App puts one in one queue and one in another. - CW: app can't do that without inserting transitions of resource from one queue to another. Using resource for writing in two different places. Invalid. - DM: exclusive ownership for resources? - CW: yes, that's my view – just a proposal. A resource is either readable or writable as one specific type of thing on one queue. - MM: a bit blocked. Higher level? - CW: at any single point in time, a resource is either readable by the world, or writable by only one queue. (This is just a proposal for eliminating undefined behavior) In the backend, would put in synchronization (in vulkan – in metal would no-op) - MM: in the one Metal queue, you'd first have to submit the commands – the command flow has to follow the resource. - CW: on the app side – resource is used first for render, then compute. Submit render command bufs using resource. Transition rsrc from queue render to queue compute. Now can submit command buffers that use rsrc for compute. In Vulkan, would use a fence. In Metal, each time you do submit, create the encoder, so things are well ordered. (A sketch of this flow appears below.) - KR: is there something sub-optimal for Metal here, where we have to defer things to queue submit time? - CW: when you encode a command buffer you're putting it in the queue. You have to encode things in order. - MM: no, you don't have to encode things in order, but commit them in order. - KN: thought you had to commit one encoder before you got the next one. - MM: drawing use of buffers on different "queues" and different dependencies which would cause deadlock in Metal but not Vulkan. (B/A, A/B) - CB: deciding whether we need explicit parallelism. - MM: suggesting this is impossible. - CW: the Metal driver's doing dependency analysis. That would be really bad in the backend. The driver's signed up to do that, but not the backend.
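The queue-handoff flow CW describes above (exclusive ownership with an explicit queue-to-queue transition) could look roughly like this. Every name is hypothetical, and `buffer`, `renderQueue`, `computeQueue`, `renderCommands`, and `computeCommands` are assumed to exist:

```js
// "At any point in time a resource is readable by the world or writable by one queue."
// Render work writes `buffer` on the render queue first.
renderQueue.submit([renderCommands]);               // command buffers that write `buffer`

// Explicitly hand the resource from the render queue to the compute queue.
// A Vulkan backend could emit a GPU-side semaphore / queue-ownership transfer here;
// a Metal backend could make this a no-op, relying on encoder and commit order.
buffer.transitionQueue({ from: renderQueue, to: computeQueue, usage: 'storage' });

computeQueue.submit([computeCommands]);             // command buffers that read `buffer`
```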
- CW: in Vulkan, when you do queue submit, things have to be transitioned into the right state. - MM: so both vulkan and metal will have to validate this scenario? - CW: that's validated if you have explicit transitions. Shading languages: Fil's presentation - Discussion about various topics - DN: there's a lot of content for GLSL. Let's say you added generics and slicing. slicing looks like the killer app for this. If I were to explain this to someone in the GL world, it's a nicer GLSL with slices and templating. - CB: in HLSL we're working on adding these to the cut-down version of C++. One difference is we've had unions on our plate for a while. Know we're not going to be able to implement all this on the GPU. - DN: OpenCL C++ kernel language has templates etc. For C++ people but removes much of the dynamic stuff. Still keeps the OpenCL C pointer restrictions. - DJ: you wouldn't need logical addressing mode restrictions for OpenCL. - DN: we're all going after the same GPUs. - CB: we're all trying to go after the C++ model, but going after the same hardware, as the hardware evolves. - DN: want to separate programming model concerns from technical concerns. - DN: still not sure what the security model is (bounds checks, etc., at what time do you detect that) - DN: you're creating a new language and asking everyone to move over - CB: but they're not breaking changes. - CW: WSL looks like a subset of HLSL - CB: not into the whole branding a language for the sake of it - FP: WSL is C++ without classes. We gave it a name just to have a name and a directory to put it. It's the kind of language you can tell someone who knows C++ that "here are the rules". You mentioned no clear story of how you handle errors. MM and I came up with a thing that WSL will do: program terminates early. - DN: we hadn't agreed as a group what the criteria are. - DN: the generics you mentioned are template based, so you'd wind up with e.g. 5 copies of the code if you had 5 different instantiations. - FP: Yes but because of inlining you would have 5 different instantiations anyway. - DN: have talked with people who have significant codebases that say if you have that genericity at compile time, you wind up with unacceptable performance. They do dynamic polymorphism. When you access memory you change how the load is done. Have heard this from multiple directions. Might be the kind of thing you say, sorry, frontend has to handle this in some way, even if it causes code bloat, etc. Maybe a concern, maybe not. - FP: the problem is inlining, not generics. If you allow a shader language to have functions then you ultimately have to implement that in the language by inlining. - DN, KN: that's not true. - KN: has to be inline-able. But many platforms will not actually inline it, because you'd end up with too many instructions. - CB: What we are looking at doing is maybe having a link step that does dead code elimination. - DN: the problem is that dynamically, at runtime, you might have 1 of 100 different things - CB: it's very dangerous to make 100 copies of code - DN: High level point is: people who see "I need pointers because XXX" don't want instantiation explosion but just one code path. The model presented for WSL doesn't help for that because it still has the code explosion problem. - FP: understand what you're saying. Valid concern. Data point: people are using templates in Metal and they're happy with them. The reason why it's kind of OK is: the killer app for templates is numeric code.
This is what people use templates for. If you're trying to write OO code using templates, it's hell. - DN: some customer of yours said "I'm using pointers because blah", and you create this solution, but you may take this back to that customer and they'll say "it didn't solve my problem". Maybe you'll go back and do more work in the compiler. - CW: question not related to language design: what's the delivery mechanism of the language to the API? - MM: it can be whatever we come up with. - CW: is it a goal of this language to be faster to type-check and safety-check than SPIR-V? or to be a high-level language to be accepted and lowered to SPIR-V? - DJ: we think it's not going to be significantly slower than type checking - MM: if your Q is "what language does our API accept" then the model is that our API accepts WSL. - DN: so this is what's being proposed to WebGPU. - KN: the only reason to add safety to WSL is because you can add security checks more intelligently than SPIR-V. If our thing ingested a restricted subset of OpenCL C (via clspv) we could type check it. Only reason to add a new language is to more intelligently add safety checks. - DM: have we seen a case where WSL would be safer than SPIR-V? - FP: have the safety checks that are minimally needed to add memory safety to SPIR-V been added so we can check them against WSL? - DN: haven't spec'ed them fully. - CW: seems buffer checks mainly. There are also texture image fetches, and you have the texture size available at the call sites. Feels like the biggest safety feature is buffer checks. - DN: spir-v buffer fetches have been deployed to date on platforms where robust buffer access is present. - MM: need to handle platforms that don't have it. - CB: it gives you better access to safety - KR: point about needing run-time checks at all accesses of slices in WSL - KN: question about doing the checks up front - FP: if there were no rule about checking slices up front then in the pointer case you'd be unsound. If I could create an array slice that's pointing out of bounds then a subsequent checked access might go out of bounds. In logical mode I don't think there's a significant cost of bounds checks here. - DN: you're checking the object you're referencing into. But you'll reference into the slice with a run-time determined value so you will have to check it anyway. Effectively you have a fat pointer that you're passing around and you have to check the index. - FP: are you talking about logical mode or not? - DN: yes - FP: the reason why slice creation has a bounds check at that point is that if you have totally unconstrained pointers, it's like I'm giving you an inductive hypothesis that that slice is valid. But need to check that the slice is valid up front too. - DN: so guaranteeing that base object is valid. Now have an arg index which is some number. Need a bounds check. It's the same bounds check that you'd need with SPIR-V logical mode. - CW: what's the value of the API ingesting this language, vs. ingesting SPIR-V which can be a compilation target of it? - FP: 1. textual format and not a binary format. we think based on feedback from webassembly that future programming formats for the web should be textual so view source works. - FP: 2. this language already has specified type rules for areas that have security implications. it's designed for security from the start - FP: 3.
as we discover how the webgpu spec is supposed to work (constraints it runs into, etc.), having a language that this committee owns that doesn’t require approval by another committee will give us flexibility that we need. - CW: rebuttal: - 1. view-source problem: we don’t need a standardized textual format for this. can view webassembly on the web right now. can view spir-v disassembled right now. no-one writes this. (CB: view source isn’t useful in that case.) - DJ: the feedback we got from teams that write shaders is that they want a human writeable format. - DJ: so you want to ship a spir-v compiler along with your source? - JG: then ship a compiler - CB: compromise: spir-v could well be the underlying implementation of this. but if that’s all you spec, big q about ease-of-use of programming. but you could get a rich diversity of compiler languages, so no sharing of code around the internet, defeating the purpose of a w3c standard. we could do this and make a new platform for people to make new languages in. but best to have a lingua franca with a syntax that is supported on every browser. - FP: i wasn’t describing the lack of view source. i’m describing criticisms from developers about lack of view-source. - KN: understand. they want the originally written source code. - CB: they need to round-trip it. - 2. spir-v logical addressing mode is a feature of spir-v by default and it’s secure. theoretically you’ll be generating valid spir-v. can run spir-v validator on it. it’s great to have a HLL to embed the security properties, but it doesn’t make it better for ingesting by the API. - FP: is there a reference implementation enforcing the security properties you suggest? - DN: the spec says exactly what logical addressing mode operands can be. - FP: doesn’t say what the bounds check behavior is. - DN: are you asking for an implementation which checks validity of program that’s running? or statically, plus a certain number of runtime checks? - Discussion with DN and FP about gluing SPIR-V spec to GL or Vulkan spec - JG: super impressive that you’re creating a new language. disagree on the need for it. we have 95% of a solution in front of us in the form of spir-v logical addressing mode. we already did this for opengl and glsl for webgl. - DJ: how many languages are compiled to SPIR-V logical addressing mode? - HLSL. GLSL. OpenCL C subset. - CW: the reason to take SPIR-V is: high-level shading languages have corner cases. - DJ: but the security researchers came in yesterday and showed us bugs in SPIR-V drivers / compilers. - CB: he’s showing us how to add pointers and templates for limited use cases. would like my ide to guide me along a path where we implement these new features robustly. - CW: perhaps we haven’t stressed enough that this is a great experiment and something we have. we’d like to use this to write shader code. but it’s a question of what the api ingests. - FP: we are arguing that the api should ingest a textual format, and that it should be designed from the ground up to meet the needs of the web, and that this format is something that’s owned by this committee. no matter what we pick there’ll be some friction between the language and what this committee is trying to do. the language has never been used by anybody, no backward compatibility constraints. new language is an asset. also, saying 95% of spir-v is described is not true. - JG: there’s similar prior art with making GLSL secure. - FP: this could be viewed as an extension to GLSL. 
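One way to read the "then ship a compiler" position above, sketched in JavaScript. The module and its export are hypothetical stand-ins (think a WASM build of a WSL- or GLSL-to-SPIR-V compiler with JS glue), and `device` is assumed to exist; the point is that readable source can stay viewable while the API itself ingests SPIR-V:

```js
// Hypothetical module and entry point; nothing here is an existing tool's API.
import { compileToSpirv } from './shader-compiler.js';   // JS glue around a WASM-built compiler

const source = await fetch('shaders/blur.wsl').then(r => r.text());  // human-readable, view-source-able
const spirv = compileToSpirv(source);                    // e.g. a Uint32Array of SPIR-V words
const shaderModule = device.createShaderModule({ code: spirv });     // hypothetical ingestion point
```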
- CB: in this modern world with tons of github repos, it's better to have human-readable. - DJ: another useful feature: making a textual language easily translatable to the lower level languages. all the platforms are in the same spot, requiring translation. and models the webgpu api because we can control e.g. the bindings. we've seen this in metal. - CW: yes it's great to have debugging, and can have that by not stripping all debug info from spir-v. - KR: Big pieces missing: analysis of current shaders and kernels that will be ingested from the system, and, limitations on the underlying shading languages (no 8-bit loads and stores). There are low-level limitations that bubble up to the high-level language. - CB: ??? - KR: Can't come up with something from thin air that isn't grounded in the limitations of all the targets we have. We know we will need to inject bounds checks just like in WSL. Concern we are going to throw out all prior art, and all prior kernels (need HLSL to WSL). - FP: Not true: currently the spec is a JavaScript interpreter. Then a compiler to SPIRV, then SPIRV to WSL. We will prove isomorphism. We think it is very grounded in prior art. - CB: Since it is isomorphic then it doesn't prevent people from reusing their kernels. - KR: - DJ: If we have isomorphism between WSL and SPIRV then HLSL and GLSL work. - JG: Awful lot of work when things already work. - CB: Lot of people interested in pointers and templates. Think improvements to the shading language are part of the features of WebGPU. - KR: - DJ: And without robust buffer access. - FP: Need to go, think there is a lot of info, need to take some time. Provide slides and code. - FP: what it's going to need to secure spir-v will be useful no matter what we end up deciding. - DN: are you going to show your slides to your customers who requested templating and see whether it's what they want? - JG: one more request: could you choose a different name? WSL is already a well-known name on Windows (Windows Subsystem for Linux). - BJ: clarification: you want a textual format for the web. but webgl developers have said they want binary shaders. is your plan to ship spir-v and back-translate to WSL? - DJ: first one. also: why do they want a binary format? refuted by disassembly arguments earlier. - DJ: also we have webcrypto which will really hide their content if they want. - MM: it's designed for the parsing and type system to be easily decidable. - KN: the web developers who say they want view-source are not the same as the people who want to load large amounts of shaders quickly from the web, compile quickly, avoid as much compilation time as possible. think loading performance is better than view-source. - DJ: we agree. and we think WSL will compress well. we think the compilation, parsing, loading time will be fast. the speed here is important to us. if we took spir-v we'd still have to parse, translate, convert to metal. spir-v won't give us an advantage here. - CW: spir-v is designed to be the receiver of many languages so that you can efficiently compile spir-v to native targets. if it's isomorphic to spir-v then why not ship a wasm module that compiles wsl to spir-v. - DJ: would be cool to see someone writing a shading language that has these templates, etc. and compile to SPIR-V. - BJ: during vulkan development, was desire for bytecode languages. devs pushed back because people said people would expect it to load fast but it wouldn't. the existence of spir-v is probably because there was a practical benefit to it.
  Is SPIR-V more quickly consumable?
- DN: speed of loading was a non-goal. SPIR-V is intentionally high level to avoid premature optimization.
- CB: from a practical perspective: every language is a derivative of clang and every IR is a derivative of LLVM IR. LLVM IR is probably one point in the process of this compilation step and the language is likely to be a cut-down version of clang, and the question is how far we cut it down. Would like developers to have the options.
- DN: SPIR-V is deliberately distinct from LLVM. It was a mistake Khronos made (twice), and a mistake multiple teams within Google have made, to tie themselves to LLVM IR. When we designed the original SPIR, we deliberately avoided basing it on LLVM IR.
- CB: SPIR-V, DXIL, LLVM IR, etc. are all pretty similar.
- CB: need to decide whether we have a high-level, low-level, etc. approach.
- WSL / WebGPU Shading Language: https://cdn.rawgit.com/webkit/webkit/master/Tools/WebGPUShadingLanguageRI/index.html
- MM: demo. This is a compiler that creates an AST and evaluates it by visiting it in JavaScript.
- MM: If you are interested in WSL there is a live version of it.
- CW: should split the discussion about shading languages into: ergonomics, security, etc. and have different AIs for different people.
- DN: wanted to ask FP to show the presentation to key customers. Think there may be some resistance because the thing the customer wants (pointers) wasn't resolved by this proposal.
- Depends on the customer.
- DJ: ok, so we need to find out whether customers' requests for pointers and generics were satisfied by the WSL constraints. Might be talking with a set of customers who are different from your (Google's) set of customers.
- DN: I heard a request from Filip about how much work there would be to secure SPIR-V. We should take an AI to enumerate exactly what we mean by secure. Namely, that accesses to buffers and images are checked. Robust buffer access in a software implementation.
- MM: if you want to use the SPIR-V spec you need to look at 2 specs.
- CW: you have to look at the environment spec.
- DJ: if we accept SPIR-V we need an environment spec for WebGPU.
- CW: that can be our (+ Mozilla's) AI.
- DJ: also, going to do a prototype investigation into validating that SPIR-V is secure.
- DN: namely, that you can make sure that ingested SPIR-V validates, and that runtime checks are injected.
- CW: need a validator, and a SPIR-V pass which adds bounds checks to buffer and image fetches. These are super easy.
- DJ: we will take the AI of cross-compiling WSL to and from SPIR-V and MSL and will write down any snags we run into.
- DJ: we were going to go around the room and do a straw poll.
- CW: we should talk with Khronos about adopters' fees for SPIR-V.
- What are the implications of using SPIR-V in WebGPU?
- DJ: seems fairly clear what Apple's position is. We wouldn't be working on it otherwise. To reiterate goals: based on what we thought were the requirements from the group and our own goals, it's an investigation into a solution. We think it's valuable and the right thing to do. If the group is really strongly pushing for SPIR-V, we want to know answers to questions.
- DN: personal opinion: WSL is a great investigation to move the conversation forward. We haven't pinned down enough of our own requirements to recommend in a convincing way what is required by the API. Also, if we're serious about an MVP, SPIR-V gets us a long way along quickly.
- CW: strong intuition that SPIR-V is the right answer.
- KN: aside from everything about security, don't see a benefit of WSL over SPIR-V, but need more investigation.
- DM: we should build and focus on the platform. From that point SPIR-V makes more sense. Like the language, but it would slow us down to build a new high-level language.
- JG: share Corentin's intuition that SPIR-V is an efficient, valuable way forward, esp. given the experience we gained from WebGL 1.0 and 2.0. Not impressed with the motivation for starting something completely new, when we have something that's close to matching what we need. Surprised that this was as contentious as it's been.
- RC: Chas already summarized. Agree with Apple that a textual language is important for the web. It's been a tenet of the web that all you need is a text editor and web browser. Think we should use HLSL as the language. Chas has been open to standardizing it with W3C. Have recently taken contributions from the SPIR-V group to have an HLSL frontend, and are getting SPIR-V folks access to the HLSL repo. In other words, HLSL isn't just controlled by Microsoft. HLSL is used by every Xbox game ever written, etc., and it's been battle-tested on a large body of content. But we also want to see innovation in the language space and think SPIR-V could be something WebGPU could reasonably accept. So at the low level think it would be better to ingest SPIR-V instead of DXIL.
- KR: It was a very nice investigation and great motivation to make better high-level languages. Think it is too early to throw out all previous solutions; felt the presentation was dismissing prior art and core issues that are in the low-level languages. We will have to have WSL running on all platforms before we can choose it.
- CB: Concern about breaking existing code with SPIR-V? WSL is closer to GLSL and HLSL than SPIR-V.
- KR: No concern that WSL would break things; SPIR-V already has an ecosystem to compile from and to HLSL and GLSL (and to MSL). NXT shows that SPIR-V translates well to HLSL, GLSL and MSL. Why not have a WSL-to-SPIR-V translator early and have things running, instead of writing many WSL backends. On our side we should write the "security layer" for SPIR-V. Could put that in NXT then run tests on all platforms.
- CB: so put a SPIR-V validator in all browsers?
- KR: Yes. WSL would be the same, where you would have the compiler + validator + translator in all browsers.
- KR: Think it is premature to choose a new language that hasn't run on any GPU yet. Also, WSL is high level; our experience in WebGL is that native GLSL compilers were all broken. Should go with something more battle-tested, which is the SPIR-V toolchain. Should choose SPIR-V + look at security constraints. Yes, three.js will have to have a compiler to assemble GLSL shaders then translate to SPIR-V; not sure how it will work, but we should standardize an intermediate-level language and maybe a high-level language too in the browser (?)
- ZM: Past couple years a third of our effort was working around compiler bugs. Think an intermediate format would help with this.
- DJ: is the benefit because we think there will be fewer compiler bugs?
- ZM: if we have a high-level language we should have a standard implementation that all browsers adopt.
- BJ:
  - From the PoV of WebGL developers, having a textual representation easily consumable by the browser is super important. Has enabled WebGL to have its reach as people have been able to open dev tools and see the shader code being run.
    So having a blessed high-level language is good, as most shader code online would be in this language (the one for public consumption like three.js, shadertoy etc.). However, don't care about the exact high-level language. GLSL is preexisting and has benefits. The mechanism to ingest it in the browser doesn't matter as long as it is consistent. If people have to bundle a bunch of WASM in a Web page, it isn't as good. Interpreter in the browser?
  - Browser dev hat: no comment on the language itself. Have doubts about the amount of work people can put into making a language. We have access to Khronos through contacts, and W3C doesn't have expertise in graphics. Just defining the API is a big task. Saying we want to invent even more things makes the task even bigger. Concern about adding even more delay to shipping WebGPU. Is that acceptable?
- SW: lack of high-level languages in which to write shaders is not a problem. Hesitant to endorse something that will segment the web further from desktop and mobile graphics development. Also, parsers are nasty complicated things where lots of bugs turn up. The HLSL folks have dealt with it on their side, so have the GLSL folks; don't want to create a whole bunch more parser bugs.
- DJ: mentioned low-level restrictions for some operations. Those wouldn't be encountered by a SPIR-V program?
- DN: Vulkan only permits 32-bit or bigger loads and stores. 16-bit loads/stores are an extension. No 8-bit loads/stores.
- MM: WSL must compile to that. So it will.
- MM: Is there anything that you can't do that isn't listed in the SPIR-V spec?
- DN: You want to look at the SPIR-V spec plus appendix A of the Vulkan spec.
- CW: Appendix A of the Vulkan spec.
- DJ: not important to this group or tech: the current environment (Vulkan/SPIR-V) requires logical addressing mode. There's a variable pointer extension. That use case is more from the OpenCL community, right? Will Vulkan change that environment to remove the restriction to logical addressing mode?
- DN: that's a forward-looking statement.
- CW: doesn't really matter, since we have to run on shipping hardware.
- MM: but in 20 years?
- DN: Vulkan was made in an environment with no new hardware features except that which runs current OpenGL. And SPIR-V was the way of specifying shaders in this environment, so it'll evolve.
- CB: Want to point out I agree with Ken, and mess with him :P SPIR-V is the de facto low-level spec, HLSL the de facto high-level spec. Want some amount of standardization and advancement of both.
- CB: DXIL is an open-source GitHub project. If there are things to change in the language which could make it more web-friendly, we are happy to talk.
- DN: concerns about HLSL: lack of a spec, and being based on the behavior of the previous reference implementation. Know CB is going to address this. Hope the situation is improved. We need that as well in the web context.
- CB: you have the source. To some extent that's less ambiguous than anything written in the English language.
- DN: you also get unintended behavior. Some things done in HLSL shaders in the wild only get pinned down once you compile to a low-level representation and do a bunch of optimizations. Kind of a moving target. My team's hitting that as well as others. We're all agreed that this needs to be improved. Reference implementations have a lot of good properties but they also have bugs.
- DJ: five companies. One with no preference. Google/Mozilla are saying "accept SPIR-V after security analysis", with a slight web-developer hat saying "source code is preferable".
  MSFT/Apple say we want a human-readable text format; the difference is that Apple is coming with a different proposal than MSFT.
- DM: think there's still space for high-level language innovation like Rust did, like enforcing aliasing rules at compile time. Would be happy to do this as extensions later.
DOM Interactions
- CW: how do we:
  - Put stuff on the canvas
  - Interactions with WebVR
  - Let's not do workers. Dependent on what WebAssembly does (i.e., let's not do multithreading)
  - How to upload DOM elements
- MM: why is this different from WebGL?
- CW: one complaint: WebGL can only render to 1 canvas. If you wanted to render to two, you have to go through contortions.
- DJ: TL;DR: there are ways to do this in the web platform already. But since we can present the render buffer in multiple places we can build a better solution.
- DJ: canvas.getContext("") works with one canvas. So we could make WebGPU work with >1, or 0, canvases.
- MM: think that's a hard requirement: to get one of these things without a canvas.
- KR: Some interactions with the JavaScript interaction model; not different from WebGL so we can defer that. People complain that WebGL is its own thing outside of the rest of the DOM. Want to upload arbitrary DOM elements.
- JG: let's focus on uploading same-origin DOM media elements.
- MM: so, for now, no arbitrary DOM elements, and let's see what WebGL does.
- CW: let's start with Canvas and go right into WebVR.
- DJ: one way to do this: make an instance of a WebGPU device. With getContext you pass in that device. Then you're not really talking to the CanvasRenderingContext but something else.
- JG: reminds me of ImageBitmapRenderingContext. SwapChainRenderingContext? It's a destination, but not the only way to get a WebGPU context.
- Discussion about this
- CW: if you allow putting any texture into a canvas, it's unclear what the browser does to put textures on the screen. Need to declare how you're going to use the texture. Could get complicated depending on whether the canvas is in its own layer or not, etc. Maybe ask the canvas "give me a texture to render into"?
- KN: with WebGL we render *into* the IOSurface.
- DJ: keep the great ergonomics WebGL gave you so you don't have a lot of setup. Don't need to allocate the depth buffer, etc.
- BJ: agree, one main advantage of WebGL is "getContext" and start drawing. Not requesting tons of pixel formats, etc. Same for media elements; don't need to allocate your own JPEG decoder, etc.
- JG: those have value. Like the approach where you look at the physical devices / adapters, and see which one you want to use. Forces the developer to make some choice, but the worst thing about creating a new context is "ChoosePixelFormat".
- CW: in all 3 APIs you don't choose a pixel format. Maybe the canvas tells you "here's the format; deal with it".
- DJ: create a WebGPU Device. Then Canvas.getContext(). Gives you back a CanvasRenderingContext. That's the thing that gives you the SwapChain; attach it to the device, and go.
- JG: instead of a WebGPU context, have a SwapChainRenderingContext.
- KN/JG: more discussion about this
- MM: the device is not actually a device in what Dean said. (Doesn't refer to a particular adapter in the system.) Somehow you'll need to get the root object for the API.
  - Should be able to get that root object with no parameters.
- KN: new WebGPUDevice().
- JG: sure.
- MM: agree that there's some constructor that takes no arguments. Other constraints too, but not for today.
- DM: would need to pass in the queue created even earlier.
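To make the flow discussed above concrete, here is a minimal editorial sketch: a device obtained with a no-argument constructor, then a swapchain-style canvas context attached to it. Nothing here was decided in the meeting; every name (WebGPUDevice, the "webgpu-swapchain" context id, configure, getNextTexture) is hypothetical.

```typescript
// All names below are hypothetical; nothing here was decided in the meeting.
declare class WebGPUDevice {
  constructor(); // "root object" obtainable with no parameters
}
interface WebGPUSwapChainRenderingContext {
  configure(options: { device: WebGPUDevice }): void;
  getNextTexture(): unknown; // "give me a texture to render into"
}

const device = new WebGPUDevice();

const canvas = document.querySelector("canvas") as HTMLCanvasElement;
// A SwapChainRenderingContext-style object: a presentation destination
// attached to an existing device, not the only way to use WebGPU
// (a device could drive several canvases, or none).
const swapChain = canvas.getContext(
  "webgpu-swapchain"
) as unknown as WebGPUSwapChainRenderingContext;
swapChain.configure({ device });

function frame(): void {
  const backbuffer = swapChain.getNextTexture();
  // ... record and submit command buffers that render into `backbuffer` ...
  requestAnimationFrame(frame);
}
requestAnimationFrame(frame);
```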
- BJ: thinking through some feedback: question about SwapChains. Why can't you have a SwapChain create ImageBitmaps?
- KN: don't want to incur a copy from ImageBitmap to the screen. Want to render into the top-level thing given to DirectComposition, CoreAnimation, etc.
- WebVR
- DJ: does this mesh with how WebVR works?
- BJ: WebVR does not explicitly require WebGLRenderingContext. In the upcoming API, you create different layer types. There's a WebGLLayer. Create it by passing a WebGLRenderingContext. You attach this to the session and say "start presenting". Would pass in a WebGPU context (or, correction, SwapChain).
  - Intent: with WebGLLayer, you ask it for a framebuffer to render into every frame, so it's effectively a SwapChain. Lets the underlying native API provide the surface you render into.
  - Either that layer should act as a SwapChain, or point to a SwapChain and provide the "next" surface to render into.
  - Need to make sure that the SwapChain could potentially be populated by surfaces coming from the native VR APIs.
- MM: want VR to be a supported use case.
- (All agree.)
- BJ: WebVR's designed in a way so that you're expected to have completed your rendering by the end of the callback that gave you the pose. Given the nature of the WebGPU API where there's a lot of asynchrony, unlike WebGL, it'll make things more difficult for developers.
  - But if they can maintain a double-buffer of resources and prep everything before the next callback, you can get everything done.
- DJ: some of this will be educating developers.
- BJ: there are patterns from WebGL that wouldn't work.
- DJ: we could have an explicit "PresentSwapChain" API.
- DJ: could ensure in our API that nothing's going to block and take a long time. The developer has to be aware things will be asynchronous. Will have to set things up in advance.
- BJ: think we won't have an explicit "Submit" or "Present" API. Asked web platform leads, was shot down.
- BJ: we also have an explicit "requestFrame". Can do all the prep, wait for fences/barriers, then call requestFrame.
- BJ: requestFrame syncs with the headset's sync loop. 90 Hz instead of 60 Hz.
- BJ: feel pretty comfortable it will work; will require a mindset change.
- KN: how are we going to upload the pose? Need a synchronous upload of the pose data.
- BJ: array of view matrices + array of projection matrices. Usually 1 or 2 of each. Maybe more for lightfield displays.
- BJ: if I can take 64 floats and make them available inline before the draw call that would be sufficient.
- CW: there will be a way to do uploads. But for sure there's a way to update a uniform buffer with data. Don't worry. We don't know the exact mechanism yet, but it will exist.
- BJ: good. It's a hard requirement that we can communicate the pose synchronously with respect to the current frame.
- MM: so we need it to be communicated before the draw call is done.
- MM: doesn't need to flush. No round-trip.
- CW: without blocking, there's a way to provide data to the GPU.
- KN: staging buffer or similar.
- CW: WebVR ideally gives you a texture array and you render to one layer and then the other.
- BJ: yes, ideally. If support's there consistently, then if you use WebGPU you *always* render to a texture array.
- CW: all APIs do support texture arrays, so we can require that be the mechanism.
- BJ: won't affect many people. Will make rendering more efficient. Don't have to carry over the current limitations of WebGL interacting with WebVR.
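A rough editorial sketch of the frame pattern described above: resources are prepared ahead of time, the pose (a couple of view/projection matrices, roughly 64 floats) is written into a uniform buffer without blocking, the pre-built work is submitted, and the next frame is requested in sync with the headset. All API names here (VRSession, requestFrame, WebGPUBuffer.setSubData) are placeholders, not agreed API.

```typescript
// Placeholder types; the real WebVR / WebGPU interfaces were not settled here.
interface WebGPUBuffer {
  // Some non-blocking way to update a uniform buffer was promised to exist
  // ("don't worry"); modeled here as a hypothetical setSubData.
  setSubData(offsetInBytes: number, data: Float32Array): void;
}
interface VRView {
  viewMatrix: Float32Array;        // 16 floats
  projectionMatrix: Float32Array;  // 16 floats
}
interface VRFrame {
  views: VRView[];                 // usually 1 or 2 views
}
interface VRSession {
  requestFrame(callback: (frame: VRFrame) => void): void;
}

function startRenderLoop(session: VRSession, poseUniforms: WebGPUBuffer): void {
  const onFrame = (frame: VRFrame): void => {
    // Pack view + projection matrices (~64 floats for two views) and hand them
    // to the GPU before the draws that consume them; no flush, no round-trip.
    const matrices = new Float32Array(frame.views.length * 32);
    frame.views.forEach((view, i) => {
      matrices.set(view.viewMatrix, i * 32);
      matrices.set(view.projectionMatrix, i * 32 + 16);
    });
    poseUniforms.setSubData(0, matrices);

    // ... submit command buffers prepared before this callback (double-buffered
    // resources), rendering into the layer / swapchain surface for this frame ...

    session.requestFrame(onFrame); // synced to the headset's loop (e.g. 90 Hz)
  };
  session.requestFrame(onFrame);
}
```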
- KN: is it possible that the swapchain of the native system will be designed for side-by-side rendering?
- BJ: that's the best way to interface with Daydream right now. But by the time WebGPU comes out it'll probably have moved forward. Also we can probably do a blit at the very end of the pipeline. And if that puts it at a disadvantage then we should fix Daydream.
- Upload from DOM media elements
- KR: Let's learn from our mistakes. In WebGL it turns out there are some sync operations that happen in some cases. For HTMLImageElement a synchronous decode needs to happen. For HTMLVideoElement the HW and SW paths are conflated, which prevents some zero-copy. HTMLCanvasElement needs GPU-to-GPU copies. ImageBitmap from HTMLImageElement gives you the data ready for consumption by the GPU.
- KR: Suggest we forgo uploading from HTMLImageElement and require ImageBitmap instead.
- DJ: Could use the decode() function on HTMLImageElement.
- KR: does that take extra arguments like flipY, unmultiplyAlpha, etc.? Don't think the image element has them.
- KR: For WebGPU there is no "pixel store" state. Not sure about flipping Y. Will need to deal with this stuff in WebGPU and figure out how things will interact. Suggest ImageBitmap is the only way to upload images. For video elements suggest we do something like the "live update" mechanism LG is working on for WebGL.
- DJ: What if I want to keep the frame while the video is playing?
- KN: LG's thing is zero-copy. You can make a copy if you want a fixed image?
- KR: Need to support HW and SW paths. HW is like a texture source. SW gives data; copy into a buffer then upload to a texture? Basically we need to try to get the video decode path with as few copies as possible.
- DM: How important is video?
- All: very important.
- KR: HTML canvas, maybe do an ImageBitmap from it?
- DJ: Like only 2 entry points: image bitmaps and video. How do image bitmaps work with compressed textures?
- RC: You can't make one from an image or a video; you need to do an upload.
- DJ: So we need a way to upload from ArrayBuffer?
- DJ: thinking more of the WebGL case, where the only way you can upload a DOM element is via TexImage2D. That's why you need the ArrayBuffer entry point. But in WebGPU you're going to upload to buffers that aren't necessarily images.
- RC: asking can we upload an image to a vertex buffer?
- CW: upload to a buffer or to a compressed format?
- DJ/MM: want to upload raw compressed bytes (ETC, DXT, etc.)
- CW: you'd need a query mechanism to know the supported compressed texture formats.
- Some confusion about how WebGL handles compressed textures.
- MM: so, two entry points for uploading to textures from the DOM.
  - One accepts ImageBitmap.
  - The other accepts HTMLVideoElement.
- KR: we need to separately consider the software and hardware cases for HTMLVideoElement.
- CW/JG: if you have raw data, you MapBuffer / copy data into the buffer / UnmapBuffer.
- DJ: it's a bit more code; creating an ImageBitmap returns a Promise.
- MM/DJ: the "one line" in current WebGL samples hides synchronous blocking.
- DJ: you don't want a wait in your WebVR rendering callback.
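An editorial sketch of the ImageBitmap-only image upload path suggested above. createImageBitmap() and its flipY / premultiplyAlpha options are existing platform APIs; WebGPUTexture and uploadFromImageBitmap are hypothetical stand-ins for whatever entry point WebGPU ends up specifying.

```typescript
// createImageBitmap() is a real platform API; WebGPUTexture and
// uploadFromImageBitmap are hypothetical placeholders.
interface WebGPUTexture {
  uploadFromImageBitmap(bitmap: ImageBitmap): void;
}

async function uploadImage(url: string, texture: WebGPUTexture): Promise<void> {
  const blob = await (await fetch(url)).blob();
  // Decoding happens here, behind a Promise, instead of inside a synchronous
  // texImage2D-style call that could stall a WebVR frame callback.
  const bitmap = await createImageBitmap(blob, {
    imageOrientation: "flipY", // the flipY / premultiplyAlpha knobs live here,
    premultiplyAlpha: "none",  // not in per-upload "pixel store" state
  });
  texture.uploadFromImageBitmap(bitmap);
  bitmap.close();
}
```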
Received on Tuesday, 26 September 2017 13:55:29 UTC