Minutes from the 2017-09-22 meeting

From: Corentin Wallez <cwallez@google.com>
Date: Tue, 26 Sep 2017 09:54:30 -0400
Message-ID: <CAGdfWNOxJ6NEg5Amc1GFwjTwr_zDX6bR-fy-LQPbHXrNX=4=-g@mail.gmail.com>
To: public-gpu <public-gpu@w3.org>
GPU Web 2017-09-22 Chicago F2F

Minutes from last meeting
<https://docs.google.com/document/d/1seCUVBkzkRPEj0sfcDBymGjwSPndTGPhsQJPkUQscNY>
TL;DR of the TL;DR

   -

   Better fleshed out target for the MVP, some decisions are still pending
   investigation
   -

   More discussions of implicit vs. simplified vs. explicit memory
   barriers. Action items to investigate example use cases.
   -

   Metal is able to automatically run some encoders asynchronously.
   Discussion on doing this vs. having application explicitly handle queues.
   -

   Fil + Myles showed their prototype language WSL that encodes the
   constraints of SPIRV logical addressing mode.
   -

      Widespread agreement that it is an improvement over current languages.
      -

      Discussion on which language the API consumes (SPIRV vs. WSL)
      -

      Agreement that we should have a blessed language, but which one?
      -

   WebGPU devices will be created from thin air and there is something like
   “WebGPUSwapchainCanvasRenderingContext”. Also how WebVR would work inside
   the frame callback.
   -

   DOM element uploads come from image bitmaps or video sources, that’s it.

TL;DR

   -

   Mozilla has a WebGPU prototype <https://github.com/kvark/webgpu-servo>
   running on D3D12 and Vulkan
   -

   MVP Features discussion
   -

      MVP should contain structural elements of the API and prototype
      ergonomics.
      -

      MVP doesn’t have to run on all target hardware of v1
      -

      Open question: shading language for the MVP? WASM + JS API or just JS?
      -

      Out: tessellation, transform feedback, predicated rendering, sparse
      resources, pipeline caching, aliasing
      -

      Tentatively out: DrawIndirect / DispatchIndirect (need
      investigation), “glGenerateMipmaps”, queries
      -

      Tentatively in: multisampling
      -

      In: compute, fragment, vertex, render passes, MRT, upload download,
      copies, instancing, binding model, command buffers, pipelines, a dummy
      extension
      -

         If we decide on multiple queues: async compute, queue
         synchronization
         -

         If we decide on explicit memory barriers, they are in
         -

   Memory barriers
   -

      Initial opinions
      -

         Apple: simplified API with implementation doing more work. Vulkan
         version would do more work but a design can make speed good enough
         -

         Google: declarative and failsafe API with explicit transitions,
         should map nicely to all APIs, ensure apps don’t fail to render on
         Android
         -

         Microsoft: D3D team’s experience is that barriers are required to
         make bindless work because driver doesn’t know which resources will be
         accessed.
         -

            Discussion that Vulkan and Metal aren’t really bindless.
            -

         Mozilla: would like to give developer full control and power of
         underlying APIs
         -

      Metal has “explicit barriers” at encoder boundary. Can run some
      encoders async.
      -

      Discussion on whether to allow for barriers inside a subpass and what
      the use case for it is in Vulkan.
      -

      Discussions about Vulkan render passes, and how they require memory
      dependencies between subpasses to be specified, otherwise pipelines
      would need recompiling.
      -

      Agreement that we can’t validate shaders with data races (via UAVs).
      -

      Industry building task graph abstractions. Metal provides it as a
      linear sequence optimized by the driver. Vulkan provides subgraphs at a
      time with render passes.
      -

         Frostbite’s FrameGraph:
         https://www.ea.com/frostbite/news/framegraph-extensible-rendering-architecture-in-frostbite
         -

      Need to gather use cases requiring memory barriers and see how they’d
      be implemented in each API.
      -

   Multiple queues
   -

      Metal can push different encoders to different hardware queues
      automatically. Apps can create multiple MTLQueues only because it
      didn’t make sense in ObjC to limit creation to a single MTLQueue.
      -

      Discussion about automatic async compute on D3D12 / Vulkan.
      -

         D3D exposed multiple queues to stop analyzing the command stream
         in the driver. Maybe not needed for WebGPU.
         -

         Explicit queues make app point out parallelizable commands.
         -

         Order of submits in a Vulkan app is a good order for encoders in
         Metal. Will need validation of the correctness of the order in all
         cases.
         -

   Shading languages
   -

      Fil presented his and Myles’s work on making a language with
      constraints of SPIRV logical addressing mode built in. Called WSL here.
      -

         Familiar C syntax, generics, operator overloading used to
         implement vector types for example.
         -

         Shader terminates early in case of error.
         -

         Special pointer types encoding SPIRV constraints
         -

            T^ cannot be cast, cannot be assigned after declaration
            (content can though), and functions returning them can have only
            one return.
            -

            Some slice types for arrays, went into less detail for them
            than for T^
            -

         Goal is to have bisimulation with SPIRV to show they are
         equivalent.
         -

      Follow-up discussion:
      -

         Agreement that language improves greatly on GLSL / HLSL
         -

         Concern about creating a new language, and asking people to move
         over
         -

         Concern about generic instantiation bloat when people want to keep
         one code path.
         -

         Apple suggests the API consume WSL directly.
         -

         Concern WSL doesn’t reduce the number of checks or speed of
         validation.
         -

         Discussion of advantages of WSL over SPIRV and rebuttals:
         -

            View source vs. shipping shaderc WASMed only where needed
            -

            Security built in vs. logical addressing mode is just that
            -

            Flexibility for WebGPU vs. SPIRV execution environment
            -

            All APIs require same amount of translation vs. ???
            -

         Request for name change: WSL is Windows Subsystem for Linux
         -

         Myles did a demo of WSL
         <https://cdn.rawgit.com/webkit/webkit/master/Tools/WebGPUShadingLanguageRI/index.html>
         -

         AIs on showing equivalence of SPIRV and WSL, and showing
         validation of SPIRV logical and buffer accesses instrumentation.
         -

         Need to talk to Khronos to see what implications of using SPIRV
         are.
         -

         Concern that WSL was quick to dismiss prior art and battle tested
         toolchains.
         -

         Roundtable
         -

            Apple: Think WSL meets requirements from the group. Think SPIRV
            could be ok provided there are more investigations.
            -

            Google: WSL is a great investigation, need better defined
            requirements for shading languages, SPIRV gets us far quickly and
            our intuition is that it is the right choice. WebGL experience is
            that native parsers are a huge source of bugs.
            -

            Microsoft: View source is important, suggest HLSL is the
            language of choice as there has been a focus on standardizing it.
            It has the largest amount of content. Think at a low level it
            would be better to accept SPIRV than DXIL.
            -

            Mozilla: We should push the platform, making a new language
            would slow us down.
            -

            Developer PoV: a textual representation is important for
            education etc. so blessed high level language is important.
            -

         Suggestion to use HLSL as de-facto high-level language and SPIRV
         as intermediate level. People would want a better spec for HLSL though.
         -

   DOM interactions
   -

      Agreement that a WebGPU device (root object) is created from outside
      of a canvas.
      -

      Consensus there is a WebGPU device constructor with no arguments
      -

      Agreement that there is a canvas rendering context that gives you a
      “WebGPU swapchain” that hands out texture for rendering.
      -

      Consensus that WebVR is a supported use case. Will need a way to
      update buffers synchronously without blocking inside the WebVR frame
      callback.
      -

      In the WebVR frame callback the application will ask the WebVR
      swapchain for the next textures.
      -

      WebGPU might require rendering in a texture array and not
      side-by-side like is currently allowed in WebGL.
      -

      Only one entry-point to upload a 2D DOM element; it takes an
      ImageBitmap.
      -

      Another entry-point to create a texture “video source” from a video
      element.
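
As a rough illustration of the DOM interaction points summarized above (a device created with no arguments and independently of any canvas, and a canvas rendering context that provides a swapchain handing out textures), the shape could look something like the sketch below. Every class and method name here is hypothetical and invented for this sketch; the group had not agreed on any API names, and the mock classes only model the relationships, not real rendering.

```typescript
// Hypothetical names throughout; none of these are part of any agreed API.

// A texture handed out by the swapchain for one frame of rendering.
class MockTexture {
  readonly width: number;
  readonly height: number;
  readonly frame: number;
  constructor(width: number, height: number, frame: number) {
    this.width = width;
    this.height = height;
    this.frame = frame;
  }
}

// Root object: constructed with no arguments and independent of any canvas.
class MockDevice {
  readonly label = "mock-device";
}

// Stand-in for the swapchain that a canvas rendering context would provide.
class MockSwapchain {
  readonly device: MockDevice;
  private readonly width: number;
  private readonly height: number;
  private frame = 0;
  constructor(device: MockDevice, width: number, height: number) {
    this.device = device;
    this.width = width;
    this.height = height;
  }

  // Hands out the texture the application renders into next.
  getNextTexture(): MockTexture {
    this.frame += 1;
    return new MockTexture(this.width, this.height, this.frame);
  }
}

// Stand-in for the canvas rendering context; the canvas is reduced to a size.
class MockCanvasContext {
  private readonly width: number;
  private readonly height: number;
  constructor(width: number, height: number) {
    this.width = width;
    this.height = height;
  }

  // Ties a device to this canvas and returns the swapchain for it.
  configureSwapchain(device: MockDevice): MockSwapchain {
    return new MockSwapchain(device, this.width, this.height);
  }
}

const device = new MockDevice();                 // created "from thin air"
const context = new MockCanvasContext(640, 480); // would come from a canvas element
const swapchain = context.configureSwapchain(device);
const first = swapchain.getNextTexture();
const second = swapchain.getNextTexture();
console.log(first.frame, second.frame, second.width);
```

The point of the sketch is only the ownership direction: the device exists before and outside the canvas, and the canvas side is what hands textures to the application each frame.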

Tentative Agenda

   -

   Morning (9AM - 1PM):
   -

      Status updates
      -

      MVP features
      -

      Memory barriers
      -

      Multiple queues
      -

   1PM - 2PM: Lunch
   -

   Afternoon (2PM - 6:30PM):
   -

      Shading languages
      -

      DOM interactions
      -

      Others (for extra time)?
      -

         Swapchain/presentation
         -

         Re: descriptor heaps
         -

         Re: index format in pipeline state?

Attendance

   -

   Apple
   -

      Dean Jackson
      -

      Filip Pizlo (by phone for shading language)
      -

      Myles Maxfield
      -

   Google
   -

      Brandon Jones
      -

      Corentin Wallez
      -

      Kai Ninomiya
      -

      Ken Russell
      -

      Shannon Woods
      -

      Zhenyao Mo
      -

   Intel
   -

      Bryan Bernhart
      -

      Yunchao He
      -

   Microsoft
   -

      Chas Boyd (by phone)
      -

      Rafael Cintron
      -

   Mozilla
   -

      Dzmitry Malyshau
      -

      Jeff Gilbert

Administrative stuff

   -

   DJ: Lawyers for Apple / Google / Microsoft trying to figure out a
   software license to contribute code to the group. Kind of a new thing for
   W3C. Mostly in agreement and working on final wording. Will likely look
   like Apache license.
   -

   JG: Mozilla would like a copy of the license.
   -

   DJ: the companies want to use this for other projects as well, like LLVM
   (and maybe ANGLE).
   -

   DJ: W3C has an all-groups meeting called TPAC. We don’t have a slot to
   meet there (Bay Area this time), but could get a couple of hours to meet if
   we like. Don’t think it’s worth it. Could have a session inside the
   WebAssembly group where we talk about what we want for an API to call from
   WebAssembly. (Currently WASM can only call JavaScript.) Graphics will
   probably be the first external thing called from WebAssembly.
   -

   CW: depending on DOM interactions discussed this afternoon there might
   be more stuff too, like mapping buffers inside the WASM memory space.
   -

   DJ: will coordinate with chairs.
   -

   CW: also signed up for making a demo for W3C attendees. Will re-show
   demo from Vancouver F2F showing compute + graphics together. Attendees will
   be folks who are not GPU experts.

Status updates

   -

   Apple:
   -

      Haven’t done anything recently to existing impl in WebKit
      -

      Would like to move it closer to what we’ve already decided in the
      group, and make it clear that it being “WebGPU” isn’t the decision by
      this WG
      -

      Myles and Filip have been doing an experiment to design a secure
      shading language. Partial implementation in JavaScript done.
      -

   Google
   -

      Implementing index format in pipeline state that we talked about last
      meeting
      -

         Works
         -

         Writing tests, verifying primitive restart on all backends
         -

   Intel
   -

      No update
      -

   Microsoft
   -

      Haven’t been writing any code
      -

      Have been talking about shading languages and memory barriers
      -

   Mozilla
   -

      Made big progress on D3D12 backend
      -

      Working on GL backend
      -

      Figuring out rough spots of descriptor heaps, resource heaps and
      pipeline barriers
      -

      Think desc + resource heaps can look a lot like Vulkan and be
      efficient on D3D12 and Metal
      -

      WebGPU prototype <https://github.com/kvark/webgpu-servo> running on
      D3D12 and Vulkan!

MVP Features

   -

   CW: There are things on the mailing list which we’ve ruled out of the MVP
   -

      Want it to be enticing, but also not hard to get right
      -

   DJ: to be clear, this wouldn’t be version 1.0, and could still make
   breaking changes (hesitantly)
   -

      Enough to convince ourselves and the community that it’s the right
      direction
      -

   CW: also get ideas in concrete form to see what works/doesn’t
   -

      Want most things in there that will cause structural issues / changes
      -

   DJ: it’s important because we don’t have many facts or much experience
   writing code or content with the ideas we’ve come up with
   -

      Metal 1.0 : the way Apple went about it was to start with a small
      feature set and add to it over time, as we got feedback from developers and
      hardware changed
      -

   DM: less interested in getting developers interested, but rather focus
   on things that will affect the architecture
   -

   DJ: more interested in ergonomics and development. If we’re writing
   content for WebGPU, if we think it’s too difficult / easy, we can adjust
   -

   CB: how do we define the feature set? List of things that have to be
   operational for someone to be interested? What are the features of the
   common subset of the APIs we’re targeting for initial version?
   -

      Expected HW configs
      -

      Whether we’re trying to support “Big Compute”
      -

   DJ: Ben Constable suggested trying to get to the point where we can draw
   a triangle on the screen
   -

   CW: we do have different prototypes which do this. But as a group we
   don’t have enough consensus to develop an API which can draw a triangle.
   -

      If you just want to render a textured triangle there’s a lot of
      structural stuff you can ignore. Like how you get back data from the GPU.
      But this can affect a bunch of parts of the API.
      -

      So let’s focus on structural stuff as well as stuff that’s cool for
      demos.
      -

      Don’t need all the pipeline state like all the blend functions.
      -

      But if you don’t specify how to give textures to the shader you have
      a problem.
      -

   MM: makes a lot of sense. Rather than working toward one program, let’s
   work toward a set of programs.
   -

   CW: maybe let’s decide what’s not going to be there? Small list in the
   email to GPUWeb:
   -

      Sparse resources
      -

      Transform feedback
      -

      Tessellation
      -

   What’s left: compute and graphics workloads
   -

   MM: do we need a blit encoder at the beginning?
   -

   CB: copy engine
   -

   CW: probably need that, if only from upload buffers to textures
   -

   DJ: don’t include:
   -

      bundles / secondary command buffers
      -

      stream-out / transform feedback
      -

      predicated rendering
      -

      tessellation
      -

      sparse resources
      -

      Roadmap is at https://github.com/gpuweb/gpuweb/wiki/Roadmap
      -

   CW: workloads:
   -

      Rendering
      -

      Vertex/Fragment shaders
      -

      Multiple attachments (multiple render targets) (?)
      -

      Render passes
      -

   CB: definition of MVP is that it’s viable in the marketplace
   -

   CB: think people will want G-Buffers for deferred shading
   -

   CB: around compute: do we have to support asynchronous tasks?
   -

   CW: depending on result of today’s later conversation, may need concepts
   of memory barriers and multiple queues in the MVP. Whatever the decision is
   (include or don’t), those will or will not be in the MVP
   -

   CB: think we should have async compute. Barriers are a separate process
   we can determine later.
   -

   CW: they’re tied into memory synchronization, which ties into queue sync.
   -

   JG: the idea of fences is different than memory barriers
   -

   CW: the idea of both, and whether they’re implicit/explicit, is going to
   be predicated on the result of this discussion
   -

   JG: grouping these disparate topics into “synchronization” is too big a
   chunk
   -

   CW: upload and download
   -

   JG: memory model
   -

   RC: instancing (group: yes especially since it should be easy)
   -

   KR: we don’t have to have *everything* from WebGL 2.0. Even with
   instancing, there are a bunch of variants (base vertex, etc.)
   -

   MM: what about GPU-driven rendering (DrawIndirect?)
   -

      The three APIs handle this slightly differently
      -

      Could be hard / change things structurally
      -

      CW / JG: should investigate and see how hard it is
      -

         Probably don’t need it for the MVP, but if it might affect the
         overall API structure, should consider including
         -

      MM: have investigated it a bit but not enough to talk about it
      -

   CW: binding model, pipelines, command buffers (goes without saying)
   -

   DM: resource heaps and how they work
   -

   CW: we had an NXT roadmap where we went through all of these items
   -

   CW: copies / blits
   -

   DJ: mipmapping?
   -

      Unclear; Vulkan doesn’t have it. Do it yourself
      -

      DJ: Metal does have this in the copy encoder
      -

   CW: pipeline buffer update?
   -

      In the command buffer, say “update buffer with this data”. Inline
      buffer updates where the data is an immediate in the command buffer
      -

      DM: it is convenient and there’s a way to do it in all the APIs
      -

      MM: is this for performance?
      -

      CW: nothing you can’t do with a staging buffer, and Metal doesn’t
      have it
      -

      MM: let’s leave it out then
      -

      MM: don’t need two ways to do this. Can add it later
      -

   CW: multisampling?
   -

      JG: yes. We have to handle resolve properly. Don’t trust absence of it
      -

      KR: could be a can of worms. We just gave developers multisampled
      renderbuffers and now they want EXT_multisampled_render_to_texture.
      -

      JG: isn’t this transitions too?
      -

      CB: could we keep the resolve an opaque operation at this level of
      the API? A high-level call on the resource?
      -

      MM: another way to do it would be to attach another texture to your
      framebuffer and have it auto-resolve
      -

      JG: it’s presented more flexibly in Vulkan at least
      -

      CB: and in D3D too
      -

      MM: probably need at least facilities for it
      -

      JG: there are two levels.
      -

      MM: should figure out which level for the MVP
      -

      CW: think it should be part of the MVP; we need to figure out this
      story.
      -

      MM: who doesn’t want this to be in the MVP?
      -

      KR: could be complicated
      -

      CB: once we define resources and copying, it’ll be easier to
      understand how it works
      -

      DM: think that adding multisamples should be easy
      -

      CW: it’s different in Vulkan and is done on renderpasses if you want
      to be friendly to tilers.
      -

      JG: don’t want to do it magically
      -

      RC: is auto-resolve magical?
      -

         JG: yes
         -

      JG: we should talk about it for the MVP. If implementing is onerous
      we can re-discuss it
      -

      CW: let’s say it’s tentatively in the MVP, pending analysis
      -

   CW: memory barriers?
   -

      JG: figuring it out should be in the MVP
      -

   CW: queries?
   -

      timestamp, occlusion
      -

      DM: do we have an investigation of them?
      -

      CW: not yet. Metal has very few types of queries. It has occlusion
      queries, but they are a totally different concept.
      -

      CW: should investigate and
      -

      KR: Can we just say they aren’t in the MVP? In WebGL queries are 1
      frame behind and people don’t like them and don’t use them.
      -

      DM: Can emulate them with pixel shaders and UAVs (and readback)
      -

      KN: they’re a little weird in Metal but shouldn’t be a structural
      change. Should be a separate part of the API.
      -

      CW: tentatively out.
      -

   JG: shading languages and how you feed them (e.g. vertex attribute
   marshaling)
   -

      CW: pipeline state etc. we all agree should be in the MVP
      -

   JG: we’ve already punted on pipeline caching
   -

   JG: resource aliasing?
   -

      MM: what was the result on heaps?
      -

      CW: pending investigation
      -

      RC: would say no on aliasing for MVP
      -

      CW: that’s my gut feeling too
      -

      MM: two ways to use this word. One buffer -> two points in the
      shader. Or, a texture and buffer pointing at same memory.
      -

      CW / JG: we’re talking about the second one.


   -

   Meta stuff about the MVP?
   -

      CW: Should promise to break it and not enable it by default.
      -

      DJ: Helps with security. In Safari TP you can enable / disable
      features at runtime.
      -

      DJ: Hardware we are targeting is essentially anything which runs Metal
      -

         For Google, most Android devices which ship Vulkan
         -

         For Apple, any Metal 1.0 device (nearly all iPhones ATM)
         -

            Some smaller subset of Mac hardware excluded that doesn’t have
            Metal.
            -

            https://developer.apple.com/metal/Metal-Feature-Set-Tables.pdf
            -

         For Microsoft: all D3D12 devices
         -

            If you have the 12 API, it can be used, so we should target it
            -

            CB: lowest-end DX12 capable system will be as tough to target
            as the lowest-end Android phone. So not clear that D3D12 will be
            defining the floor


   -

      CW: suggest that for MVP, not guaranteeing that it’ll run on all of
      these systems
      -

      DJ: on Android Vulkan’s been supported for two releases, but first
      release was a bit shaky.
      -

      CW: Android still only requires GLES 2.0. However, of the Android
      devices which ship Vulkan, we’d like to support most of them.
      -

      MM: does every device which has Vulkan also have to pass the Vulkan
      conformance tests?
      -

      CW: have to pass the CTS except for some disabled tests. But some of
      these devices shipped on CTS versions that were incomplete, so there are
      bugs. Some have iffy Vulkan devices. We’d have to do workarounds.
      -

      JG: we’ll try to run on Vulkan machines, but we won’t hold back MVP
      (or change the API) for broken devices.
      -

      CW: Vulkan does have a lot of optional features, including logic ops
      (!). Might need to ask to remove certain features from the MVP and recast
      as extensions. But extension story should be a post-MVP thing.
      -

      KR: point out that Vulkan’s extension support was structural -- the
      optional void* extension pointer at the end of each struct
      -

      JG: agree, should have one no-op extension to understand how they’ll
      work
      -

      CW: MVP’s content is important. Do we care about the API shape?
      -

      JG: yes
      -

      CW: ok, then a dummy extension should be there
      -

      KN: not clear how easy it will be to agree on an API shape
      -

   MM: shading language for MVP should probably be shading language we
   progress forward with
   -

      Should also decide whether we’re going to make a JavaScript API,
      WebAssembly API, both, or neither. :D
      -

   DJ: definitely have to have a JavaScript API. Question is whether we
   have a C one or not. No way to call C from WebAssembly.
   -

      MM: we should have a discussion and there’s one right answer.
      -

   MM: have been keeping the roadmap document up-to-date to the best of my
   ability
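
To make the "dummy extension" point from the discussion above more concrete (KR's remark that Vulkan's extension support is structural, and JG's suggestion of one no-op extension), here is a minimal sketch of what a structural extension slot could look like in a JavaScript-style API. All names (`BufferDescriptor`, `ExtensionRecord`, `createBuffer`, `dummy-noop`) are assumptions made up for this sketch and were not proposed in the meeting.

```typescript
// Hypothetical names; a sketch of the "structural" extension idea only.

// One extension record, loosely analogous to an entry in Vulkan's extension
// pointer chain.
interface ExtensionRecord {
  name: string;
}

// Every descriptor may carry an optional list of extension records.
interface BufferDescriptor {
  size: number;
  extensions?: ExtensionRecord[];
}

// The only extension this sketch knows about; it changes no behavior.
const DUMMY_EXTENSION = "dummy-noop";
const supportedExtensions = new Set<string>([DUMMY_EXTENSION]);

// Builds a mock buffer and rejects extension records the implementation does
// not recognize, so unknown extensions fail loudly instead of being ignored.
function createBuffer(desc: BufferDescriptor): { size: number; usedExtensions: string[] } {
  const used: string[] = [];
  for (const ext of desc.extensions ?? []) {
    if (!supportedExtensions.has(ext.name)) {
      throw new Error(`unsupported extension: ${ext.name}`);
    }
    used.push(ext.name);
  }
  return { size: desc.size, usedExtensions: used };
}

const plain = createBuffer({ size: 256 });
const extended = createBuffer({ size: 256, extensions: [{ name: DUMMY_EXTENSION }] });
console.log(plain.usedExtensions.length, extended.usedExtensions);
```

Rejecting unknown extension names is a choice made for this sketch; an API could equally decide to ignore them, which is the kind of question a no-op extension in the MVP would force the group to answer.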

Memory barriers

   -

   CW: How do we get to a resolution
   -

   BJ: Trial by combat
   -

   DJ: Is there any way we can split it up into a smaller discussion?
   -

      We’ve already discussed the philosophy previously
      -

      Apple’s perspective: we think a much simplified API with the
      implementation doing more work is the better solution
      -

      The other solution says that it’s good to give the developer all this
      control
      -

      What this would mean to a Metal implementation: a bunch of stuff
      would be no-ops because it’s handled by the implementation. And the
      no-ops wouldn’t slow things down.
      -

      Thesis for a Vulkan implementation: a Vulkan implementation would
      have to do more work, because the Metal driver’s doing it; but think that
      we can agree upon an optimal design that will give “good enough”
      performance.
      -

   DM: clarification: we want both source and destination of transitions to
   be specified. That’s the way Vulkan does it.
   -

   CW: we think memory barriers need to be explicit for many reasons, so we
   should expose them to the app developer. But they should be declarative and
   failsafe.
   -

      I have a resource, treat it as a sampled image, or as a vertex
      buffer. (This allows barriers to be grouped.)
      -

         In other words, specify the destination state.
         -

         It’s a D3D12 transition barrier with only the destination stated.
         -

         DM: ah, so it’s a bit higher level.
         -

      Impl should do whatever is needed to make that happen.
      -

      Avoids developer needing to understand what the memory model means.
      -

      If the developer does it wrong, they’ll get a validation error.
      -

      It’s a simplified model, and will map well to all backends. Easier to
      validate, easier to learn.
      -

      If we strongly validate memory barriers, then WebGPU will work
      seamlessly across desktop and mobile.
      -

      If we *don’t* do this, developers will make things that work on
      desktop but *not* on mobile.
      -

   RC: spoke with members of the D3D team.
   -

      D3D11: very explicit. Bind everything ahead of time.
      -

      API had all the information it needed to do the barriers for you.
      -

      Shaders: indices into arrays had to be constant.
      -

      In the new bindless world, you can’t know that. In the shader, your
      array indices are not constant. You can calculate the index, index into
      the table, read from here, write there. No way with that model for the
      runtime to figure out what you’re reading from and writing to.
      -

      Had to add memory barriers.
      -

      For this reason, we think that version 1 of the API should have
      memory barriers, to set ourselves up for the new bindless future.
      -

      If we have dynamic indices in the shader, don’t think we can figure
      out what’s read and what’s written.
      -

      CW: what does this mean in practice?
      -

      RC: this means we need to make barriers explicit.
      -

      CW: are the barriers validated for correctness?
      -

      RC: think it will be very difficult to do. D3D lead said, if you can
      figure out a way to validate them, then the API should auto-do it for you.
      -

      JG: there are different degrees of validation. Can ensure something’s
      safe without ensuring it’ll be completely portable.
      -

      RC: if they’re auto added for you, is it twice the cost? 3x?
      -

      CB: barrier model we have in DX12 is slightly higher level than the
      one that’s in Vulkan. But lower level than what’s in Metal. Trying to
      understand how this ties into goals of resource binding model.
      -

         DX11: max 128 textures bound to single pipeline, statically
         indexed at compile time.
         -

         New APIs: can compute that index inside the shader, and it can go
         up to ~1 million textures.
         -

         CW: don’t think that’s the case in Vulkan. Vulkan still caters to
         fixed-function hardware. Bindless isn’t mandatory. Easy to change
         part of the bindings. Don’t think you can access millions of
         descriptors like D3D.
         -

         CW: Metal is D3D11 style. Has bindings, not bindless. Has dynamic
         indexing, but it’s a texture table.
         -

         RC: the D3D team’s conclusion was based on having bindless. If we
         don’t have this then maybe it is possible to auto-insert barriers.
         -

         With advent of multiple queues the runtime can’t insert them for
         you.
         -

      CB: example: CopyEncoder, then texturing from it. Have to signal that
      the copy is done.
      -

      DJ: Do you mean only for async copy / compute case?
      -

         CB: yes, if it’s in a separate queue then it’s asynchronous.
         -

         DJ: thought we’d agreed there’s only one queue? (CW: no)
         -

         DJ: can be potentially asynchronous in Metal as well. Metal has
         explicit barriers. It’s about the encoder, and inside compute.
         -

         CW: probably inside blit as well. Just a bit simpler.
         -

         MM: not sure about that.
         -

      RC: so for compute, do you believe in explicit barriers?
      -

      DJ: yes, we believe in explicit barriers for some cases.
      -

      CB: within a particular encoding stream.
      -

      MM: render a triangle, then a second triangle, in the same encoder.
      The second triangle has to appear on top of the first.
      -

      JG: other commands and dependencies can happen at different times. (?)
      -

         In Vulkan you can have things run in parallel
         -

      MM: write to texture in fragment shader, then read from that texture,
      it’s not defined. Would need to end the encoder.
      -

      CW: in Metal barriers are inserted between the encoders. A Vulkan
      subpass corresponds to one Metal RenderEncoder.
      -

      DM: it’s not 1:1
      -

      CW: if you need to do a barrier between Vulkan subpasses, then you’d
      split the Metal encoder. You don’t have barriers inside subpasses.
      -

      DJ: that’s what I meant about explicit barriers in Metal.
      -

      MM: what we’re really talking about isn’t whether there should be
      barriers, but what should the programmer describe when they need
      synchronization.
      -

   CW: think we can agree that we don’t want any form of barrier inside
   subpasses, because that’s impossible to implement.
   -

      JG: subpass self-dependency
      -

      CW: what you can do in there is limited. Vertex UAV writes -> Fragment
      UAV reads.
      -

      JG: thought this is what could help the tiler
      -

      CW: in Vulkan, can have dependencies between subpasses.
      -

      CW: in Vulkan, can only push data from vertex to fragment inside a
      subpass
      -

         Don’t know any use case for it. Would be ready to not have that.
         -

      JG: Vulkan spec section 6.6.1 about dependencies
      -

      CW: think this is too hard and niche, and we shouldn’t put it in. (On a
      tiler GPU, putting barriers between vertex and fragment processing
      without flushing the tile caches)
      -

      RC: so you need to close the pass and open a new one?
      -

      CW: yes. Because Metal and Vulkan are catering to tiled GPUs, have to
      be explicit about when rendering to a certain attachment set is started
      and ended. If you want to read from the attachment, it’s required that
      you can’t do so from the same subpass.
      -

      CW: it’s more like UAVs where you write to it from the vertex shader
      and read from it in the fragment shader. Can’t think of a use case for
      this.
      -

      CB: don’t think this works anywhere.
      -

      CW: might work on tilers?
      -

      CB: read-after-write hazard. Plenty of things people do after a
      render and they have to switch layouts. Very common hazard.
      -

      RC: so, it’s a hazard to write to a UAV in a draw call and read it in
      a different one.
      -

      CW: Vulkan subpass self-dependency.
      -

      MM: one thing we can all agree on: shouldn’t be able to write to a
      UAV from a vertex shader and read from it in the fragment shader, in the
      same draw call.
      -

      CW: to close this topic: we don’t put memory barriers inside
      subpasses. If you need this for your use case you do a different subpass.
      -

      RC: we do agree on some kinds of barriers!
      -

      JG: I am curious why this wound up in the Vulkan spec…
      -

      MM: until that question’s answered, we should proceed
      -

      RC: rendering to a texture and reading from it requires a new subpass?
      -

      CW: yes
      -

      CW: this would be implemented as OMSetRenderTargets in D3D12
      -

   MM: next topic: during the boundary between one renderpass and the next,
   what should the programmer say?
   -

      CW: in Vulkan, when you have renderpasses with multiple subpasses,
      for each attachment, you have to say how it is used when.
      -

      CW: redundantly-ish, have to say what are the memory barriers between
      subpasses.
      -

         The membars between subpasses can be a superset of transitions
         -

         Can say: want buffer writes done in that subpass to be visible in
         this subpass
         -

         In addition to transitions of textures
         -

      CW: if we support renderpasses with multiple subpasses – which I
      think we want because it’s very handy in both Metal and Vulkan on
      tilers – then when we create the renderpass describing the rendering
      algorithm, we need to say “I want the shader writes done here to be
      visible over here”.
      -

      CW: otherwise on Vulkan we have to guess, and take the worst-case
      guess, leading to a pipeline recompile. So we really need them described.
      -

      CW: between subpasses, need to encode which memory barriers it might
      require.
      -

      MM: and if the app gets it wrong?
      -

      CW: we should find a way to validate that.
      -

      MM: the validation compares “expected” vs. “real”?
      -

      CW: in renderpass: app says, at this point, i want to be able to have
      resources that go from “shader writes” to “being sampled”. Each resource,
      it says it does that.
      -

      MM: if you have information about what the app’s doing: then you can
      retroactively insert barriers immediately?
      -

      CW: no. Renderpasses encode memory barrier information. Pipelines are
      compiled against renderpasses, and can only be used in compatible
      renderpasses.
      -

      DJ: can’t change the renderpass after it’s been “closed” because that
      would cause recompilation of the pipeline?
      -

      CW: yes.
      -

      MM: in order to make that work you’d need to wait until the end of
      the pass
      -

      DJ: if you were going to do it automatically: you’d need to record
      the commands, submit everything, and then …
      -

      CW: wait until everything’s done. Build renderpasses. Then recompile
      pipeline. Then encode command buffer.
      -

      JG: why do we have to recompile pipeline?
      -

      RC: is the recompile needed on D3D12?
      -

      CW: no. would not need to recompile pipelines. It’s not as bad as on
      Vulkan.
      -

      CW: memory barriers between renderpasses change. Renderpasses are
      compatible if they are the same in everything but the initial layout of
      resources (framebuffer swizzling, etc.), and load/store operations for
      different things (in Metal)
      -

      CW: if you’re on a tiler and you have to flush the tiler, you want to
      take advantage of that for register allocation on your tiler.
      -

      JG: how would you have different initial and final image layouts?
      -

      CW: high-level point: if we have multiple subpass renderpasses,
      there’s some implicit memory barrier that’ll have to be inserted.
      -

      DM: don’t understand how this can be done automatically on Vulkan.
      user can not communicate to driver what to do.
      -

      JG: would have to infer dependency graph from what was submitted
      -

      DM: several different ways to do this. Not clear.
      -

      JG: in metal you encode things in an order. Things happen in that
      order. Things that aren’t dependent can happen in arbitrary order.
      -

      MM: in Vulkan things aren’t ordered?
      -

      KR: it’s a render graph. Some parts can run in parallel.
      -

      KN: you can insert them in whatever order
      -

      CW: it’s like you provide your rendering graph to the driver, and the
      driver optimizes / schedules it.
      -

      KR: engines are representing their frames as graphs internally
      already. Frostbite:
      https://www.ea.com/frostbite/news/framegraph-extensible-rendering-architecture-in-frostbite
      -

      CW: metal only provides one attachment at a time.
      -

      KN: in Metal you have to give things in the right order. In Vulkan
      you can submit in any order but have to provide the dependencies.
      -

      CW: Vulkan’s way of it lets you do register allocation of the tile
      cache.
      -

      JG: don’t see the distinction. All the APIs have dependency graphs.
      -

      CW: you want to understand exactly where your data ends up in the
      tile cache. Metal doesn’t have information about the pipeline when
      submitting.
      -

   CB: suggestion:
   -

      given that there’s some diversity of the use of the term “barriers”,
      might be interesting to look at the top 3 or 4 use cases, see how
      they’d be implemented in each API, and see what abstractions would work
      -

      look at things like RAW, WAW hazards
      -

      Merging the APIs without that use case context will be a long tail
      operation
      -

      JG: concerned we might miss use cases
      -

   MM: related question: have decided there’s at least one case where the
   app is wrong. Where if you write to a UAV in vert shader and read from it
   in fragment shader, that’s undefined. What happens then? How does the
   browser know that this scenario occurred?
   -

      CW: don’t think we can validate that
      -

      MM: then we have unportable apps
      -

      CW: don’t think we can shield against concurrency bugs when we have
      read-write buffers
      -

      MM: would annotate every buffer. all buffers attached to vert / frag
      shader.
      -

      CB: in DX11, we can validate this, fail and unbind the previous bind
      to the pipeline
      -

         B/C we have indexing in the pixel shader in D3D12, can’t validate.
         Have a debug layer. Instruments the shader at runtime. Warns the
         user that that’s an illegal operation.
         -

         MM: think this sort of analysis needs to be done for every draw
         call by the browser
         -

      CB: what we’re looking at is a model where we don’t support arbitrary
      indexing. So we can do the D3D11 validation model.
      -

      CW: app allocates a big buffer. Read-write “stuff” in vert shader in
      one part. Read-write “stuff” using frag shader in another part.
      -

      CB: APIs don’t allow this today. Problem you run into is that
      segments of that buffer have been cached with different granularities in
      different ways.
      -

      JG: swizzling patterns for tile subsections
      -

      CB: these are properties of the resource description
      -

      KR: that big a limitation to say you need two different buffers for
      this?
      -

      JG: would be different from Vulkan
      -

      CW: would be fine from our point of view; slightly limiting
      -

      MM: think we should eliminate undefined behavior
      -

      DM: you already have this with just a single UAV. fragment execution
      is unordered.
      -

      MM: what if you have only a single thread?
      -

      CW: works then.
      -

      KR: or if you use atomic ops
      -

      CW: we simply can’t verify shaders with data races.
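
A minimal sketch of the per-draw check MM and CB describe above, assuming every buffer binding is annotated with the stage it is attached to and whether the shader may write to it. The `Binding` type and `find_cross_stage_hazards` helper are hypothetical names used only for illustration; they are not part of D3D11 or of any API discussed here. The check only sees what the bindings declare, so, as CW notes, it cannot catch data races inside a single stage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Binding:
    buffer: str     # hypothetical buffer identifier
    stage: str      # "vertex" or "fragment"
    writable: bool  # True if the shader may write (UAV-style binding)


def find_cross_stage_hazards(bindings):
    """Return buffers that are written in one stage and also bound in another
    stage of the same draw call."""
    writers = {}  # buffer -> stages holding a writable binding
    users = {}    # buffer -> every stage that binds the buffer
    for b in bindings:
        users.setdefault(b.buffer, set()).add(b.stage)
        if b.writable:
            writers.setdefault(b.buffer, set()).add(b.stage)

    hazards = []
    for buf, write_stages in writers.items():
        # Hazard: a non-writing stage also uses the buffer, or two stages write it.
        if users[buf] - write_stages or len(write_stages) > 1:
            hazards.append(buf)
    return sorted(hazards)
```

A browser could run a check of this shape at draw time and reject the draw, which is only possible if bindings are static (no arbitrary indexing), as CB points out.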
      -

   Discussion about serial submission vs. parallel submission
   -

      KR: is it the same as topological sort used by compilers to linearize
      graphs?
      -

      DM: don’t think so. better to submit the graph. if we establish the
      order, then we limit the amount of reordering and rescheduling the driver
      can do.
      -

      JG: it’s sort of about identifying hazards.
      -

      CB: I’m a big fan of task graphs. Covers all 3 APIs. Devs used to
      graphical abstraction can author this almost with a markup language.
      -

      CW: aren’t renderpasses that task graph?
      -

      CB: yes, kind of as a tree. Or sequence of sub-graphs.
      -

      JG: it’s an incomplete graph.
      -

      CB: a lot of engine companies are looking at a task graph model. The
      top level of their engine is already a task graph model and they’re
      looking for a more direct mapping. So a task graph in the API would not
      preclude using it in AAA content. Or Unity. (ooh, burn)
      -

      RC: ?
      -

      CB: so if we express things as a graph then we don’t need barriers
      and we can use the graph to express dependencies
      -

      RC: so all the dynamic UAV stuff would have to be inserted into the
      graph?
      -

      CB: not sure we can say that there wouldn’t need to be some kind of
      “UAV barrier” inside the shader
      -

   CW: do we want to minimize undefined behavior?
   -

      JG: we have different concepts of that
      -

      MM: we as a group shouldn’t pursue eliminating undefined behavior as
      the only goal of this group
      -

         But, it is valuable to limit undefined behavior
         -

         It’s not the only goal, or the most important goal.
         -

      CW: we should minimize undefined behavior at the API level, while
      staying at our perf target of 80-90% of native.
      -

      KR: but not, say, a factor of two hit.
      -

   Discussion about Vulkan’s requirements that:
   -

      Renderpasses: get (attachment descriptors, subpass descriptors,
      subpass dependencies)
      -

      Pipeline descriptors get Renderpasses and subpasses
      -

      Then BeginRenderPass gets the renderpass and textures
      -

      There’s needed compatibility between renderpasses and pipelines
      -

   How to make progress on these
   -

      Would like to get some use cases and understand how they’d be
      implemented in Vulkan (and other APIs)
      -

      MM: use cases are good. They won’t be comprehensive.
      -

      JG: think we made a bunch of progress here
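
The Vulkan object relationships listed above can be sketched as plain data. This is a simplified illustration and not the Vulkan API: all class and field names are hypothetical, and the compatibility rule is reduced to what CW states in the discussion (renderpasses are compatible if they are the same in everything but the initial layout of resources and the load/store operations).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Attachment:
    format: str
    initial_layout: str  # ignored when checking compatibility
    load_op: str         # ignored when checking compatibility
    store_op: str        # ignored when checking compatibility


@dataclass(frozen=True)
class SubpassDependency:
    src_subpass: int
    dst_subpass: int
    src_access: str  # e.g. "shader_write"
    dst_access: str  # e.g. "sampled_read"


@dataclass(frozen=True)
class RenderPass:
    attachments: Tuple[Attachment, ...]
    subpasses: Tuple[Tuple[int, ...], ...]  # attachment indices used per subpass
    dependencies: Tuple[SubpassDependency, ...]


@dataclass(frozen=True)
class Pipeline:
    render_pass: RenderPass  # the pass this pipeline was compiled against
    subpass: int


def compat_key(rp):
    # Everything except initial layouts and load/store operations.
    return (tuple(a.format for a in rp.attachments), rp.subpasses, rp.dependencies)


def can_use(pipeline, rp, subpass):
    """True if the pipeline may be bound in `subpass` of render pass `rp`."""
    return (subpass == pipeline.subpass
            and compat_key(pipeline.render_pass) == compat_key(rp))
```

In this model a change to the declared subpass dependencies produces an incompatible renderpass, which is why guessing them late would force a pipeline recompile.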

Multiple queues

   -

   CW: ties in to this topic and will be just as contentious
   -

   In the roadmap, we have consensus on queues such that:
   -

      There should be one queue type that can do everything on all APIs
      -

      Some implementations may support multiple queue types
      -

      It’s not clear whether we can have more than one queue per type
      -

      Not sure whether we should force all impls to have multiple queue
      types
      -

   MM: Metal doesn’t have synchronization between multiple queues
   -

      We agree that we need synchronization between multiple queues
      -

   JG: in Metal you can get callbacks when queues are done
   -

   MM/all: but that’s round-tripping to the CPU
   -

   MM: regardless of once per frame or a few times per frame, you have to
   round-trip
   -

      If you’re going to have multiple queues, you’ll probably require
      synchronization without round-tripping
      -

      Metal doesn’t need this because they only have one queue
      -

      If the implicit dependency graph is that the things can run in
      parallel, and the GPU has facilities, they can run in parallel
      -

   CW/JG: discussion about multiple queues and fences
   -

   CW: you’re not intended to use multiple queues in Metal, because the
   synchronization is through the CPU. In Metal, if the driver discovers you
   can take advantage of parallel hardware queues, it’ll parallelize it.
   -

      Async compute happens automatically-ish.
      -

   JG: understood.
   -

   MM: in metal there’s no reason to use multiple queues. The fact that you
   can make multiple queues is just a natural thing. But they’re not designed
   to be used.
   -

   CW: there’s device submit. Queue submit is queue.device.submit.
   -

   RC: how do you specify the dependency graph?
   -

   MM: it’s implicit. As described during the last meeting.
   -

      Ex: blur something just drawn. One RenderEncoder which draws the
      thing. Second ComputeEncoder which lists that the texture you drew into
      is a readable input. Dependency graph is implicit.
      -

   JG: do you think we should have multiple queue instances in this API?
   -

   CB: basically asking whether the app should say what can run in
   parallel, or the API should determine what can run in parallel via
   specification of dependencies
   -

   CW: if explicit, then API has to include queue synchronization
   facilities (on the GPU – no round-trip to the CPU).
   -

   DM: Metal backend could say that it only has one queue available so that
   it doesn’t have to implement synchronization.
   -

   KR: can we support async compute in Vulkan without making everything
   explicit? Like Metal?
   -

   CW: you have to declare which queue type things can run on
   -

   JG: two types of objects in Vulkan, shared and exclusive. Shared can be
   used across multiple queues. Exclusive have to be transitions. Can
   transition sub-parts of objects to run on different queues.
   -

   DM: “concurrent” and “exclusive”.
   -

   DJ: async in Vulkan: different queue type / instance.
   -

   CW: one instance “graphics/compute/blit/present”. another
   “compute/blit”. Do main rendering on first one. Async compute goes on
   second one.
   -

   CB: motivation in DX12 was: get the drivers out of the business of
   analyzing command streams and determining what was parallelizable. But at
   this level of abstraction that doesn’t seem like that much of an issue.
   -

   JG: would be nice to retain the benefits.
   -

   CB: if we put this all in a single ref implementation we can all
   optimize it ourselves. Can provide optimization of Metal behavior in a way
   we’re all comfortable with.
   -

   JG: can a ref impl be good enough that we’re satisfied with doing it
   automatically?
   -

   CB: not sure there’s much value to be added with letting the app do it.
   It’s just that we’ve seen arbitrary cost in some drivers. But if it’s our
   ref impl then we can do it.
   -

   MM: no one way to do it right?
   -

   CB: Metal team seems to have figured it out.
   -

   CW: intuition: if we make queues explicit, think apps are more likely to
   take advantage of them.
   -

   RC: so in Metal you have to tell the encoders what your inputs are, so
   it can figure out that the compute stuff can go in parallel?
   -

   CW: they’re provided when you say SetFragmentBuffer
   -

   MM: when you create the encoder you don’t say “I’m going to use these
   resources”. But at the time you list them you’re using them for what you
   want.
   -

      When you describe you’re going to use this texture for this purpose
      it does 2 things. Attaches texture to shader. And says that
      synchronization is needed.
      -

   RC: and if you say i’m just going to run this, then can run in parallel?
   -

   MM: yes, if you have a compute thing with no buffers and textures
   attached, then the compute thing could run entirely in parallel.
   -

      Rendering algorithm with two textures as input, both filled via
      compute. Those compute things could run in parallel.
      -

   DJ: given you have to express the deps up front, why do you need a
   separate queue?
   -

   JG: section 6.2, sync guarantees. Submission on a single queue is
   implicit
   -

   MM: no. first thing finishes before the second thing *finishes*.
   -

   CW: also, can’t put compute in render passes in Vulkan.
   -

   MM: dean’s question is why this is required.
   -

   CW: dumping sub-parts of the graph which are graphics-only.
   -

   MM: that’s a bad design. why?
   -

   CW: tile cache might be using compute shared memory
   -

   JG: this might be a concession about using a single queue without
   worrying about command buffer sync
   -

   CW: maybe a concession to console developers and they want full control
   over the hardware. maybe they have a task graph but want explicit control.
   -

   KR: it might be worth trying to do this automatically
   -

   CW: sync between queues has a cost. If you have a tiny compute shader
   used to generate a DrawIndirect, and it has sync and what not, it’s not
   worth to put it async. We can’t know the cost upfront.
   -

   MM: there’s a cost to marking a compute shader ‘expensive’, and
   submitting it to a separate queue
   -

   CW: seems easier to have the app tell you to run the thing in parallel.
   Doesn’t necessarily mean that we expose the concept of queue, but the graph
   needs to be specified up front.
   -

   MM: that seems easy to agree to. “This computation could possibly be
   asynchronous”.
   -

   KN: not necessarily one compute op. Maybe multiple, and have to be
   ordered w.r.t each other, but async w.r.t everything else.
   -

   MM: the app submits to different queues, and you have your async
   compute. At end, want to join them and show frame. In Metal you can’t do
   that.
   -

   CW: in Metal, you’d have compute and render happening in parallel.
   RenderEncoder A, ComputeEncoder B. RenderEncoder C, renders to final render
   target, and implicit dep on both. Submit both in any order, and Metal
   figures out A and B can run in parallel, and have to join for C.
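
CW's A/B/C example can be sketched as follows, assuming each encoder simply lists the resources it reads and writes. This illustrates inferring a dependency graph from resource usage; it is not a description of how the Metal driver actually works, and `Encoder` and `schedule_levels` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Encoder:
    name: str
    reads: set = field(default_factory=set)
    writes: set = field(default_factory=set)


def conflicts(earlier, later):
    # A RAW, WAR or WAW hazard on any shared resource forces an ordering.
    return bool(earlier.writes & later.reads
                or earlier.reads & later.writes
                or earlier.writes & later.writes)


def schedule_levels(encoders):
    """Group encoders (given in submission order) into levels.
    Encoders in the same level have no hazards on each other and may run in parallel."""
    level_of = {}
    for i, enc in enumerate(encoders):
        deps = [level_of[prev.name] for prev in encoders[:i] if conflicts(prev, enc)]
        level_of[enc.name] = 1 + max(deps, default=-1)

    levels = {}
    for enc in encoders:
        levels.setdefault(level_of[enc.name], []).append(enc.name)
    return [levels[k] for k in sorted(levels)]


# RenderEncoder A and ComputeEncoder B write different textures; C reads both.
a = Encoder("A", writes={"scene"})
b = Encoder("B", writes={"blur"})
c = Encoder("C", reads={"scene", "blur"}, writes={"backbuffer"})
plan = schedule_levels([a, b, c])  # A and B share a level, C follows
```

Two encoders that write the same buffer land in different levels, matching CW's later remark that such encoders will not run in parallel because they might race.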
   -

   JG: if you have an active pipeline then you could make the pass-back to
   the CPU to establish this
   -

   MM: if you have things that are sharing the same buffers, then in Vulkan
   one goes to one queue and one to another. In Metal, could easily get into a
   place where you deadlock because the ordering is wrong.
   -

   CW: yes, RenderEncoder A using a buffer, RenderEncoder B using the same,
   and they won’t run in parallel because they might race.
   -

   MM: opposite. App puts one in one queue and one in another.
   -

   CW: app can’t do that without inserting transitions of resource from one
   queue to another. Using resource for writing in two different places.
   Invalid.
   -

   DM: exclusive ownership for resources?
   -

   CW: yes, that’s my view – just a proposal. A resource is either readable
   or writable as one specific type of thing on one queue.
   -

   MM: a bit blocked. Higher level?
   -

   CW: at any single point in time, a resource is either readable by the
   world, or writable by only one queue. (This is just a proposal for
   eliminating undefined behavior) In the backend, would put in
   synchronization (in vulkan – in metal would no-op)
   -

   MM: in the one Metal queue, you’d first have to submit the commands –
   the command flow has to follow the resource.
   -

   CW: on the app side – resource is used first for render, then compute.
   Submit render command bufs using resource. Transition rsrc from queue render
   to queue compute. Now can submit command buffers that use rsrc for compute.
   In Vulkan, would use a fence. In Metal, each time you do submit, create the
   encoder, so things are well ordered.
   -

   KR: is there something sub-optimal for Metal here, where we have to
   defer things to queue submit time?
   -

   CW: when you encode a command buffer you’re putting it in the queue. You
   have to encode things in order.
   -

   MM: no, you don’t have to encode things in order, but commit them in
   order.
   -

   KN: thought you had to commit one encoder before you got the next one.
   -

   MM: drawing use of buffers on different “queues” and different
   dependencies which would cause deadlock in Metal but not Vulkan. (B/A, A/B)
   -

   CB: deciding whether we need explicit parallelism.
   -

   MM: suggesting this is impossible.
   -

   CW: the Metal driver’s doing dependency analysis. That would be really
   bad in the backend. The driver’s signed up to do that, but not the backend.
   -

   CW: in Vulkan, when you do queue submit, things have to be transitioned
   into the right state.
   -

   MM: so both vulkan and metal will have to validate this scenario?
   -

   CW: that’s validated if you have explicit transitions.
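
CW's proposal (at any point in time a resource is either readable by the world or writable by only one queue, with explicit transitions between those states) could be validated with a small state tracker. A minimal sketch, assuming hypothetical names (`ResourceState`, `transition`, `is_valid_use`) and plain strings as queue identifiers; it also assumes the owning queue may read what it writes, which the discussion does not settle.

```python
class ResourceState:
    """Tracks whether a resource is shared read-only or owned by one queue."""

    def __init__(self):
        # None means "readable by every queue, writable by none".
        self.writer = None

    def transition(self, writer=None):
        """Give write ownership to `writer`, or return to shared read-only."""
        self.writer = writer

    def is_valid_use(self, queue, write):
        if self.writer is None:
            return not write         # shared state: reads only
        return queue == self.writer  # exclusive state: owner only
```

At each `transition` a Vulkan backend would have to insert the actual queue synchronization, while a Metal backend could treat it as a no-op, as CW suggests.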

Shading language

Fil’s presentation

   -

   Discussion about various topics
   -

   DN: there’s a lot of content for GLSL. Let’s say you added generics and
   slicing. slicing looks like the killer app for this. If I were to explain
   this to someone in the GL world, it’s a nicer GLSL with slices and
   templating.
   -

   CB: in the HLSL we’re working on adding these to the cut-down version of
   C++. One difference is we’ve had unions on our plate for a while. Know
   we’re not going to be able to implement all this on the GPU.
   -

   DN: OpenCL C++ kernel language has templates etc. For C++ people but
   removes much of the dynamic stuff. Still keeps the OpenCL C pointer
   restrictions.
   -

   DJ: you wouldn’t need logical addressing mode restrictions for OpenCL.
   -

   DN: we’re all going after the same GPUs.
   -

   CB: we’re all trying to go after the C++ model, but going after the same
   hardware, as the hardware evolves.
   -

   DN: want to separate programming model concerns with technical concerns.
   -

   DN: still not sure what the security model is (bounds checks, etc., at
   what time do you detect that)
   -

   DN: you’re creating a new language and asking everyone to move over
   -

   CB: but they’re not breaking changes.
   -

   CW: WSL looks like a subset of HLSL
   -

   CB: not into the whole branding a language for the sake of it
   -

   FP: WSL is C++ without classes. We gave it a name just to have a name
   and a directory to put it. It’s the kind of language you can tell someone
   who knows C++ that “here are the rules”. You mentioned no clear story of
   how you handle errors. MM and I came up with a thing that WSL will do:
   program terminates early.
   -

   DN: we hadn’t agreed as a group what the criteria are.
   -

   DN: the generics you mentioned are template based, so you’d wind up with
   e.g. 5 copies of the code if you had 5 different instantiations.
   -

   FP: Yes but because of inlining you would have 5 different
   instantiations anyway.
   -

   DN: have talked with people who have significant codebases that say if
   you have that genericity at compile time, you wind up with unacceptable
   performance. They do dynamic polymorphism. When you access memory you
   change how the load is done. Have heard this from multiple directions.
   Might be the kind of thing you say, sorry, frontend has to handle this in
   some way, even if it causes code bloat, etc. Maybe a concern, maybe not.
   -

   FP: the problem is inlining, not generics. If you allow a shader
   language to have functions then you ultimately have to implement that in
   the language by inlining.
   -

   DN, KN: that’s not true.
   -

   KN: has to be inline-able. But many platforms will not actually inline
   it, because you’d end up with too many instructions.
   -

   CB: What we are looking at doing is maybe having a link step that does
   dead code elimination.
   -

   DN: the problem is that dynamically, at runtime, you might have 1 of 100
   different things
   -

   CB: it’s very dangerous to make 100 copies of code
   -

   DN: High level point is: people who see “I need pointers because XXX”
   don’t want instantiation explosion but just one code path. The model
   presented for WSL doesn’t help for that because it still has the code
   explosion problem.
   -

   FP: understand what you’re saying. Valid concern. Data point: people are
   using templates in Metal and they’re happy with them. The reason why it’s
   kind of OK is: killer app for templates are killer numeric code. This is
   what people use templates for. If you’re trying to write OO code using
   templates, it’s hell.
   -

   DN: some customer of yours said “I’m using pointers because blah”, and
   you create this solution, but you may take this back to that customer and
   they’ll say “it didn’t solve my problem”. Maybe you’ll go back and do more
   work in the compiler.
   -

   CW: question not related to language design: what’s the delivery
   mechanism of the language to the API?
   -

   MM: it can be whatever we come up with.
   -

   CW: is it a goal of this language to be faster to type and safety check
   than SPIR-V? or to be a High-level language to be accepted and lowered to
   SPIR-V?
   -

   DJ: we think it’s not going to be significantly slower than type checking
   -

   MM: if your Q is “what language does our API accept” then the model is
   that our API accepts WSL.
   -

   DN: so this is what’s being proposed to WebGPU.
   -

   KN: the only reason to add safety to WSL is because you can add security
   checks more intelligently than SPIR-V. If our thing injected clspv and
   opencl c with a restricted set of opencl c and we could type check it. Only
   reason to add a new language is to more intelligently add safety checks.
   -

   DM: have we seen a case where WSL would be safer than SPIR-V?
   -

   FP: have the safety checks that are minimally needed to add memory
   safety to SPIR-V been added so we can check them against WSL?
   -

   DN: haven’t spec’ed them fully.
   -

   CW: seems buffer checks mainly. There are also texture image fetches,
   and you have the texture size available at the call sites. Feels like the
   biggest safety feature is buffer checks.
   -

   DN: spir-v buffer fetches have been deployed to date on platforms where
   robust buffer access is present.
   -

   MM: need to handle platforms that don’t have it.
   -

   CB: it gives you better access to safety
   -

   KR: point about needing run-time checks at all accesses of slices in WSL
   -

   KN: question about doing the checks up front
   -

   FP: if there were no rule about checking slices up front then in the
   pointer case you’d be unsound. If I could create an array slice that’s
   pointing out of bounds then a subsequent checked access might go out of
   bounds. In logical mode I don’t think there’s a significant cost of bounds
   checks here.
   -

   DN: you’re checking the object you’re referencing into. But you’ll
   reference into the slice with a run-time determined value so you will have
   to check it anyway. Effectively you have a fat pointer that you’re passing
   around and you have to check the index.
   -

   FP: are you talking about logical mode or not?
   -

   DN: yes
   -

   FP: the reason why slice creation has a bounds check at that point is
   that if you have totally unconstrained pointers, it’s like i’m giving you
   an inductive hypothesis that that slice is valid. But need to check that
   the slice is valid up front too.
   -

   DN: so guaranteeing that base object is valid. Now have an arg index
   which is some number. Need a bounds check. It’s the same bounds check that
   you’d need with SPIR-V logical mode.
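
The two checks FP and DN discuss can be sketched as follows: one when the slice is created, so a slice never describes memory outside its base object, and one on every access with a run-time index. This is an illustrative model in Python, not WSL syntax or semantics; the `Slice` class is hypothetical, and raising an error stands in for the early program termination FP describes.

```python
class Slice:
    """A (base, start, length) view whose validity is checked at creation."""

    def __init__(self, base, start, length):
        # Creation-time check: the slice must lie entirely inside the base array.
        if start < 0 or length < 0 or start + length > len(base):
            raise IndexError("slice out of bounds of base object")
        self._base = base
        self._start = start
        self._length = length

    def __len__(self):
        return self._length

    def load(self, index):
        # Access-time check against the slice length only. The creation-time
        # check already guarantees start + index stays inside the base array.
        if not 0 <= index < self._length:
            raise IndexError("index out of bounds of slice")
        return self._base[self._start + index]
```

The access-time check is the one DN says is also needed in SPIR-V logical mode; the creation-time check is what lets it compare against the slice length alone.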
   -

   CW: what’s the value of the API ingesting this language, vs. ingesting
   SPIR-V which can be a compilation target of it?
   -

   FP: 1. textual format and not a binary format. we think based on
   feedback from webassembly that future programming formats for the web were
   textual so view source works.
   -

   FP: 2. this language already has specified type rules for areas that
   have security implications. it’s designed for security from the start
   -

   FP: 3. as we discover how the webgpu spec is supposed to work
   (constraints it runs into, etc.), having a language that this committee
   owns that doesn’t require approval by another committee will give us
   flexibility that we need.
   -

   CW: rebuttal:
   -

      1. view-source problem: we don’t need a standardized textual format
      for this. can view webassembly on the web right now. can view spir-v
      disassembled right now. no-one writes this. (CB: view source isn’t useful
      in that case.)
      -

         DJ: the feedback we got from teams that write shaders is that they
         want a human writeable format.
         -

         DJ: so you want to ship a spir-v compiler along with your source?
         -

         JG: then ship a compiler
         -

         CB: compromise: spir-v could well be the underlying implementation
         of this. but if that’s all you spec, big q about ease-of-use of
         programming. but you could get a rich diversity of compiler
         languages, so no sharing of code around the internet, defeating the
         purpose of a w3c standard. we could do this and make a new platform
         for people to make new languages in. but best to have a lingua franca
         with a syntax that is supported on every browser.
         -

         FP: i wasn’t describing the lack of view source. i’m describing
         criticisms from developers about lack of view-source.
         -

         KN: understand. they want the originally written source code.
         -

         CB: they need to round-trip it.
         -

      2. spir-v logical addressing mode is a feature of spir-v by default
      and it’s secure. theoretically you’ll be generating valid spir-v. can run
      spir-v validator on it. it’s great to have a HLL to embed the security
      properties, but it doesn’t make it better for ingesting by the API.
      -

         FP: is there a reference implementation enforcing the security
         properties you suggest?
         -

         DN: the spec says exactly what logical addressing mode operands
         can be.
         -

         FP: doesn’t say what the bounds check behavior is.
         -

         DN: are you asking for an implementation which checks validity of
         program that’s running? or statically, plus a certain number of
         runtime checks?
         -

         Discussion with DN and FP about gluing SPIR-V spec to GL or Vulkan
         spec
         -

         JG: super impressive that you’re creating a new language. disagree
         on the need for it. we have 95% of a solution in front of us in the
         form of spir-v logical addressing mode. we already did this for
         opengl and glsl for webgl.
         -

         DJ: how many languages are compiled to SPIR-V logical addressing
         mode?
         -

            HLSL. GLSL. OpenCL C subset.
            -

         CW: the reason to take SPIR-V is: high-level shading languages
         have corner cases.
         -

         DJ: but the security researchers came in yesterday and showed us
         bugs in SPIR-V drivers / compilers.
         -

         CB: he’s showing us how to add pointers and templates for limited
         use cases. would like my ide to guide me along a path where we
         implement these new features robustly.
         -

         CW: perhaps we haven’t stressed enough that this is a great
         experiment and something we have. we’d like to use this to write
         shader code. but it’s a question of what the api ingests.
         -

         FP: we are arguing that the api should ingest a textual format,
         and that it should be designed from the ground up to meet the needs
         of the web, and that this format is something that’s owned by this
         committee. no matter what we pick there’ll be some friction between
         the language and what this committee is trying to do. the language has
         never been used by anybody, no backward compatibility constraints. new
         language is an asset. also, saying 95% of spir-v is described is not
         true.
         -

         JG: there’s similar prior art with making GLSL secure.
         -

         FP: this could be viewed as an extension to GLSL.
         -

         CB: in this modern world with tons of github repos, it’s better to
         have human-readable.
         -

         DJ: another useful feature: making a textual language easily
         translatable to the lower level languages. all the platforms are in
         the same spot, requiring translation. and models the webgpu api
         because we can control e.g. the bindings. we’ve seen this in metal.
         -

         CW: yes it’s great to have debugging, and can have that by not
         stripping all debug info from spir-v.
         -

         KR: Big pieces missing: analysis of current shaders and kernels
         that will be ingested from the system, and, limitations on the
         underlying shading languages (no 8-bit loads and stores). There are
         low-level limitations that bubble up to the high-level language.
         -

         CB: ???
         -

         KR: Can’t come up with something from thin air that isn’t grounded
         in the limitations of all the targets we have. We know we will need to
         inject bounds checks just like in WSL. Concern we are going to throw
         out all prior art, and all prior kernels (need HLSL to WSL).
         -

         FP: Not true. Currently the spec is a JavaScript interpreter. Then
         compiler to SPIRV, then SPIRV to WSL. We will prove isomorphism. We
         think it is very grounded in prior art.
         -

         CB: Since it is isomorphic then it doesn’t prevent people from
         reusing their kernels.
         -

         KR:
         -

         DJ: If we have isomorphism between WSL and SPIRV then HLSL and
         GLSL work.
         -

         JG: Awful lot of work when things already work.
         -

         CB: Lots of people interested in pointers and templates. Think
         improvements to the shading language are part of the features of
         WebGPU.
         -

         KR:
         -

         DJ: And without robust buffer access.
         -

         FP: Need to go, think there is a lot of info, need to take some
         time. Provide slides and code.
         -

         FP: what it’s going to need to secure spir-v will be useful no
         matter what we end up deciding.
         -

         DN: are you going to show your slides to your customers who
         requested templating and see whether it’s what they want?
         -

         JG: one more request: could you choose a different name? WSL is
         already a well-known name on Windows (Windows Subsystem for Linux).
         -

   BJ: clarification: you want textual format for the web. but webgl
   developers have said they want binary shaders. is your plan to ship spir-v
   and back-translate to WSL?
   -

   DJ: first one. also: why do they want a binary format? refuted by
   disassembly arguments earlier.
   -

   DJ: also we have webcrypto which will really hide their content if they
   want.
   -

   MM: it’s designed for the parsing and type system to be easily decidable.
   -

   KN: the web developers who say they want view-source are not the same as
   the people who want to load large amounts of shaders quickly from the web,
   compile quickly, avoid as much compilation time as possible. think loading
   performance is better than view-source.
   -

   DJ: we agree. and we think WSL will compress well. we think the
   compilation, parsing, loading time will be fast. the speed here is
   important to us. if we took spir-v we’d still have to parse, translate,
   convert to metal. spir-v won’t give us an advantage here.
   -

   CW: spir-v is designed to be the receiver of many languages so that you
   can efficiently compile spir-v to native targets. if it’s isomorphic to
   spir-v then why not ship a wasm module that compiles wsl to spir-v.
   -

   DJ: would be cool to see someone writing a shading language that has
   these templates, etc. and compile to SPIR-V.
   -

   BJ: during vulkan development, was desire for bytecode languages. devs
   pushed back because people said people would expect it to load fast but it
   wouldn’t. the existence of spir-v is probably that there was a practical
   benefit to it. is spir-v more quickly consumable?
   -

   DN: speed of loading was a non-goal. spir-v is intentionally high level
   to avoid premature optimization.
   -

   CB: from a practical perspective: every language is a derivative of
   clang and every IR is a derivative of LLVM IR. LLVM IR is probably one
   point in the process of this compilation step and the language is likely to
   be a cut-down version of clang, and the question is how far we cut it down.
   would like developers to have the options.
   -

   DN: spir-v is deliberately distinct from LLVM. It was a mistake Khronos
   made – twice – and a mistake multiple teams within Google have made – to
   tie themselves to LLVM IR. When we designed the original SPIR, we
   deliberately avoided basing it on LLVM IR.
   -

   CB: SPIR-V, DXIL, LLVM IR, etc are all pretty similar.
   -

   CB: need to decide whether we have a high-level, low-level, etc.
   approach.


   -

   WSL / WebGPU Shading Language
   https://cdn.rawgit.com/webkit/webkit/master/Tools/WebGPUShadingLanguageRI/index.html


   -

   MM: (demo) This is a compiler that creates an AST and evaluates it by
   visiting it in JavaScript.
   -

   MM: If you are interested in WSL there is a live version of it.
   -

   CW: should clarify the discussion about shading languages into:
   ergonomics, security, etc. and have different AIs for different people.
   -

   DN: wanted to ask FP to show preso to key customers. Think there may be
   some resistance that the thing the customer wants (pointers) wasn’t
   resolved by this proposal.
   -

      Depends on the customer.
      -

   DJ: ok, so we need to find out whether customer’s requests for pointers
   and generics were satisfied by the WSL constraints. Might be talking with a
   set of customers who are different from your (Google’s) set of customers.
   -

   DN: I heard a request from Filip about how much work there would be to
   secure SPIR-V. We should take an AI to enumerate exactly what we mean by
   secure. Namely, that accesses to buffers and images are checked. Robust
   buffer access in a software implementation.
   -

   MM: if you want to use the SPIR-V spec you need to look at 2 specs.
   -

   CW: you have to look at the environment spec.
   -

   DJ: if we accept SPIR-V we need an environmental spec for WebGPU.
   -

   CW: that can be our (+ Mozilla’s) AI.
   -

   DJ: also, going to do prototype investigation in validating that SPIR-V
   is secure
   -

   DN: namely, that you can make sure that ingested SPIR-V validates, and
   that runtime checks are injected.
   -

   CW: need a validator, and a SPIR-V pass which adds bounds checks to
   buffers, image fetches. These are super easy.
   -

   DJ: we will take the AI of cross-compiling WSL to and from SPIR-V and
   MSL and will write down any snags we run into.
   -

   DJ: we were going to go around the room and do a straw poll.
   -

   CW: we should talk with Khronos about adopters’ fees for SPIR-V.
   -

      What are the implications of using SPIR-V in WebGPU?
      -

   DJ: seems fairly clear what Apple’s position is. We wouldn’t be working
   on it otherwise. To reiterate goals: based on what we thought were the
   requirements from the group and our own goals, it’s an investigation into a
   solution. We think it’s valuable and the right thing to do. If the group is
   really strongly pushing for SPIR-V, we want to know answers to questions.
   -

   DN: personal opinion: WSL is a great investigation to move the
   conversation forward. We haven’t pinned down enough of our own requirements
   to recommend in a convincing way what is required by the API. Also, if
   we’re serious about an MVP, SPIR-V gets us a long way along quickly.
   -

   CW: strong intuition that SPIR-V is the right answer.
   -

   KN: aside from everything from security, don’t see a benefit of WSL over
   SPIR-V but need more investigation.
   -

   DM: we should build and focus on the platform. From that point SPIR-V
   makes more sense. Like the language, but would slow us down to build a new
   high-level language.
   -

   JG: share Corentin’s intuition that SPIR-V is an efficient valuable way
   forward esp. given the experience we gained from WebGL 1.0 and 2.0. Not
   impressed with the motivation for starting something completely new, when
   we have something that’s close to matching what we need. Surprised that
   this was as contentious as it’s been.
   -

   RC: Chas already summarized. Agree with Apple that a textual language is
   important for the web. It’s been a tenet of the web that all you need is a
   text editor and web browser. Think we should use HLSL as the language. Chas
   has been open to standardizing it with W3C. Have recently taken contribs
   from SPIR-V group to have HLSL frontend, and getting SPIR-V folks access to
   HLSL repo. In other words, HLSL isn’t just controlled by Microsoft. HLSL is
   used by every Xbox game ever written, etc., and it’s been battle tested on
   a large body of content. But we also want to see innovation in the language
   space and think SPIR-V could be something WebGPU could reasonably accept.
   So at the low level think it would be better to ingest SPIR-V instead of
   DXIL.
   -

   KR: It was a very nice investigation and great motivation to make better
   high-level languages. Think it is too early to throw out all previous
   solutions, felt the presentation was dismissing prior art and core issues
   that are in the low-level languages. We will have to have WSL running on
   all platforms before we can choose it.
   -

      CB: Concern about breaking existing code with SPIRV? WSL is closer to
      GLSL and HLSL than SPIRV.
      -

      KR: No concern that WSL would break things. SPIRV already has an
      ecosystem to compile from and to HLSL and GLSL (and to MSL). NXT shows
      that SPIRV translates well to HLSL, GLSL and MSL. Why not have a WSL to
      SPIRV translator early and have things running instead of writing many
      WSL backends? On our side we should write the “security layer” for
      SPIRV. Could put that in NXT then run tests on all platforms.
      -

      CB: so put SPIRV validator in all browsers?
      -

      KR: Yes, WSL would be the same where you would have the compiler +
      validator + translator in all browsers.
      -

   KR: Think it is premature to choose a new language that hasn’t run on
   any GPU yet. Also, WSL is high level, and our experience in WebGL is that
   native GLSL compilers were all broken. Should go with something more battle
   tested, which is the SPIRV toolchain. Should choose SPIRV + look at security
   constraints. Yes, three.js will have to have a compiler to assemble GLSL
   shaders then translate to SPIRV, not sure how it will work, but we should
   standardize an intermediate level language and maybe a high-level language
   too in the browser (?)
   -

   ZM: Past couple years a third of our effort was working around compiler
   bugs. Think intermediate format would help with this.
   -

   DJ: is the benefit because we think there will be fewer compiler bugs?
   -

   ZM: if we have a high level language we should have a standard
   implementation that all browsers adopt.
   -

   BJ:
   -

      From the PoV of WebGL developers, having a textual representation
      easily consumable by the browser is super important. Has enabled WebGL to
      have its reach as people have been able to open dev tools and see shader
      code being run. So having a blessed high-level language is good as most
      shader code online would be in this language (the one for public
      consumption like three.js, shadertoy etc.). However don’t care about the
      exact high-level language. GLSL is preexisting and has benefits.
      Mechanism to ingest it in the browser doesn’t matter as long as it is
      consistent. If people have to bundle a bunch of WASM in a Web page, it
      isn’t as good. Interpreter in the browser?
      -

      Browser dev hat: no comment on the language itself. Have doubts about
      amount of work people can put in making a language. We have access to
      Khronos through contacts and W3C doesn’t have expertise in graphics. Just
      defining the API is a big task. Saying we want to invent even more things
      makes the task even bigger. Concern about adding even more delay to
      shipping WebGPU. Is that acceptable?
      -

   SW: lack of high-level languages in which to write shaders is not a
   problem. Hesitant to endorse something that will segment the web further
   from desktop and mobile graphics development. Also, parsers are nasty
   complicated things where lots of bugs turn up. The HLSL folks have dealt
   with it on their side, so have the GLSL folks, don’t want to create a whole
   bunch more parser bugs.
   -

   DJ: mentioned low level restrictions for some operations. That wouldn’t
   be encountered by SPIRV program?
   -

      DN: Vulkan only permits 32-bit or bigger loads and stores. 16-bit
      loads/stores are an extension. No 8-bit loads/stores.
      -

      MM: WSL must compile to that. So it will.
      -

      MM: Is there anything that you can’t do that isn’t listed in the
      SPIRV spec?
      -

      DN: You want to look at SPIRV spec plus appendix A of Vulkan spec.
      -

      CW: Appendix A of the Vulkan spec.
      -

   DJ: not important to this group or tech: the current environment
   (Vulkan/Spir-V) requires logical addressing mode. There’s a variable
   pointer extension. That use case is more from the OpenCL community, right?
   Will Vulkan change that environment to remove the restriction for logical
   addressing mode?
   -

   DN: that’s a forward looking statement.
   -

   CW: doesn’t really matter, since we have to run on shipping hardware.
   -

   MM: but in 20 years?
   -

   DN: Vulkan was made in an environment with no new hardware features
   except that which run current OpenGL. And SPIR-V was the way of specifying
   shaders in this environment, so it’ll evolve.
   -

   CB: Want to point out I agree with Ken, and mess with him :P SPIR-V is
   the de facto low level spec, HLSL the de facto high level spec. Want some
   amount of standardization advancing of both.
   -

   CB: DXIL is an open-source github project. If there are things to change
   in the language which could be made to make it more web friendly we are
   happy to talk.
   -

   DN: concerns about HLSL: lack of spec, and based on behavior of previous
   reference implementation. Know CB is going to address this. Hope the
   situation is improved. We need that as well in the web context.
   -

   CB: you have the source. To some extent that’s less ambiguous than
   anything written in the English language.
   -

   DN: also get unintended behavior. Some things done in HLSL shaders in
   the wild until you compile to a low-level representation and do a bunch of
   optimizations. Kind of a moving target. My team’s hitting that as well as
   others. We’re all agreed that this needs to be improved. Reference
   implementations have a lot of good properties but they also have bugs.
   -

   DJ: five companies. One with no preference. Google/Mozilla are saying
   “accept SPIR-V after security analysis”, with some slight web developer hat
   saying “source code is preferable”. MSFT/Apple say we want a human-readable
   text format; difference is that Apple is coming with a different proposal
   than MSFT.
   -

   DM: think there’s still space for high level language innovation like
   Rust did, like enforcing aliasing rules at compile time. Would be happy to
   do this as extensions later.
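
For illustration only, the "security layer" discussed above (validate the ingested SPIR-V, then inject runtime bounds checks on buffer accesses) can be sketched as the check the injected code would perform. This is a minimal sketch, not any browser's implementation: `checkedLoad` and `checkedStore` are hypothetical names, and the policy shown (out-of-bounds loads return 0, out-of-bounds stores are dropped) is an assumption, since the group had not settled on one.

```typescript
// Hypothetical sketch of what an injected bounds check does to a buffer access.
// Assumed policy: out-of-bounds loads read 0, out-of-bounds stores are dropped.

function inBounds(buffer: Float32Array, index: number): boolean {
  return Number.isInteger(index) && index >= 0 && index < buffer.length;
}

function checkedLoad(buffer: Float32Array, index: number): number {
  // Only touch memory when the index is in range; otherwise return a
  // defined value instead of whatever happens to sit next to the buffer.
  return inBounds(buffer, index) ? buffer[index] : 0;
}

function checkedStore(buffer: Float32Array, index: number, value: number): void {
  if (inBounds(buffer, index)) {
    buffer[index] = value;
  }
  // An out-of-bounds store is silently discarded.
}

const data = new Float32Array([1, 2, 3, 4]);
checkedStore(data, 7, 99);                 // dropped, buffer unchanged
const inRange = checkedLoad(data, 2);      // reads element 2
const outOfRange = checkedLoad(data, 7);   // reads the fallback value
```

A real pass would rewrite the shader's access instructions so that every load and store goes through an equivalent check; the sketch only shows the runtime behaviour that results.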

DOM Interactions

   -

   CW: how do we:
   -

      Put stuff on the canvas
      -

      Interactions with WebVR
      -

      Let’s not do workers. Dependent on what WebAssembly does (i.e., let’s
      not do multithreading)
      -

      How to upload DOM elements
      -

   MM: why is this different from WebGL?
   -

   CW: one complaint: WebGL can only render to 1 canvas. If you wanted to
   render to two, have to go through contortions
   -

   DJ: TL;DR: there are ways to do this in the web platform already. but
   since we can present the render buffer in multiple places we can build a
   better solution.
   -

   DJ: canvas.getContext(“”) works with one canvas. So we could make WebGPU
   work with >1, or 0, canvases.
   -

   MM: think that’s a hard requirement to get one of these things without a
   canvas.
   -

   KR: Some interactions with the Javascript interaction model, not
   different from WebGL so we can defer that. People complain that WebGL is
   its own thing outside of the rest of the DOM. Want to upload arbitrary DOM
   elements.
   -

   JG: let’s focus on uploading same-origin DOM media elements.
   -

   MM: so, for now, no arbitrary DOM elements, and let’s see what WebGL
   does.
   -

   CW: let’s start with Canvas and go right into WebVR.
   -

      DJ: one way to do this: make an instance of a WebGPU device. With
      getContext you pass in that device. Then you’re not really talking to the
      CanvasRenderingContext but something else.
      -

      JG: reminds me of ImageBitmapRenderingContext.
      SwapChainRenderingContext? It’s a destination, but not the only way to
      get a WebGPU context.
      -

      Discussion about this
      -

      CW: if you allow putting any texture into a canvas. Unclear what the
      browser does to put textures on the screen. Need to declare how you’re
      gonna use the texture. Could get complicated if the canvas is in its own
      layer or not, etc. Maybe ask canvas “give me a texture to render into”?
      -

      KN: with WebGL we render *into* the IOSurface.
      -

      DJ: keep the great ergonomics WebGL gave you so you don’t have a lot
      of setup. Don’t need to allocate the depth buffer, etc.
      -

      BJ: agree, one main advantage of WebGL is “getContext” and start
      drawing. Not requesting tons of pixel formats, etc. Same for media
      elements; don’t need to allocate your own JPEG decoder, etc.
      -

      JG: those have value. Like the way where you look at the physical
      devices / adapters, and see which one you want to use. Forces the
      developer to make some choice, but the worst thing about creating a new
      context is “ChoosePixelFormat”.
      -

      CW: in all 3 APIs you don’t choose a pixel format. Maybe the canvas
      tells you “here’s the format; deal with it”.
      -

      DJ: create a WebGPU Device. Then Canvas.getContext(). Gives you back
      a CanvasRenderingContext. That’s the thing that gives you the SwapChain,
      attach it to the device, and go.
      -

      JG: instead of a WebGPU context; have a SwapChainRenderingContext.
      -

      KN/JG: more discussion about this
      -

      MM: device is not actually a device in what Dean said. (Doesn’t refer
      to a particular adapter in the system.) Somehow you’ll need to get the
      root object for the API.
      -

         Should be able to get that root object with no parameters.
         -

         KN: new WebGPUDevice().
         -

         JG: sure.
         -

      MM: agree that there’s some constructor that takes no arguments.
      Other constraints too, but not for today.
      -

      DM: would need to pass in the queue created even earlier.
      -

      BJ: thinking through some feedback: question about SwapChains. Why
      can’t you have a SwapChain be creating ImageBitmaps?
      -

      KN: don’t want to incur copy from ImageBitmap to screen. Want to
      render into top-level thing given to DirectComposition, CoreAnimation,
      etc.
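
The flow discussed above (create a device with no arguments, get a swap-chain-style context from the canvas, attach it to the device, then ask it for a texture to render into) can be sketched with mock classes. Everything here is hypothetical: `MockWebGPUDevice`, `MockSwapChainContext`, `attach`, `getNextTexture`, the two-image chain and the `"bgra8unorm"` format label are stand-ins for illustration, not an API the group agreed on.

```typescript
// Mock classes only; none of these names are a real browser API.

interface MockTexture {
  index: number;   // which image of the chain this is
  format: string;  // dictated by the canvas, not chosen by the application
}

class MockWebGPUDevice {
  // Root object created with no parameters, as suggested in the discussion.
}

class MockSwapChainContext {
  private device: MockWebGPUDevice | null = null;
  private next = 0;
  // "Here's the format; deal with it" (placeholder value).
  readonly format = "bgra8unorm";

  attach(device: MockWebGPUDevice): void {
    this.device = device;
  }

  // "Give me a texture to render into": hands out the next image in the chain.
  getNextTexture(): MockTexture {
    if (this.device === null) {
      throw new Error("swap chain is not attached to a device");
    }
    const texture: MockTexture = { index: this.next, format: this.format };
    this.next = (this.next + 1) % 2; // assume a two-image chain
    return texture;
  }
}

const device = new MockWebGPUDevice();
const swapChain = new MockSwapChainContext(); // stands in for canvas.getContext(...)
swapChain.attach(device);
const frame0 = swapChain.getNextTexture();
const frame1 = swapChain.getNextTexture();
const frame2 = swapChain.getNextTexture();    // wraps around to the first image
```

The point of the shape is the ergonomics mentioned by DJ and BJ: no pixel format negotiation and no depth buffer setup before the first frame.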


   -

   WebVR


   -

   DJ: does this mesh with how WebVR works?
   -

   BJ: WebVR does not explicitly require WebGLRenderingContext. In upcoming
   API, you create different layer types. There’s a WebGLLayer. Create it by
   passing WebGLRenderingContext. You attach this to the session and say
   “start presenting”. Would pass in WebGPU context (or, correction,
   SwapChain).
   -

      Intent: with WebGLLayer, you ask it for a framebuffer to render into
      every frame, so it’s effectively a SwapChain. Lets the underlying native
      API provide the surface you render into.
      -

      Either that layer should act as a SwapChain, or point to a SwapChain
      and provide the “next” surface to render into.
      -

      Need to make sure that SwapChain would potentially be populated by
      surfaces coming from the native VR APIs.
      -

   MM: want VR to be a supported use case.
   -

   (All agree.)
   -

   BJ: WebVR’s designed in a way so that you’re expected to have completed
   your rendering by the end of the callback that gave you the pose. Given
   nature of WebGPU API where there’s a lot of asynchrony, unlike WebGL, it’ll
   make things more difficult for developers.
   -

      But if they can maintain a double-buffer of resources and prep
      everything before your next callback, you can get everything done.
      -

   DJ: some of this will be educating developers.
   -

   BJ: there are patterns from WebGL that wouldn’t work.
   -

   DJ: we could have an explicit “PresentSwapChain” API
   -

   DJ: could ensure in our API that nothing’s going to block and take a
   long time. Developer has to be aware things will be asynchronous. Will have
   to set things up in advance.
   -

   BJ: think we won’t have an explicit “Submit” or “Present” API. Asked web
   platform leads, was shot down.
   -

   BJ: we also have an explicit “requestFrame”. Can do all the prep, wait
   for fences/barriers, then call requestFrame.
   -

   BJ: requestFrame syncs with the headset’s sync loop. 90 Hz instead of 60
   Hz.
   -

   BJ: feel pretty comfortable it will work, will require a mindset change.
   -

   KN: how are we going to upload the pose? Need a synchronous upload of
   the pose data.
   -

   BJ: array of view matrices + array of projection matrices. Usually 1 or
   2 of each. Maybe more for lightfield displays.
   -

   BJ: if I can take 64 floats and make them available inline before the
   draw call that would be sufficient.
   -

   CW: there will be a way to do uploads. But for sure there’s a way to
   update a uniform buffer with data. Don’t worry. We don’t know the exact
   mechanism yet, but it will exist.
   -

   BJ: good. It’s a hard requirement that we can communicate the pose
   synchronously with respect to the current frame.
   -

   MM: so we need it to be communicated by the time the draw call is done.
   -

   MM: doesn’t need to flush. No round-trip.
   -

   CW: without blocking there’s a way to provide data to the GPU.
   -

   KN: staging buffer or similar.
   -

   CW: WebVR ideally gives you a texture array and you render to one layer
   and then the other.
   -

   BJ: yes, ideally. If support’s there consistently then if you use WebGPU
   you *always* render to a texture array.
   -

   CW: all APIs do support texture arrays, so we can require that be the
   mechanism.
   -

   BJ: won’t affect many people. Will make rendering more efficient.
   Don’t have to have connections to the current limitations of WebGL
   interacting with WebVR.
   -

   KN: is it possible that swapchain of native system will be designed for
   side-by-side rendering?
   -

   BJ: that’s the best way to interface with Daydream right now. But by the
   time WebGPU comes out it’ll probably have been moved forward. Also we can
   probably do a blit at the very end of the pipeline. And if that puts it at
   a disadvantage then we should fix Daydream.
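
As a sketch of the pose upload BJ describes (an array of view matrices plus an array of projection matrices, 64 floats in the two-eye case), the data can be packed into one contiguous array and copied into a staging area without blocking. `packPose`, the `Mat4` alias and the `uniformStaging` array are hypothetical; the actual upload mechanism was left open in the meeting.

```typescript
// Hypothetical sketch: pack per-eye view and projection matrices for upload.
type Mat4 = number[]; // 16 floats per 4x4 matrix

function packPose(views: Mat4[], projections: Mat4[]): Float32Array {
  const matrices = [...views, ...projections];
  const packed = new Float32Array(matrices.length * 16);
  matrices.forEach((m, i) => {
    if (m.length !== 16) {
      throw new Error("expected a 4x4 matrix (16 floats)");
    }
    packed.set(m, i * 16); // matrix i occupies floats [16*i, 16*i + 16)
  });
  return packed;
}

const identity: Mat4 = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1];

// Two eyes: 2 view matrices + 2 projection matrices = 4 * 16 = 64 floats.
const pose = packPose([identity, identity], [identity, identity]);

// Stand-in for a staging or uniform buffer; a plain copy, no round-trip.
const uniformStaging = new Float32Array(64);
uniformStaging.set(pose);
```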


   -

   Upload from dom media elements


   -

   KR: Let’s learn from our mistakes. In WebGL turns out there are some
   sync operations that happen in some cases. For HTMLImageElement synchronous
   decode needs to happen. For HTMLVideoElement HW and SW path conflated but
   it prevents some 0copy. HTMLCanvasElement needs GPU to GPU copies.
   ImageBitmap from HTMLImageElement give you the data ready for consumption
   by the GPU
   -

   KR: Suggest we forbid uploading from HTMLImageElement and require
   ImageBitmap instead.
   -

   DJ: Could use the decode() function on HTMLImageElement.
   -

   KR: does that take extra arguments like flipY, unmultiplyAlpha, etc.?
   Don’t think image element has it.
   -

   KR: For WebGPU there is no state for “pixel state”. Not sure about
   flipping Y. Will need to deal with this stuff in WebGPU and figure out how
   things will interact. Suggest ImageBitmap is the only way to upload images.
   For video elements suggest we do something like the “live update” mechanism
   LG is working on for WebGL.
   -

   DJ: What if I want to keep the frame while the video is playing?
   -

   KN: LG’s thing is 0copy. You can make a copy if you want a fixed image?
   -

   KR: Need to support HW and SW paths. HW like a texture source. SW give
   data, copy in a buffer then upload to texture? Basically we need to try to
   get the video decode path with as few copies as possible.
   -

   DM: How important is video?
   -

   All: very important.
   -

   KR: HTML canvas, maybe do an image bitmap from it?
   -

   DJ: Like only 2 entry points: image bitmaps and video. How do
   ImageBitmaps work with compressed textures?
   -

   RC: You can’t make one from an image or a video, you need to do an
   upload.
   -

   DJ: So we need a way to upload from ArrayBuffer?
   -

   DJ: thinking more of the WebGL case, where the only way you can upload a
   DOM element is via TexImage2D. That’s why you need the ArrayBuffer entry
   point. But in WebGPU you’re going to upload to buffers that aren’t
   necessarily images.
   -

   RC: asking can we upload an image to a vertex buffer?
   -

   CW: upload to buffer or to compressed format?
   -

   DJ/MM: want to upload raw compressed bytes (ETC, DXT, etc.)
   -

   CW: you’d need a query mechanism to know the supported compressed
   texture formats.
   -

   Some confusion about how WebGL handles compressed textures.
   -

   MM: so, two entry points for uploading to textures from DOM.
   -

      One accepts ImageBitmap.
      -

      The other accepts HTMLVideoElement.
      -

      KR: we need to separately consider the software and hardware cases
      for HTMLVideoElement.
      -

   CW/JG: if you have raw data, you MapBuffer/copy data into
   buffer/UnmapBuffer.
   -

   DJ: it’s a bit more code. creating ImageBitmap returns a Promise.
   -

   MM/DJ: the “one line” in current WebGL samples leaves synchronous
   blocking.
   -

   DJ: you don’t want a wait in your WebVR rendering callback.
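
The raw-data path CW/JG mention (MapBuffer, copy the data into the buffer, UnmapBuffer) can be sketched with a mock buffer. `MockMappableBuffer` and `uploadRaw` are hypothetical names, and the mapping here is synchronous purely to keep the sketch short; the minutes do not specify how mapping would actually work.

```typescript
// Mock of a mappable buffer; not a real browser API.
class MockMappableBuffer {
  private storage: Uint8Array;
  private mapped = false;

  constructor(size: number) {
    this.storage = new Uint8Array(size);
  }

  // Expose the buffer's memory for CPU writes.
  map(): Uint8Array {
    if (this.mapped) throw new Error("buffer is already mapped");
    this.mapped = true;
    return this.storage;
  }

  // Hand the memory back; the CPU must not touch it after this.
  unmap(): void {
    if (!this.mapped) throw new Error("buffer is not mapped");
    this.mapped = false;
  }

  get isMapped(): boolean {
    return this.mapped;
  }
}

// Map, copy the raw bytes in, unmap (even if the copy fails).
function uploadRaw(buffer: MockMappableBuffer, bytes: Uint8Array): void {
  const view = buffer.map();
  try {
    if (bytes.length > view.length) {
      throw new Error("data does not fit in the buffer");
    }
    view.set(bytes);
  } finally {
    buffer.unmap();
  }
}

// e.g. raw compressed texture bytes (placeholder values)
const compressed = new Uint8Array([0xde, 0xad, 0xbe, 0xef]);
const staging = new MockMappableBuffer(8);
uploadRaw(staging, compressed);
```
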
Received on Tuesday, 26 September 2017 13:55:29 UTC
