- From: Corentin Wallez <cwallez@google.com>
- Date: Fri, 15 Sep 2017 15:25:13 -0400
- To: public-gpu <public-gpu@w3.org>
- Message-ID: <CAGdfWNMk43LTFF4J1PjfM-ZvvBPL6_p7TFCVrqSM8pnkXWybwg@mail.gmail.com>
GPU Web 2017-09-13
Chair: Corentin
Scribe: Ken
Location: Google Hangout
Minutes from last meeting
<https://docs.google.com/document/d/1T_bbzKC22BAq3ax_cnm_kcL-KJSG4nM5sH4C9egIUfk/>
TL;DR
-
Open questions that do not affect the structure of the API can be
deferred post-MVP
-
Pipeline Objects
-
Rasterizer sample count and depth bounds deferred post-MVP
-
Pipeline caching should be deferred post-MVP. This will let us
better understand the usage patterns and make the right design
decisions for the API.
-
Primitive restart: Ben confirmed that a 0xFFFFFFFF cut value should
not trigger primitive restart for int16 indices on D3D12. Consensus
to put the index format in the pipeline state unless a better
solution is found.
-
Explicit boolean for depth testing: concern that an implicit toggle
might interact badly with future extensions. Discussion deferred.
-
Consensus on consensus
-
Consensus that compute should be an MVP feature.
-
Supporting multiple queues in the MVP is an open question (deferring
them is simpler, but there is concern this has structural
implications)
-
State inheritance between renderpasses / subpasses was marked as
having consensus but things aren’t 100% clear.
-
Lots of consensus confirmed.
Tentative agenda
-
Administrative stuff (if any)
-
Individual design and prototype status
-
Consensus on consensus
-
Open questions on pipeline objects
-
Agenda for next meeting
Attendance
-
Apple
-
Dean Jackson
-
Myles C. Maxfield
-
Theresa O'Connor
-
Warren Moore
-
Google
-
Corentin Wallez
-
John Kessenich
-
Kai Ninomiya
-
Ken Russell
-
Zhenyao Mo
-
Microsoft
-
Ben Constable
-
Chas Boyd
-
Frank Olivier
-
Rafael Cintron
-
Mozilla
-
Dzmitry Malyshau
-
Jeff Gilbert
-
ZSpace
-
Doug Twilleager
-
Joshua Groves
Administrative items
-
CW: At the end of the meeting let’s talk about what to put on the
agenda for the next meeting
-
DJ: do we have a doc for agenda items?
-
CW: yes, will link it. LINK
-
DJ has been pinging lawyers about license discussion; no response yet;
last email was that they were talking with Google and Microsoft
-
CW hasn’t asked in a while
-
RC also no news
Individual design and prototype status
-
Apple
-
MM: nothing interesting. Continuing development on shading language
prototype
-
BC: any highlights you’d like to share?
-
MM: think yes, but want to save it for a shading language discussion
-
Google
-
CW: Not much to report
-
Microsoft
-
BC: haven’t worked on much code. Working through discussions and
design issues. Making good progress we think.
-
Mozilla:
-
DM: making progress on OpenGL backend for their graphics abstraction
-
Prototyping inside Servo: can render triangles with graphics
pipelines with decent framerate given that we’re reading back
the buffer
-
Servo prototype: https://github.com/kvark/webgpu-servo
Pipeline objects
-
CW: had an email thread: there were some comments that added questions
on the pull request
-
CW: had consensus on:
-
CW: Not exposing the depth bounds feature of D3D12 and Vulkan because it
doesn’t exist on Metal (?)
-
DM: did we really have consensus on this? Would like this to be
exposed under a feature gate post-MVP
-
MM: agree to include it post-MVP
-
CW: would make sense for it to be exposed as an extension. Can we
agree to not put this in MVP?
-
DM: OK.
-
CW: the rasterizer sample count cannot differ from the render
target’s sample count
-
DM: why not require specifying this during pipeline creation?
-
CW: hard to do this from render pass
-
DM: render pass only says your images have this number of samples.
Doesn’t say how the rasterizer will work on them. Ultimately want
rasterizer to support different frequency than number of
samples. If we add
this capability now then we don’t need to change the API later
-
CW: what’s the point of having a rasterizer sample count different
from the texture’s sample count?
-
DM: imagine you’re rendering into non-MSAA textures. Rendering with
sample count=16. Your shader can see the sample mask and …
-
BC: recurrent theme here: should be a stated goal that the MVP API is
allowed to change later – necessary scaffolding, but don’t want
to restrict
building features later (2 have come up in the last 5 minutes).
Also don’t
want to forbid changing the API. Have to build and test the thing before
you can design the API surface. Want us to not fear changing the MVP, but
rather get to MVP quickly, unless there’s very definitive proof that
it’ll change the structure. Some features are like that and require
changing many API points.
-
DM: coming from a standpoint that 2 of the 3 APIs require specifying
that, but agree with deferring it post-MVP.
-
MM: didn’t understand what you’re (BC) saying. Should we expect to
make breaking API changes post-MVP?
-
BC: mental model of MVP: MVP is not version 1.0, but version 0.8 or
0.9. It’s something we build to figure out what we want in 1.0. It’s a
beachhead, not winning the war. Want MVP to not be something we have to
support forever. Expect that we might need to change the pipeline state
object in some way. Think users should expect to have to change
their code
if they code to the MVP.
-
DJ: agree with Ben.
-
CW: for features where it’s not a structural issue, but just one
member in a structure or one extra function, it’s easy to add later
without problems.
-
CW: do people agree we should defer depth bounds and rasterizer sample
masks to post-MVP?
-
DM, BC: yes.
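Although depth bounds and rasterizer sample counts are being deferred, a minimal TypeScript sketch may help illustrate what DM is asking for. Every name here is hypothetical and invented for illustration, not part of any agreed API:

    // Hypothetical pipeline fields; names are illustrative only.
    interface RasterizationState {
      // In the MVP this must equal the render target's sample count.
      // A post-MVP extension could let it differ, e.g. rasterizing a
      // non-MSAA target at a higher frequency and letting the shader
      // inspect the coverage/sample mask.
      sampleCount: number;
    }

    interface RenderPipelineDescriptor {
      rasterization: RasterizationState;
      // ...shaders, blend state, depth/stencil state, etc.
    }

    // MVP-style validation: reject a mismatch until an extension exists.
    function validateSampleCount(pipeline: RenderPipelineDescriptor,
                                 attachmentSampleCount: number): void {
      if (pipeline.rasterization.sampleCount !== attachmentSampleCount) {
        throw new Error("rasterizer sample count must match the render target in the MVP");
      }
    }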
-
CW: Vulkan and D3D12 have different ways to cache things
-
Either have some sort of pipeline derivation – “this pipeline looks
like this other one”
-
Or let the browser do everything
-
JG: you mean, browser caches things implicitly?
-
RC: preference would be the latter. Maybe in the MVP the browser
redoes everything from scratch. Later, it caches smartly for you. If
that’s infeasible, give the developer the knobs they need to do it
themselves.
-
DJ + CW: agree
-
BC: building up a feature for AAA games for this, it took a few
iterations to get it right. Think we’ll need the MVP and a few workloads
running to get this right. Think we should defer it. This is one of the
reasons to get the MVP running faster.
-
JG: agree. This is a solvable thing but kicking it down the road is
fine, and preferred. State that users shouldn’t expect that
shaders will be
cached by the MVP.
-
MM: Question for Microsoft: MSDN has some information about the
cached PSO. Does that work like derivative pipelines, or does it
require an exactly equal pipeline?
-
BC: that’s one of the many corners of the API that I can’t answer
directly. Can get an answer quickly.
-
CW: pretty sure it requires the same pipeline. Saw it somewhere in
the documentation: “The rest of the data in the PSO still needs to be
valid and match the cached PSO or an error is returned.”
<https://msdn.microsoft.com/en-us/library/windows/desktop/dn914407(v=vs.85).aspx>
-
CW: sounds like there is agreement that caching will be prototyped
and done after MVP.
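For illustration of the “let the browser do everything” option, here is a rough TypeScript sketch of an implicit cache keyed on the pipeline descriptor. All names are invented for this sketch and nothing here was decided in the meeting:

    // Hypothetical implicit pipeline cache; names are illustrative only.
    type PipelineDescriptor = { vertexShader: string; fragmentShader: string /* ... */ };
    type NativePipeline = { id: number };

    class ImplicitPipelineCache {
      private cache = new Map<string, NativePipeline>();
      private nextId = 0;

      // The browser could key compiled pipelines on a canonicalized
      // descriptor so that creating the "same" pipeline twice skips
      // recompilation, with no explicit caching API exposed in the MVP.
      getOrCreate(desc: PipelineDescriptor): NativePipeline {
        const key = JSON.stringify(desc); // placeholder for a real canonical hash
        let pipeline = this.cache.get(key);
        if (!pipeline) {
          pipeline = this.compile(desc);
          this.cache.set(key, pipeline);
        }
        return pipeline;
      }

      private compile(desc: PipelineDescriptor): NativePipeline {
        // Stand-in for the underlying driver compilation.
        return { id: this.nextId++ };
      }
    }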
-
CW: primitive restart
-
Metal always enables primitive restart. No way to disable it.
Otherwise you need to parse the index buffer and validate against the
primitive restart index. Seems we have to enable primitive restart on all
APIs. Ben explained on the mailing list that drivers should be
looking at the index buffer format for this.
-
Encode the index buffer format in the pipeline state? How do people
feel about this?
-
DM: would like to defer the decision until we get more information
from Ben. DM provided a test case to Ben, using a WHQL-certified,
Microsoft-issued driver, and the behavior is that the 32-bit cut
index works for a 16-bit buffer. If this works then we don’t need to
provide the index buffer type.
-
MM: what about a 32-bit index buffer with a sentinel value of 0xFFFF?
-
JG: it’s just a value. In WebGL 2.0 we force-enabled primitive
restart because it can’t be disabled in D3D11.
-
BC: will investigate Dzmitry’s test case. The spec says this
shouldn’t work. At the time you provide this you’re also providing
lots of details about your index buffer. Will follow up on why it’s
not behaving as specified.
-
KR: Myles mentioned something about the stencil buffer / stencil
mask. In WebGL we are making a rule that only the stencil bits
present in the FBO are used in the stencil mask. This is a behavior
change that’s important for portability. The primitive restart index
may similarly be being masked to the number of bits in the indices.
-
BC: Based on fairly accurate data, the test checks that 0xFFFFFFFF
should not work for int16 index buffers. The driver Dzmitry tested
on should not have passed this test.
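To make the sentinel-value point concrete, a tiny TypeScript sketch (a hypothetical helper, not from the minutes): the restart value is the all-ones value of the index type, which is why a 0xFFFFFFFF cut value is not expected to apply to 16-bit indices:

    // The restart (strip-cut) sentinel is the all-ones value of the index type.
    type IndexFormat = "uint16" | "uint32";

    function primitiveRestartValue(format: IndexFormat): number {
      return format === "uint16" ? 0xffff : 0xffffffff;
    }

    // An index of 0xFFFF in a uint16 buffer restarts the strip, while the
    // same numeric value in a uint32 buffer is an ordinary index.
    console.log(primitiveRestartValue("uint16").toString(16)); // "ffff"
    console.log(primitiveRestartValue("uint32").toString(16)); // "ffffffff"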
-
BC: Having the pipeline state contain the type of the index buffer
seemed like an easy solution. Why do people think it isn’t a great
one?
-
DM: concerned there will be clients who defer the index buffer
binding and the type of the index buffer until later. Not sure everyone
will know it at pipeline creation time. But if MSFT requires it in D3D12,
then let’s require it.
-
BC: the spec does say this. Will also investigate your test case.
Also have to know the vertex buffer format up front, and that’s even more
complex than the index buffer format.
-
JG: this is not something that we, as the WebGPU driver, can infer.
-
BC: no. In the pipeline state, here are your shaders and root
descriptor, and here are your vertices.
-
JG: index buffer width.
-
CW: that’s known when you bind the index buffer.
-
JG: and you can bind multiple different ones with the same pipeline
state object?
-
CW: yes.
-
CW: agreement to put index buffer format in pipeline state?
-
JG: yes, though it’s a new restriction. If you ever had a situation
where you mostly use U16s and then upgrade to U32s later, you have to
create a new pipeline state object. Maybe not so bad.
-
CW: with caching it should be free.
-
BC: in this case, if you’d written your engine in D3D or Vulkan, when
you upgrade from 16 bit to 32 bit you’d also change this value
and create a
new PSO.
-
JG: unless you’re not using primitive restart. Or skip using that
index.
-
BC: ...yes...could do that. But this is a small thing to cover all 3
APIs. This seems like a small mental cost to pay to get fast
performance on
all 3 APIs.
-
JG: just trying to make sure we understand even the little things
we’re leaving behind
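A minimal TypeScript sketch of the direction just agreed on, with invented names: the index format becomes part of the pipeline descriptor, so moving an engine from 16-bit to 32-bit indices means creating another pipeline object rather than flipping a bind-time switch:

    // Hypothetical API shape; names are illustrative, not a spec.
    type IndexFormat = "uint16" | "uint32";

    interface RenderPipelineDescriptor {
      indexFormat: IndexFormat; // fixed at pipeline creation time
      // ...shaders, vertex layout, blend/depth state, etc.
    }

    interface RenderPipeline {
      readonly descriptor: RenderPipelineDescriptor;
    }

    function createRenderPipeline(desc: RenderPipelineDescriptor): RenderPipeline {
      // Stand-in for the real creation call (an implicit cache could make
      // the second creation cheap, as noted in the caching discussion).
      return { descriptor: desc };
    }

    const pipeline16 = createRenderPipeline({ indexFormat: "uint16" });
    const pipeline32 = createRenderPipeline({ ...pipeline16.descriptor, indexFormat: "uint32" });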
-
CW: wacky idea of putting vertex buffers inside descriptor sets/bind
groups
-
Seems nobody likes that, so let’s not do that
-
Agreed
-
CW: should we explicitly enable independent blending and depth testing?
-
The two are different
-
Independent blending: could figure out that you’re not doing it.
-
Depth testing: if the comparison function is always true, it’s the
same as not enabling depth testing.
-
DM: mentioned we could use a nullable IDL property to handle this.
But it turns out it’s not possible: WebIDL can’t have a nullable
dictionary member inside another dictionary. This implies we should
do what Apple proposed and derive whether the feature is enabled from
the values provided by the user, both for independent blends and
depth testing.
-
BC: to clarify: want to infer that it’s turned off because they used
ALWAYS with no writes?
-
JG: yes. Then if you’re always passing you can disable the depth test.
-
BC: concerned the semantics are different than what we’re telling the
driver
-
JG: agree...but can’t think of a way it’ll be wrong
-
BC: concern from our resident spec expert that as other values get
added to depth testing, the difference between ALWAYS and turning
depth testing on/off will become more apparent, and then you’re
stuck. Having a separate bool enable seems more future-proof.
They’re different in the hardware.
-
JG: counter-proposal: if we extend the depth test options, we could
add back in the separate disabling of the depth test.
-
BC: when implementing this in NXT, felt like the fact that this
didn’t map to how the hardware was talking about it made it hard to build
it and hard to test it properly. Felt I was fighting it. Not a scientific
measure.
-
BC: however, someone doing the Metal backend would have to do the
reverse. It’s sort of a 2 out of 3 vote based on the structure of the
low-level APIs.
-
CW: seems like a more difficult topic than expected. Let’s flesh this
out more on the mailing list.
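To contrast the two options being weighed, a short TypeScript sketch with hypothetical names: an explicit enable flag (as in D3D12 and Vulkan) versus inferring “depth test off” from an always-pass compare function with writes disabled (closer to Metal’s MTLDepthStencilDescriptor, which has no enable flag):

    type CompareFunction =
      "never" | "less" | "equal" | "lessEqual" |
      "greater" | "notEqual" | "greaterEqual" | "always";

    // Option A: explicit enable, mirroring D3D12 and Vulkan.
    interface DepthStateExplicit {
      depthTestEnabled: boolean;
      depthWriteEnabled: boolean;
      depthCompare: CompareFunction;
    }

    // Option B: no boolean; "disabled" is inferred from the values.
    interface DepthStateInferred {
      depthWriteEnabled: boolean;
      depthCompare: CompareFunction;
    }

    function depthTestEffectivelyDisabled(state: DepthStateInferred): boolean {
      // ALWAYS with no writes behaves like "depth test off" today; BC's
      // concern is that future depth features could make the two diverge.
      return state.depthCompare === "always" && !state.depthWriteEnabled;
    }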
Consensus on consensus
Link to Myles’ document. <https://github.com/gpuweb/gpuweb/wiki/Roadmap>
Mailing list thread.
<https://lists.w3.org/Archives/Public/public-gpu/2017Aug/0000.html>
-
CW: high-level object model. There should be at least one type of queue
that can do everything. Everyone agrees?
-
DM: what if the hardware doesn’t support compute?
-
CW: then it doesn’t support GPUWeb. Vulkan requires one queue that
supports everything. D3D12 supports both.
-
MM: we skipped over whether we’re going to support compute.
-
Think we need compute.
-
CW: agree.
-
DM: think we need async compute.
-
CW: we should at least have compute.
-
BC: agree we should at least have compute.
-
CW: this is no longer an open question.
-
CW: the MVP will only allow one instance of Queue, ever, per Device
(instantiation of WebGPU), to keep things simple.
-
JG: don’t love it
-
CW: would be asynchronous w.r.t. the CPU, but since there’s only one
Queue there’s no asynchronous work.
-
JG: could create command buffers in parallel but not submit them in
parallel.
-
CW: yes.
-
JG: don’t love it, it’s an important part of the final design.
-
BC: it’s one of the reasons people move to these low-level APIs.
Understand JavaScript makes concurrency hard. There were situations where
we moved to D3D12 specifically to have multiple queues. Think
these are the
problems people will want to solve. This is one of the designs that will
completely change everything. Understand this makes things difficult
because it reopens the barrier discussion. But think it needs to
be in the
MVP.
-
CW: agree with BC’s statement. If we want to keep this as an open
question then we should talk about it as a second phase in the memory
barrier discussion. Synchronization between queues is a variant of memory
barriers.
-
MM: agree it’s an open question.
-
CW: Queues need to be created at device creation time.
-
JG: Vulkan requires this. Or need to create things in the background.
-
CW: agreed.
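A rough TypeScript sketch of the MVP shape being discussed (all names invented here): the device exposes a single queue, fixed at device creation; command buffers can be recorded in parallel but are submitted through that one queue:

    // Hypothetical MVP object model; names are illustrative only.
    interface CommandBuffer { /* recorded GPU work */ }

    class Queue {
      submit(commandBuffers: CommandBuffer[]): void {
        // Single submission point: work is asynchronous with respect to
        // the CPU, but there is no cross-queue parallelism in this sketch.
      }
    }

    class Device {
      // The queue exists from device creation; the MVP would not allow
      // creating additional queues later.
      readonly queue: Queue = new Queue();
    }

    // Usage: record command buffers (possibly in parallel), submit on the one queue.
    const device = new Device();
    const commands: CommandBuffer = {};
    device.queue.submit([commands]);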
-
CW / JG: Render passes.
-
Metal’s render encoders and Vulkan’s renderpasses will be represented
by the same concept – “RenderPass”.
-
JG: seems correct. Similarity between Metal’s render encoders and
Vulkan’s subpasses.
-
DM: Let’s not associate subpasses with Metal encoders. May encode
multiple subpasses into a single encoder.
-
CW: esp. if we use framebuffer fetch for optimization.
-
MM: The reason why they were marked as the same object is that this
is where you attach textures for rendering.
-
JG: they’re not exactly the same but are similar.
-
CW: not really consensus but more shared knowledge that these API
objects are the same.
-
CW: work should be done in a render pass between Begin/End. Question:
should we say that you can’t do compute at the same time?
-
Because of Metal and Vulkan, can not mix graphics and compute work.
-
Technically, Metal 2 on iPhone X can do it.
-
MM: we want to target more than a single phone.
-
CW: so should we explicitly Begin/End compute passes? Believe we
should. Shows the developer that things are separate.
-
JG: no strong disagreement. Sort of makes sense but hesitate to
commit to it yet
-
DM: makes less sense than graphics passes because compute passes do
not share as much as subpasses in graphics. That’s why the case for
Begin/End on the compute encoder is weaker than that for the graphics
encoder. But, fine with Begin/End on compute.
-
CW: tentative consensus
-
CW: Open question: begin / end blit?
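A minimal TypeScript sketch of the tentative consensus, with invented names: render and compute work are each bracketed by explicit begin/end calls and cannot be interleaved; whether blit/copy work gets the same treatment is the open question above:

    // Hypothetical command recording API; names are illustrative only.
    class RenderPassEncoder {
      draw(vertexCount: number): void { /* record a draw */ }
      end(): void { /* close the render pass */ }
    }

    class ComputePassEncoder {
      dispatch(x: number, y: number, z: number): void { /* record a dispatch */ }
      end(): void { /* close the compute pass */ }
    }

    class CommandBufferBuilder {
      // Graphics work happens only between beginRenderPass() and end().
      beginRenderPass(): RenderPassEncoder { return new RenderPassEncoder(); }
      // Compute work is bracketed the same way; it cannot overlap a render pass.
      beginComputePass(): ComputePassEncoder { return new ComputePassEncoder(); }
      // Open question: an equivalent begin/end scope for blits.
    }

    const builder = new CommandBufferBuilder();
    const render = builder.beginRenderPass();
    render.draw(3);
    render.end();
    const compute = builder.beginComputePass();
    compute.dispatch(64, 1, 1);
    compute.end();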
-
CW: consecutive render passes don’t inherit state. Same between render
and compute passes. Render buffer state between passes is separate.
-
Is there state inherited between subpasses?
-
DM: sure. You can bind descriptor sets between subpasses in Vulkan.
-
CW: in Metal you can not inherit state between render encoders.
-
DM: but Metal’s the only one which doesn’t inherit state. And Metal’s
the only one that doesn’t have subpasses.
-
MM: wish people would go back and read the notes, because it’s clear
we had consensus on this issue.
-
CW: AI: add a link to the notes and talk about this more.
-
CW: you can only change attachments at a renderpass boundary. Think this
is obvious because of Vulkan’s and Metal’s constraints.
-
DM: you mean binding a framebuffer. Yes.
-
CW: open question about whether renderpasses have synchronization.
Should wait until the memory barrier discussion.
-
CW: consensus about using Vulkan’s 3-layered hierarchy of sets of
descriptors, and a small number of descriptor sets to bind things. So
you bind a descriptor to a descriptor set, and support lots of
descriptors, but only a small number of descriptor sets.
-
Can keep open questions about D3D12’s descriptor heaps, and more
questions about Vulkan’s descriptors.
-
RC: so do we have consensus?
-
CW: we don’t have consensus on descriptor set allocation optimizations
-
RC: do we have consensus that there’s a straightforward and
performant mapping?
-
CW: will be performant and easy-to-map if it looks like Vulkan. The
only questions are around allocation of descriptor sets and pooling of
descriptors.
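A rough TypeScript sketch of the binding model described above (every name invented for illustration): many descriptors are grouped into a descriptor set, and only a small number of sets can be bound at a time:

    // Hypothetical binding model; names are illustrative, not a spec.
    type Descriptor =
      | { kind: "uniformBuffer"; buffer: object; offset: number; size: number }
      | { kind: "sampledTexture"; texture: object }
      | { kind: "sampler"; sampler: object };

    // Many descriptors live in one set.
    interface DescriptorSet {
      descriptors: Descriptor[];
    }

    // Only a small, fixed number of set slots is bindable at once
    // (illustrative limit, in the spirit of Vulkan's per-pipeline-layout cap).
    const MAX_BOUND_DESCRIPTOR_SETS = 4;

    class CommandRecorder {
      private bound: (DescriptorSet | null)[] =
        new Array(MAX_BOUND_DESCRIPTOR_SETS).fill(null);

      // Binding attaches whole sets, not individual descriptors.
      setDescriptorSet(slot: number, set: DescriptorSet): void {
        if (slot < 0 || slot >= MAX_BOUND_DESCRIPTOR_SETS) {
          throw new Error("descriptor set slot out of range");
        }
        this.bound[slot] = set;
      }
    }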
-
CW: lots of consensus about pipeline states. Document in the repo covers
this.
-
DM: renderpass information is not included as consensus.
-
MM: will update the document. It was old.
-
CW: Got through most of it. Can leave the rest for Chicago.
Agenda for next meeting
-
Chicago F2F Agenda
-
Discuss shading languages on the afternoon of Friday the 22nd.
-
But the following meeting will also be about shading languages.
-
CW: for Chicago: please add topics to the agenda document.
Received on Friday, 15 September 2017 19:35:39 UTC