Minutes for the 2017-10-18 meeting

GPU Web 2017-10-18

Chair: Corentin Wallez

Scribe: Kirill, Kai

Location: Google Hangout
Minutes from last meeting
<https://docs.google.com/document/d/1-ciiWbGletoOXOGrBBZpOhhWEWJcIqK0Bb-dtHJ8ffE>
TL;DR

   -

   TPAC slot to chat with the WASM CG is Nov 7th, 2PM-3:30PM
   -

   Status updates
   -

      Apple working on WSL <-> SPIR-V translation, making WSL a superset
      of HLSL
      -

      Google searching for UB in SPIR-V, updating their Chromium prototype
      for TPAC
      -

      Mozilla has all gfx-rs backends ingesting SPIR-V, working on some
      testing.
      -

   Use cases for synchronization
   -

      Discussion on Vulkan render-passes
      -

         Vulkan-style render passes require giving (render pass, subpass)
         at pipeline creation time. Concern that it isn’t the right API and
         suggestion to have a single subpass per render pass for the MVP.
         -

         Counter-point is that Metal can be more flexible because it only
         has to care about one vendor’s GPUs, unlike WebGPU. Tiling control
         is important for mobile, and Vulkan renderpasses provide an overall
         dependency graph to WebGPU.
         -

         More homework needed to know whether tiling control is a
         structural issue to figure out for the MVP or not.
         -

      Discussion around a new example synchronization use-case on the email
      thread
      -

         Discussion that Metal knows more about the hardware than WebGPU
         will, which makes implicit barriers easier to implement.
         -

         Should have the discussion in writing on email threads.

Tentative agenda

   -

   Administrative stuff (if any)


   -

   Individual design and prototype status


   -

   Use cases for synchronization
   -

   Agenda for next meeting

Attendance

   -

   Apple
   -

   Dean Jackson
   -

   Myles C. Maxfield
   -

   Theresa O'Connor
   -

   Google
   -

      Corentin Wallez
      -

      John Kessenich
      -

      Kai Ninomiya
      -

   Microsoft
   -

      Rafael Cintron
      -

   Mozilla
   -

      Dzmitry Malyshau
      -

      Jeff Gilbert
      -

   Yandex
   -

      Kirill Dmitrenko
      -

   ZSpace
   -

      Doug Twilleager
      -

   Elviss Strazdiņš
   -

   Joshua Groves


   -

   Markus Siglreithmaier

Administrative items

   -

   CW: Slot scheduled for meeting with WASM CG at TPAC
   -

      Tuesday Nov 7th. 2PM-3:30PM
      -

      Let Corentin know if you’re coming
      -

      DJ: No update on waiving registration fee to attend a 2 hour TPAC
      meeting.
      -

   DJ: Software license agreement
   -

      Basically just waiting on last signoff

Individual design and prototype status

   -

   Apple:
   -

      MM: Starting to implement an API that could look like WebGPU.
      Implementing a Vulkan backend first because it is the hardest backing
      API.
      -

      MM: It got us to understand some of the other participants’ concerns.
      -

      MM: Regarding our WSL/HLSL implementation: like last week, we have
      pieces of a codegen phase from WSL / HLSL to SPIR-V. We have an idea of
      what to do to make our JS implementation accept HLSL.
      -

   Google:
   -

      CW: Started to read the SPIR-V spec thoroughly to get the whole
      picture. Going to look at all undefined behaviours and try to classify
      them, following up on last meeting’s concerns about OpPhi.
      -

      JK: There’ll be validation for that
      -

      CW: Demo of something that looks like WebGPU from NXT to show at TPAC
      -

   Microsoft:
   -

      RC: Got answer from D3D team about synchronization email.
      -

      RC: Also talked to lawyers about SPIR-V licensing
      -

   Mozilla:
   -

      DM (to CW): is the latest version of NXT public?
      -

      CW: Links will be in the mailing list. NXT is in the Google GH
      organization. Once the legal stuff is sorted out, all NXT code will be
      in the WebGPU GH org.
      -

      DM: Got SPIR-V to MSL working, so SPIR-V to all modern backends
      working. Basic testing for regressions.
      -

      CW: Manually?
      -

      DM: Automation is planned, right now it’s manual.

Use cases for synchronization

   -

   RC: I have the latest update. The D3D team said that the currently
   listed examples are fine, apart from cases specific to tiled
   architectures (such as input attachments)
   -

   CW: Would be great if D3D team could look at our API designs and make
   sure they’ll work with future D3D tiler support
   -

      RC: Ok
      -

   CW: @MM: feedback on synchronization use cases?
   -

      MM: Was talking about how Vulkan pipelines require you to specify a
      renderpass
      -

   MM: Making a pipeline state object in Vulkan requires a compatible
   render pass (i.e. compatible attachment formats). We were looking at the
   situation where you don’t know the format of the attachments before
   rendering starts. (?)
   -

      MM: Missing piece is subpass state
      -

      CW: creating a pipeline before you have an encoder?
      -

      MM: Vulkan requires knowing all the subpasses and Metal doesn’t. You
      make a pipeline object and it represents a kind of subpass. In Vulkan
      everything needs to be created explicitly.
      -

      CW: Vulkan compiles pipelines using knowledge about subpasses
      (important on tiled GPUs)
      -

      MM: In our ideal world where you don’t have to specify a render
      subpass up front at pipeline creation, every renderpass has exactly one
      subpass in it, so when you create a pipeline state you know how to fill
      in those fields in the Vulkan backend
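As a rough illustration of the single-subpass idea (a hypothetical sketch with invented names, not real WebGPU or Vulkan API): if every render pass contains exactly one subpass, the backend can synthesize the (render pass, subpass) pair from the attachment formats alone, so the frontend never has to ask for one.

```python
# Hypothetical sketch: a WebGPU-like frontend that does not take a render
# pass at pipeline creation, backed by Vulkan-style objects. All names are
# invented for illustration.

class RenderPass:
    """A Vulkan-style render pass restricted to exactly one subpass."""
    def __init__(self, color_formats, depth_format=None):
        self.color_formats = tuple(color_formats)
        self.depth_format = depth_format
        self.subpass_count = 1  # the MVP restriction under discussion

def create_pipeline(color_formats, depth_format=None):
    """Create a pipeline without the caller naming a (render pass, subpass).

    Because every render pass has exactly one subpass, the Vulkan backend
    can synthesize a compatible render pass from the attachment formats
    and always use subpass index 0.
    """
    rp = RenderPass(color_formats, depth_format)
    return {"render_pass": rp, "subpass": 0}

pipeline = create_pipeline(["rgba8unorm"], depth_format="depth24plus")
assert pipeline["subpass"] == 0
```

In this restricted model the subpass index is always 0, which is why the restriction lets the frontend omit render-pass information from pipeline creation entirely.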
      -

      CW: The Vulkan style is ugly but if we don’t use it then we give up
      control over tile locality. Metal 2 adds more capabilities related to
      tile control. D3D12 might add these too.
      -

      JG: Why should we give up performance on Vulkan and Metal 2 just
      because Metal doesn’t expose it?
      -

      MM: I’m arguing this isn’t the right API design to achieve that
      performance. We shouldn’t settle on Vulkan’s design over a future
      D3D’s or Metal 2’s.
      -

      JG: Understood. Concerns about using Vulkan style over Metal/2?
      -

      MM: Yes, developer experience.
      -

      MM: Proposal is one subpass per render pass.
      -

         MM: Only for MVP
         -

      DM: on the gains on mobile from render sub-passes, see Vulkan Game
      Development on Mobile <https://www.youtube.com/watch?v=y-EBiswp3qU>
      -

      CW: If we want to exclude this from the MVP, we need to make sure it
      won’t affect structure of the rest of the API.
      -

      JG: If D3D is going to get tile control then maybe we need to defer
      designing this until we can design with D3D in mind. But not putting
      it in the MVP seems hazardous.
      -

      DJ: Arguing for cleaner API for MVP. Later decide how to go forward
      with tile control, including multiple render subpasses and explicit
      synchronization.
      -

      CW: Maybe a render target scope instead of multiple subpasses would
      be okay? Your argument was that specifying subpasses is complicated.
      Apple only has to deal with one tiled architecture in Metal right now.
      But Vulkan, and maybe a future D3D, may need more info (e.g.
      pipelines’ subpasses)
      -

      JG: Making this explicit forces the developer to think about and
      provide as much info as might be useful on tiled architectures, even
      when developing on desktop at first.
      -

      DJ: Developer on a desktop GPU might never care about this at all.
      -

      JG
      -

      DJ: Basically we just disagree on whether we should make it simpler
      or more explicit.
      -

      CW: AMD, which is a desktop GPU, says render passes are useful there
      too https://gpuopen.com/vulkan-renderpasses/
      -

      DM: Makes sense to have explicit dependencies, e.g. on D3D12 we can
      generate barriers by analyzing the subpass dependencies. On Metal (2?)
      something (?)
      -

      RC: On Metal 2 and Vulkan, how different are the tile control
      stories? How tractable is it to intersect them?
      -

         MM: Two answers: in a model with one subpass per renderpass in
         Vulkan, they are similar. If we’re concerned with good tile
         control, it’s difficult and we should wait for info on D3D.
         -

      CW: Ok, makes sense. Everyone should do homework and determine how
      much it affects the structure so we can determine whether it’s okay to
      exclude it from the MVP.
      -

   CW: The use cases that came from Vulkan are good to think about, but
   they don’t inform the implicit vs. explicit memory barrier debate we’re
   having. They seem to all work well with both explicit and implicit.
   -

      CW: But I came up with a new use case that doesn’t work well with
      implicit barriers.
      -

      CW: Anyone have topics wrt the original use cases list?
      -

      [No]
      -

   CW: Was trying to think of why Metal can be more successful than us at
   implicit barriers.
   -

      CW: Metal knows whether the hardware implements the barrier for a
      single resource or globally.
      -

      CW: For WebGPU we don’t know, so we don’t know whether it’s better to
      have individual barriers per resource (allows driver to do better
      scheduling) or aggregate barriers into one big barrier (prevents excess
      global stalls)
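The trade-off CW describes could be modeled roughly like this (a toy sketch with invented names, not a real GPU API): the same set of resource transitions can be recorded either as one barrier command per resource or as a single aggregated command, and without knowing whether the hardware implements a barrier per resource or as a global stall, WebGPU cannot tell which form is cheaper.

```python
# Toy model of the barrier-granularity trade-off. Invented names; not a
# real GPU API. A "transition" is (resource, old_state, new_state).

def per_resource_barriers(transitions):
    """One barrier command per resource: lets the driver schedule each
    transition independently (better if barriers are per-resource)."""
    return [("barrier", [t]) for t in transitions]

def aggregated_barrier(transitions):
    """All transitions coalesced into one command: stalls once instead of
    N times (better if every barrier is a global stall)."""
    return [("barrier", list(transitions))] if transitions else []

transitions = [
    ("bufA", "copy-dst", "uniform"),
    ("bufB", "copy-dst", "uniform"),
    ("tex0", "render-target", "sampled"),
]
assert len(per_resource_barriers(transitions)) == 3
assert len(aggregated_barrier(transitions)) == 1
```

Metal can pick between the two forms because it knows which one the hardware actually implements; a WebGPU implementation sitting above multiple drivers does not.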
      -

      MM: Seems same as your old argument from a few months ago.
      -

      MM: Vulkan has two places with sync: inside the render subpass
      dependency graph and inserting barriers. If a pass has one subpass and
      a command buffer has one renderpass, that allows you to have optimal
      implicit barriers.
      -

      CW: If we’re looking to have tile control, there’ll be some complex
      mapping between passes.
      -

      CW: Even with your constraints we can’t do optimal barrier placement
      in D3D12/Vulkan.
      -

      MM: At the API level, if you look at the commands, you know the state
      of resources (previous and needed) and therefore you know the exact
      set of barriers.
      -

      CW: That algorithm is nontrivial and is hardware dependent. The API
      can’t see across separate queue submissions.
      -

      MM: On the first point: the UA knows more about the hardware than the
      app developer.
      -

      CW: App knows what it’s going to do.
      -

      MM: For every pair of commands that interact with a resource, you
      know the old and new state, so you can schedule the barriers as
      needed.
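A minimal sketch of the implicit-barrier scheduling MM describes (a toy model with invented names, not a real implementation): track the last-known state of each resource and insert a transition barrier only when a command needs the resource in a different state.

```python
# Sketch of implicit barrier scheduling: track the last-known state of
# each resource and insert a transition barrier only when a command needs
# the resource in a different state. Invented names; toy model.

def schedule_barriers(commands):
    """commands: list of (resource, required_state) pairs, in order.
    Returns the command stream with ("barrier", resource, old, new)
    entries inserted where a state change is needed."""
    last_state = {}  # resource -> last-known state
    out = []
    for resource, required in commands:
        old = last_state.get(resource, "undefined")
        if old != required:
            out.append(("barrier", resource, old, required))
        out.append(("use", resource, required))
        last_state[resource] = required
    return out

stream = schedule_barriers([
    ("tex", "render-target"),  # first use: transition from "undefined"
    ("tex", "render-target"),  # same state: no barrier needed
    ("tex", "sampled"),        # state change: barrier inserted
])
assert sum(1 for c in stream if c[0] == "barrier") == 2
```

Per CW's counter-point, such tracking cannot see across separate queue submissions, so the first use of a resource in a submission may need a pessimistic transition from an unknown state.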
      -

      CW: I argue that the algorithm is possible but nontrivial.
      -

      RC: Can you give an example of the application doing better than the
      API?
      -

      CW: You can’t look across queue submits (and that makes it
      complicated), so you might need to do pessimistic barrier placement.
      In the application, given domain knowledge, you can group barriers
      (e.g. different buffers all becoming UBOs)
      -

      CW: This is a difficult conversation to have verbally.
      -

      MM: Agree. Question: why would anybody do more than one queue
      submission per frame? Each frame already has a full flush anyway, so
      one submission per frame is okay.
      -

         CW: Cache flush?
         -

         MM: Yes.
         -

         CW: Don’t necessarily agree - it’s hardware dependent. Reasons for
         multiple submissions: e.g. multiple queues, or latency-sensitive
         workloads like VR.
         -

         CW: Let’s continue in writing so we can think about it more in
         depth.

Agenda for next meeting

   -

   Shading languages!
   -

      CW: Undef behaviours in SPIR-V
      -

      CW: Others?
      -

      MM: Could talk briefly about our “trap” idea.
      -

      MM: Working on emitting SPIR-V to prove equivalence, working on
      moving to HLSL.
      -

      MM: But we have nothing that’s blocking the shading language
      discussion.
      -

      DJ: high- vs low-level debate?
      -

         CW: SL vs IR, text vs binary, etc.
         -

         CW: Schedule at the end, will take forever.

Received on Monday, 23 October 2017 18:04:26 UTC