Minutes for the 2017-08-30 meeting from Corentin Wallez on 2017-08-31 (public-gpu@w3.org from August 2017)

From: Corentin Wallez <cwallez@google.com>
Date: Thu, 31 Aug 2017 15:55:53 -0400
To: public-gpu <public-gpu@w3.org>
Message-ID: <CAGdfWNMKh5qsnA5uuWoQAkGDZ0VtidOVHv1=s2XWLy_+ziuAbw@mail.gmail.com>
GPU Web 2017-08-30

Chair: Corentin

Scribe: Dean, Kai, Corentin

Location: Google Hangout
Minutes from last meeting
<https://docs.google.com/document/d/1hIqo26_hFy-ZMSoVEAib3EJ5Azir4zGSJBUwG6asYfI/>
TL;DR

   -

   Apple is prototyping a shading language in the WebKit repo to inform
   discussions
   -

   Clarifications on SPIR-V logical addressing mode and pointer variable
   extension.
   -

   Clarifications on Metal memory barriers and fences:
   -

      MTLFences essentially operate at encoder granularity
      -

      No implicit barriers within render encoders, only between encoders
      -

      Implicit barriers happen in compute encoders
      -

   Apple showed how implicit memory barriers could work by splitting each
   “encoder” into its own command buffer, and interleaving these with command
   buffers containing barriers.
   -

      Non-trivial interaction with multi-queue
      -

      Discussion of how developers are happy with Metal despite its lack of
      explicit control
      -

   Need to figure out which level of abstraction to expose to the Web while
   still retaining performance
   -

      Higher-level API vs. low-level API with helper libraries for implicit
      barriers
      -

      WASM was designed without CPU barriers, leaving the design open to
      adding them later

Tentative agenda

   -

   Administrative stuff (if any)


   -

   Individual design and prototype status


   -

   Memory barriers
   -

   Consensus on consensus
   -

   Agenda for next meeting

Attendance

   -

   Dean Jackson (Apple)
   -

   JF Bastien (Apple)
   -

   Myles C. Maxfield (Apple)
   -

   Theresa O'Connor (Apple)
   -

   Warren Moore (Apple)
   -

   Corentin Wallez (Google)
   -

   John Kessenich (Google)
   -

   Kai Ninomiya (Google)
   -

   Chas Boyd (Microsoft)
   -

   Rafael Cintron (Microsoft)
   -

   Dzmitry Malyshau (Mozilla)
   -

   Jeff Gilbert (Mozilla)
   -

   Alex Kluge (Vizit Solutions)
   -

   Doug Twilleager (ZSpace)
   -

   Elviss Strazdiņš

Administrative items

   -

   CW: Please register for the F2F
   -

   DJ: TPAC runs Monday to Friday; Wednesday is for one-hour sessions to give
   demos. We could give a WebGPU demo. Will talk with the chair of the WASM
   group to make sure we can meet at some point on Thursday or Friday.
   -

      JF: Please sync with Brad Nelson
      -

   CW: Still no news from lawyers.
   -

      DJ: Will talk to them.

Individual design and prototype status

   -

   Apple:
   -

      MM: Started implementing a POC of a shading language in the WebKit repo.
      It is human-readable, statically typed, has generics, is designed from
      the ground up to have the same security guarantees as Java or ML, and
      is C-like.
      -

      CW: Do you think you’ll have anything to show next week?
      -

      MM: We don’t have much now. But we’ll see what we have.
      -

   Google:
   -

      CW: We have done some refactoring in NXT, but that’s all.
      -

   Microsoft:
   -

      No updates; may contribute to the NXT D3D12 backend.
      -

   Mozilla:
   -

      Continuing work on a prototype in Servo, no results yet. It takes time
      because the API is explicit.

Logical addressing mode in SPIR-V

   -

   MM: Did someone suggest using SPIR-V in Logical Addressing Mode?
   -

   CW: That was my suggestion to get discussion started. My original
   thinking was that SPIR-V without any extension might be enough, but we
   haven’t done any detailed investigation.
   -

   MM: Actually, I was talking about what *your* implementation will use.
   Would SPIR-V with LAM be enough in your implementation?
   -

   CW: Yes, for Vulkan that is what we need to use.
   -

   MM: If we required extensions to SPIR-V that don’t have (m)any
   implementations IRL, would this be a failure?
   -

   CW: Yes, because there are devices that won’t have the extensions, and
   we want to run on as many as possible (say 80% of Vulkan devices). Check
   out vulkan.gpuinfo.org to see what the stats of Vulkan devices are.
   -

   JK: Vulkan specifies a lowest common denominator; there was also the
   idea of a Vulkan “profile” that specifies minimum requirements.
   -

   DJ: Currently cannot compile OpenCL to Vulkan without SPIR-V extensions.
   -

   CW: OpenCL 1.2 can mostly be translated to SPIR-V plus the variable
   pointers extension, but not sure how many drivers will expose it.
   -

   JK: It seems like most HW could support the extension; most Android
   devices will eventually support it, though there is no timeline.
   -

   JK: It is still logical addressing, in that pointers are abstract types
   that you can’t cast to bits.
   -

   JK: OpenCL accepts SPIR-V without logical addressing mode.
   -

   MM: If we want a more expressive shading language, are variable pointers
   our best bet?
   -

   JK: That extension is an incremental improvement, a relaxation of some
   of the rules. Quite a bit more can be done even with abstract pointer types.
   -

   JG: We still need to target existing hardware. I'm afraid we are getting
   ahead of ourselves; I would like to see something that cannot be done with
   logical addressing.

Memory barriers

   -

   CW: The last conversation was about explicit vs. implicit memory barriers:
   Metal doesn’t have explicit ones, and a developer may get better
   performance by being explicit.
   -

   MM: New information about Metal:
   -

      Metal fences in a render encoder (signal/wait synchronization): the
      signal can be pushed to the end of the encoder and the wait to the
      beginning. These facilities cannot be used to provide synchronization
      within the render encoder.
      -

   CW: I talked with someone from Metal at SIGGRAPH. Because the GPU may be a
   tiler, ordering is only defined per pixel, so you can’t expect an
   operation within an encoder to be able to read something written by a
   previous operation.
   -

   MM: So any synchronization that WebGPU does (explicitly or implicitly)
   must occur between successive encoders.
   -

   CW: It’s not very restrictive to ask for this in Vulkan too, because you
   can push barriers to the start of the pass. In D3D12 you can put barriers
   wherever you want because there isn’t a notion of passes or subpasses.
   -

   MM: So synchronization has to occur at specified checkpoints, which
   constrains the problem of making memory barriers implicit.
   -

   CW: Two things. First, compute encoders might want a barrier within the
   same compute encoder; I’m not sure what Metal requires.
   -

   MM: Metal fences can be used with render encoders and compute encoders.
   Both types of encoders have the same semantics. The difference is that
   Metal won’t issue barriers between draw commands in a render encoder, but
   it will in a compute encoder.
   -

   CW: So for a compute encoder we can’t assume that sync only happens at
   the encoder boundary?
   -

   MM: True
   -

   CW: The other point is that implicit memory barriers mean you have to
   serialize in order to issue barriers. Problem 1: it creates a sync point for
   processing. Problem 2: it might introduce latency for VR workflows.
   -

   MM: Fixups happen for rendering at the encoder level. So the driver is
   free to split encoders and leave holes, then go back and fill these holes.
   -

   CW: Don’t think other APIs let us back-patch like this.
   -

   MM: You can with command buffers, because they can be created out of
   order: fill the command buffers, then look at the synchronization.
   -

   CW: Vulkan barriers require both “from” and “to”, but “from” is not known
   until queue submit time.
   -

   MM: I've been talking to teams that use Metal and interact with Metal
   developers: if a render needs to happen earlier, it gets put in a separate
   encoder that gets sent to the queue earlier.
   -

   CB: Is it the application or the driver that does this?
   -

   MM: Both use this technique: applications do it if they want to have work
   submitted earlier, and the driver does it for HW sync.
   -

   CW: In Metal, each part can be encoded separately and then submitted in
   the right order; in the other APIs, barriers can be added when needed. DM,
   WDYT?
   -

   DM: It would be interesting to experiment with. Not sure we can create
   more command buffers than what the application gives us. Not sure how this
   would handle queues of the same family that transition the same resource
   to different layouts. There isn’t a single place where we can track these
   transitions.
   -

   MM: We haven’t really decided on the story for multiple queues of the same
   family.
   -

   CW: Let’s not go off into talking about queues yet. What other issues
   are there with barriers? Currently we need to defer creating barriers so
   that we can compute the “from” part of the barrier.
   -

   MM: [KN: I missed this. Asking about restrictions on barriers in other
   APIs?]
   -

   CW: You can put barriers wherever you want, but it might cause UB.
   -

   MM: If something works for render passes, it should also work for
   compute?
   -

   CB: Sounds sane; there is no distinction in D3D12. Do you allocate
   threads, or do you allocate pixels and hope the differences will show up?
   -

   CB: Will it be acceptable for web developers to work out these
   synchronization issues? The first thing most game developers do is write a
   layer to track synchronization so they don’t have to care about it anymore.
   -

   JG: Would like it to be a JS library.
   -

   CB: Could we converge on a common layer, to simplify the work of
   building it?
   -

   MM: Just showed a model where the programmer doesn’t have to specify
   anything.
   -

   JG: But at what cost?
   -

   MM: Metal is successful, so the cost is acceptable.
   -

   CB: Most game developers work at the abstraction level of Metal.
   -

   JG: I don’t expect the average web developer to learn WebGPU, same as for
   WebGL.
   -

   DJ: Metal made the choice to shield developers, and it still works for AAA
   games. I don’t know why Vulkan chose its approach.
   -

   CB: There is great performance to be had, but only a few developers can
   take advantage of it. They are the same people who will target each native
   API directly. Web developers would tolerate a much higher level of
   abstraction.
   -

   JG: Did Vulkan / D3D12 make a mistake there?
   -

   JFB: For WebAssembly we decided not to have barriers like C++. You can
   lose some performance, but that only matters to experts. WASM has zero
   barriers but still has good enough performance, and the design we chose
   lets us add barriers later if experts prove that they need them for
   performance. [Not to say this applies to GPU, but the point is that] I
   think we can have no barriers first and then see if it works. Then experts
   can rant about the lack of barriers. Leave the design open so we can add
   barriers later, if needed. Metal has shown that you don’t really need them,
   at least as a first step. Maybe Vulkan is right and you need them, but you
   can add them later.
   -

   JG: One of the problems here is that people who target Metal have no
   alternative, and only one platform supports multiple low-level APIs
   (Windows with Vulkan/D3D12). There’s no direct comparison between Metal and
   an API with explicit barriers. Since there’s no choice of API on Mac, we
   don’t know that Metal’s choice was “better”.
   -

   CW: Another thing is that native APIs have either both barriers and
   multiple queues, or neither. Is it even possible to have multiple queues
   without barriers?
   -

   MM: Metal has multiple queues, just no synchronization between them.
   -

   CB: Barriers don’t really have anything to do with GPU synchronization
   (fences, etc.). Barriers usually affect texture layout, cache locality,
   etc. Nothing to do with multiple queues. AFAIK Vulkan uses similar
   nomenclature to distinguish these concepts.
   -

   DM: The connection between queues and barriers comes from automatic
   resource transitions, because multiple queues can be executing
   independently and operating on the same resource. Tracking gets more
   complicated.
   -

   MM: So let’s not have multiple queues! (not joke)
   -

   MM: If there are multiple queues, then Metal has to virtualize the
   queues onto the same hardware, which leads to deadlocks.
   -

   CW: Something about Vulkan spec wanting forward progress.
   -

   DM: I want to experiment with a standalone implementation of what Apple
   has described, basically translating from Metal style to Vulkan style. We
   can then directly compare performance with manual optimization.
   -

   JG: We should expose the raw platforms instead of thinking “this isn’t
   needed in most cases”, “that ...”.
   -

   DM: Can the simplification be done in user space?
   -

   CB: Less chance for consistent behavior between browsers.
   -

   MM: Argument isn’t that automatic barriers can always be done as well as
   hand-made barriers.
   -

   JG: Design-wise, my preference is to have hand-written barriers, then
   build a userspace layer for automatic barriers.

Consensus on consensus

Agenda for next meeting

   -

   CW: After shading languages, what should we do?
   -

      JG: Don’t think we’re going to finish shading languages
      -

      CW: I still don’t want to talk about shading languages all the time,
      out of respect for Chas’s, David’s, John’s, etc. time. Let’s alternate
      with other topics.
      -

      JG: Maybe every other week?
      -

      CW: OK. So, more memory barriers? Consensus on consensus? Something
      else?
      -

      MM: How about something other than memory barriers?


Agenda for next week:

   -

   More shading languages.


Agenda for in two weeks:

   -

   Consensus on consensus
   -

   Open questions on the pipeline objects.
Received on Thursday, 31 August 2017 19:56:38 UTC