- From: Corentin Wallez <cwallez@google.com>
- Date: Thu, 31 Aug 2017 15:55:53 -0400
- To: public-gpu <public-gpu@w3.org>
- Message-ID: <CAGdfWNMKh5qsnA5uuWoQAkGDZ0VtidOVHv1=s2XWLy_+ziuAbw@mail.gmail.com>
GPU Web 2017-08-30
Chair: Corentin
Scribe: Dean, Kai, Corentin
Location: Google Hangout
Minutes from last meeting
<https://docs.google.com/document/d/1hIqo26_hFy-ZMSoVEAib3EJ5Azir4zGSJBUwG6asYfI/>
TL;DR
-
Apple is prototyping a shading language in the WebKit repo to inform
discussions
-
Clarifications on SPIR-V logical addressing mode and pointer variable
extension.
-
Clarifications on Metal memory barriers and fences:
-
MTLFence are essentially at encoder granularity
-
No implicit barriers in render encoders. Only between encoders
-
Implicit barriers happen in compute encoders
-
Apple showed how implicit memory barriers could work by splitting each
“encoder” in its own command buffer, and interleaving them with command
buffers containing barriers.
-
Non-trivial interaction with multi-queue
-
Discussion of how developers are happy with Metal without explicit
control
-
Need to figure out which level of abstraction to expose to the Web while
still retaining performance
-
Higher-level API vs. low-level API with helper libraries for implicit
barriers
-
WASM designed without CPU barriers leaving design open to add them
later
Tentative agenda
-
Administrative stuff (if any)
-
Individual design and prototype status
-
Memory barriers
-
Consensus on consensus
-
Agenda for next meeting
Attendance
-
Dean Jackson (Apple)
-
JF Bastien (Apple)
-
Myles C. Maxfield (Apple)
-
Theresa O'Connor (Apple)
-
Warren Moore (Apple)
-
Corentin Wallez (Google)
-
John Kessenich (Google)
-
Kai Ninomiya (Google)
-
Chas Boyd (Microsoft)
-
Rafael Cintron (Microsoft)
-
Dzmitry Malyshau (Mozilla)
-
Jeff Gilbert (Mozilla)
-
Alex Kluge (Vizit Solutions)
-
Doug Twilleager (ZSpace)
-
Elviss Strazdiņš
Administrative items
-
CW: Please register for the F2F
-
DJ: At TPAC, meetings run Monday-Friday; Wed is for 1h sessions to give
demos of stuff. Could give a WebGPU demo. Will talk with the chair of WASM
to be sure we can meet at some point Thurs-Friday
-
JF: Please sync with Brad Nelson
-
CW: Still no news from lawyers.
-
DJ: Will talk to them.
Individual design and prototype status
-
Apple:
-
MM: Started implementing a POC of a shading language in WebKit repo.
It is human readable, statically typed, C-like, has generics, and is
designed from the ground up to provide the same security guarantees as
Java or ML.
-
CW: Do you think you’ll have anything to show next week?
-
MM: We don’t have much now. But we’ll see what we have.
-
Google:
-
CW: We have done some refactoring in NXT, but that’s all.
-
Microsoft:
-
No updates, may be doing some NXT D3D12 backend contributions
-
Mozilla:
-
Keep moving to make a prototype in Servo, no result yet, takes time
because explicit API.
Logical addressing mode in SPIR-V
-
MM: Did someone suggest using SPIR-V in Logical Addressing Mode?
-
CW: That was my suggestion to get discussion started. My original
thinking was that SPIR-V without any extension might be enough, but we
haven’t done any detailed investigation.
-
MM: Actually, I was talking about what *your* implementation will use.
Would SPIR-V with LAM be enough in your implementation?
-
CW: Yes, for Vulkan then that is what we need to use.
-
MM: If we required extensions to SPIR-V that don’t have (m)any
implementations IRL, would this be a failure?
-
CW: Yes, because there are devices that won’t have the extensions, and
we want to run on as many as possible (say 80% of Vulkan devices). Check
out vulkan.gpuinfo.org to see what the stats of Vulkan devices are.
-
JK: Vulkan specifies a lowest common denominator, there was also the
idea of a Vulkan “profile” that specifies minimum requirements.
-
DJ: Currently cannot compile OpenCL to Vulkan without SPIR-V extensions.
-
CW: OpenCL 1.2 can be mostly translated to SPIR-V + variable pointer
extension, not sure how many drivers will expose it.
-
JK: Seems like most HW could support the extension, most Android devices
will eventually support it, no timeline though.
-
JK: Still logical addressing as in pointers are abstract types that you
can’t cast to bits.
-
JK: OpenCL accepts spirv without logical addressing mode.
-
MM: If we want more expressive shading language, variable pointer is our
best bet?
-
JK: That extension is an incremental improvement, a relaxation of some
of the rules. Quite a bit more can be done even with abstract pointer types.
-
JG: Still need to target existing hardware. Afraid we are getting ahead
of ourselves, would like to see something that cannot be done in logical
addressing.
Memory barriers
-
CW: Last conversation was about explicit vs. implicit memory barriers,
that Metal doesn’t have explicit ones, and that a developer may get better
performance by being explicit.
-
MM: New information about Metal:
-
Metal fences in a render encoder (signal wait synchronization), the
signaling can be pushed to the end, and the wait to the beginning. These
facilities cannot be used to provide synchronization in the
render encoder.
-
CW: I talked with someone from Metal at SIGGRAPH. Because there is a
tiling GPU, there is per-pixel defined ordering. Because you can’t expect
an operation within an encoder to be able to read from something in a
previous operation.
-
MM: So any synchronization that WebGPU does (explicitly or implicitly)
must occur between successive encoders.
-
CW: It’s not very restrictive to ask for this in Vulkan too, because you
can push them to the start of the pass. In D3D12 you can put barriers
wherever because there isn’t a notion of passes or subpasses.
-
MM: So synchronization has to occur at specified checkpoints, which
simplifies the problem of generating implicit memory barriers.
-
CW: Two things - compute encoders might want a barrier to the same
compute encoder - I’m not sure what Metal requires.
-
MM: Metal fences can be used with render encoders and compute encoders.
Both types of encoders have the same semantics. The difference is that
Metal won’t issue barriers between draw commands in a render encoder, but
it will in a compute encoder.
-
CW: So we can’t take into account that sync has to happen at the
boundary for a compute encoder?
-
MM: True
-
CW: The other point was that implicit memory barriers means you have to
serialize in order to issue barriers. Problem 1: sync point for processing.
Problem 2: might introduce latency for VR workflows.
-
MM: Fixups happen for rendering at the encoder level. So the driver is
free to split encoders and leave holes, then go back and fill these holes.
-
CW: Don’t think other APIs let us back-patch like this.
-
MM: Can with command buffers, because they can be created out of order.
Fill the command buffers, look at the synchronization.
-
CW: Vulkan barriers require both “from” and “to”, but “from” is not known
until queue submit time
-
MM: Been talking to teams that use Metal and interact with Metal
developers - if a render needs to happen earlier it gets put in a separate
encoder that gets sent to the queue earlier
-
CB: Is it the application or the driver that does this?
-
MM: Both do this technique: applications do it if they want to have work
submitted earlier and driver does it for HW sync.
-
CW: In Metal, each part can be encoded separately and then submitted in
the right order; in others, barriers can be added when needed. DM, WDYT?
-
DM: Would be interesting to experiment with. Not sure we can create
extra command buffers beyond what the application gives us. Not sure how this
will allow transitions between queues of the same family that transition
the same resource to different layouts. There isn’t a single place where we
can track these transitions.
-
MM: Haven’t really decided on the story on multiple queues of the same
family.
-
CW: Let’s not go off into talking about queues yet. What other issues
are there with barriers? Currently we need to defer creating barriers so
that we can compute the “from” part of the barrier.
-
MM: [KN: I missed this. Asking about restrictions on barriers in other
apis?]
-
CW: Can put barriers wherever you want but might cause UB.
-
MM: If something works for render passes, it should also work for
compute?
-
CB: Sounds sane, there is no distinction on D3D12. Do you allocate
thread or do you allocate pixels and hope differences will show up.
-
CB: Will it be acceptable for web developers to work out these issues of
synchronization? The first thing most game developers do is write a layer
to track synchronization so they don’t have to care about it anymore.
-
JG: Would like it to be a JS library.
-
CB: Could we converge to simplify work for this layer by having a common
thing?
-
MM: Just showed a model where the programmer doesn’t have to specify
anything.
-
JG: But at what cost?
-
MM: Metal is successful so acceptable cost.
-
CB: Most game developers work at the abstraction level of Metal.
-
JG: Don’t expect average Webdev to learn WebGPU same as for WebGL
developers.
-
DJ: Metal made the choice to shield developers. Still works for AAA
games. Don’t know why Vulkan chose this way.
-
CB: There is great performance to be had, but only a few developers can
take advantage of it. They are the same people that will target each native
API directly. Webdevs would tolerate much higher level of abstraction.
-
JG: Did Vulkan / D3D12 make a mistake there?
-
JFB: for WebAssembly we decided to not have barriers like C++. You can
lose some performance, but it is only for experts. WASM has zero barriers
but it is good enough to have performance, and we can add barriers later in
the design we chose, if experts prove that they need barriers for
performance. [Not to say this applies to GPU, but the point is that] I
think we can have no barriers first then see if it works. Then experts can
rant about lack of barriers. Leave the design open so we can add barriers
later, if needed. Metal has shown that you don’t really need them, at least
as a first step. Maybe Vulkan is right and you need it, but you can add
them later.
-
JG: One of the problems with pointing at people who target Metal is that
only one platform supports multiple low-level APIs (Windows with
Vulkan/D3D12). There’s no direct comparison between Metal and an API with
explicit barriers. Since there’s no choice of API on Mac we don’t know that
Metal’s choice was “better”.
-
CW: One thing also is that native APIs have either both barriers and
multiple queues or neither of them. Is it even possible to have multiple
queues without barriers?
-
MM: Metal has different queues, just not sync between different queues
-
BC: Barriers don’t really have anything to do with GPU synchronization
(fences, etc). Barriers usually affect texture layout, cache locality, etc.
Nothing to do with multiple queues. AFAIK Vulkan uses similar nomenclature
to distinguish these concepts.
-
DM: Connection between queues and barriers comes from automatic tracking
of resource transitions, because multiple queues can be independently
executing and operating on the same resource. Tracking gets more complicated.
-
MM: So let’s not have multiple queues! (not joke)
-
MM: If there are multiple queues, then Metal has to virtualize the
queues onto the same hardware, which leads to deadlocks
-
CW: Something about Vulkan spec wanting forward progress.
-
DM: I want to experiment with a standalone implementation of what Apple
has described. Basically translating from Metal style to Vulkan style - can
directly compare performance with manual optimization.
-
JG: Should expose raw platforms instead of thinking “this isn’t needed
in most cases” “that ...”.
-
DM: Can the simplification be done in user space?
-
CB: Less chance for consistent behavior between browsers.
-
MM: Argument isn’t that automatic barriers can always be done as well as
hand-made barriers.
-
JG: Preference design wise is to have hand barriers then make a
userspace layer for automatic barriers.
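The implicit-barrier model discussed above (synchronization only at encoder boundaries, with the “from” half of a barrier resolved at queue submit time) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Encoder` and `Queue` classes, the usage strings, and the tuple format are hypothetical and do not correspond to Metal, Vulkan, D3D12, NXT, or any proposed WebGPU API. It also simplifies heavily by emitting a barrier only when a resource's usage changes, ignoring cases such as write-after-write hazards and multiple queues.

```python
from dataclasses import dataclass, field


@dataclass
class Encoder:
    """Hypothetical recorded pass: a name plus the usage it needs per resource."""
    name: str
    usages: dict  # resource name -> usage string, e.g. "render_target"


@dataclass
class Queue:
    """Hypothetical queue that remembers the last usage of each resource."""
    last_usage: dict = field(default_factory=dict)

    def submit(self, encoders):
        """Return the submitted stream: barriers interleaved between encoders."""
        stream = []
        for enc in encoders:
            barriers = []
            for resource, new_usage in sorted(enc.usages.items()):
                old_usage = self.last_usage.get(resource)
                # The "from" half of the barrier is only known here, at
                # submit time, once the preceding encoders are known.
                if old_usage is not None and old_usage != new_usage:
                    barriers.append(("barrier", resource, old_usage, new_usage))
                self.last_usage[resource] = new_usage
            # Barriers are placed only at the boundary before the encoder,
            # never in the middle of it.
            stream.extend(barriers)
            stream.append(("encoder", enc.name))
        return stream


queue = Queue()
shadow = Encoder("shadow_pass", {"shadow_map": "render_target"})
main = Encoder("main_pass", {"shadow_map": "sampled", "backbuffer": "render_target"})
stream = queue.submit([shadow, main])
for item in stream:
    print(item)
```

In this sketch the application never mentions barriers; the tracking cost is paid at submit time, which is the serialization point CW raised. A hand-written barrier scheme would skip the tracking but require the application to state both usages itself.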
Consensus on consensus
Agenda for next meeting
-
CW: After shading languages, what should we do?
-
JG: Don’t think we’re going to finish shading languages
-
CW: Still don’t want to talk about all shading languages all the time,
to respect the time of Chas, David, John, etc. Let’s alternate with other
stuff.
-
JG: Maybe every other week?
-
CW: OK. So, more memory barriers? Consensus on consensus? Something
else?
-
MM: How about not memory barriers.
Agenda for next week:
-
More shading languages.
Agenda for in two weeks:
-
Consensus on consensus
-
Open questions on the pipeline objects.
Received on Thursday, 31 August 2017 19:56:38 UTC