- From: Corentin Wallez <cwallez@google.com>
- Date: Fri, 15 Sep 2017 15:25:13 -0400
- To: public-gpu <public-gpu@w3.org>
- Message-ID: <CAGdfWNMk43LTFF4J1PjfM-ZvvBPL6_p7TFCVrqSM8pnkXWybwg@mail.gmail.com>
GPU Web 2017-09-13
Chair: Corentin
Scribe: Ken
Location: Google Hangout
Minutes from last meeting
<https://docs.google.com/document/d/1T_bbzKC22BAq3ax_cnm_kcL-KJSG4nM5sH4C9egIUfk/>
TL;DR
-
Open questions that do not affect the structure of the API can be
deferred post-MVP
-
Pipeline Objects
-
Rasterizer sample count and depth bounds deferred post-MVP
-
Pipeline caching should be deferred post-MVP. This will let us
better understand the usage patterns and make the right design
decisions for the API.
-
Primitive restart: Ben confirmed that a 0xFFFFFFFF cut value should
not trigger primitive restart for int16 indices on D3D12. Consensus
to put the index format in the pipeline state unless a better
solution is found.
-
Explicit boolean for depth testing: concern that an implicit toggle
might interact badly with future extensions. Discussion deferred.
-
Consensus on consensus
-
Consensus that compute should be an MVP feature.
-
Supporting multiple queues in the MVP is an open question (deferring
them is simpler, but there is concern this has structural
implications)
-
State inheritance between renderpasses / subpasses was marked as
having consensus but things aren’t 100% clear.
-
Lots of consensus confirmed.
Tentative agenda
-
Administrative stuff (if any)
-
Individual design and prototype status
-
Consensus on consensus
-
Open questions on pipeline objects
-
Agenda for next meeting
Attendance
-
Apple
-
Dean Jackson
-
Myles C. Maxfield
-
Theresa O'Connor
-
Warren Moore
-
Google
-
Corentin Wallez
-
John Kessenich
-
Kai Ninomiya
-
Ken Russell
-
Zhenyao Mo
-
Microsoft
-
Ben Constable
-
Chas Boyd
-
Frank Olivier
-
Rafael Cintron
-
Mozilla
-
Dzmitry Malyshau
-
Jeff Gilbert
-
ZSpace
-
Doug Twilleager
-
Joshua Groves
Administrative items
-
CW: At the end of the meeting let’s talk about what to put on the
agenda for the next meeting
-
DJ: do we have a doc for agenda items?
-
CW: yes, will link it. LINK
-
DJ has been pinging lawyers about license discussion; no response yet;
last email was that they were talking with Google and Microsoft
-
CW hasn’t asked in a while
-
RC also no news
Individual design and prototype status
-
Apple
-
MM: nothing interesting. Continuing development on shading language
prototype
-
BC: any highlights you’d like to share?
-
MM: think yes, but want to save it for a shading language discussion
-
Google
-
CW: Not much to report
-
Microsoft
-
BC: haven’t worked on much code. Working through discussions and
design issues. Making good progress we think.
-
Mozilla:
-
DM: making progress on OpenGL backend for their graphics abstraction
-
Prototyping inside Servo: can render triangles with graphics
pipelines with decent framerate given that we’re reading back
the buffer
-
Servo prototype: https://github.com/kvark/webgpu-servo
Pipeline objects
-
CW: had an email thread: there were some comments that added questions
on the pull request
-
CW: had consensus on:
-
CW: Not exposing the depth bounds feature of D3D12 and Vulkan because it
doesn’t exist on Metal (?)
-
DM: did we really have consensus on this? Would like this to be
exposed under a feature gate post-MVP
-
MM: agree to include it post-MVP
-
CW: would make sense for it to be exposed as an extension. Can we
agree to not put this in MVP?
-
DM: OK.
-
CW: the rasterizer sample count cannot differ from the render
target’s sample count
-
DM: why not require specifying this during pipeline creation?
-
CW: hard to do this from render pass
-
DM: render pass only says your images have this number of samples.
Doesn’t say how the rasterizer will work on them. Ultimately want
rasterizer to support different frequency than number of
samples. If we add
this capability now then we don’t need to change the API later
-
CW: what’s the point of having a rasterizer sample count different
from the texture’s sample count?
-
DM: imagine you’re rendering into non-MSAA textures. Rendering with
sample count=16. Your shader can see the sample mask and …
-
BC: recurrent theme here: should be a stated goal that the MVP API is
allowed to change later – necessary scaffolding, but don’t want
to restrict
building features later (2 have come up in the last 5 minutes).
Also don’t
want to forbid changing the API. Have to build and test the thing before
you can design the API surface. Want us to not fear changing the MVP, but
rather get to MVP quickly, unless there’s very definitive proof that
it’ll change the structure. Some features are like that and require
changing many API points.
-
DM: coming from a standpoint that 2 of the 3 APIs require specifying
that, but agree with deferring it post-MVP.
-
MM: didn’t understand what you’re (BC) saying. Should we expect to
make breaking API changes post-MVP?
-
BC: mental model of MVP: MVP is not version 1.0, but version 0.8 or
0.9. It’s something we build to figure out what we want in 1.0. It’s a
beachhead, not winning the war. Want MVP to not be something we have to
support forever. Expect that we might need to change the pipeline state
object in some way. Think users should expect to have to change
their code
if they code to the MVP.
-
DJ: agree with Ben.
-
CW: for features where it’s not a structural issue, but just one
member in a structure or one extra function, it’s easy to add later
without problems.
-
CW: do people agree we should defer depth bounds and rasterizer sample
masks to post-MVP?
-
DM, BC: yes.
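Although depth bounds and rasterizer sample counts are being deferred, a minimal TypeScript sketch may help illustrate what DM is asking for. Every name here is hypothetical and invented for illustration, not part of any agreed API:

    // Hypothetical pipeline fields; names are illustrative only.
    interface RasterizationState {
      // In the MVP this must equal the render target's sample count.
      // A post-MVP extension could let it differ, e.g. rasterizing a
      // non-MSAA target at a higher frequency and letting the shader
      // inspect the coverage/sample mask.
      sampleCount: number;
    }

    interface RenderPipelineDescriptor {
      rasterization: RasterizationState;
      // ...shaders, blend state, depth/stencil state, etc.
    }

    // MVP-style validation: reject a mismatch until an extension exists.
    function validateSampleCount(pipeline: RenderPipelineDescriptor,
                                 attachmentSampleCount: number): void {
      if (pipeline.rasterization.sampleCount !== attachmentSampleCount) {
        throw new Error("rasterizer sample count must match the render target in the MVP");
      }
    }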
-
CW: Vulkan and D3D12 have different ways to cache things
-
Either have some sort of pipeline derivation – “this pipeline looks
like this other one”
-
Or let the browser do everything
-
JG: you mean, browser caches things implicitly?
-
RC: preference would be the latter. Maybe in the MVP the browser
redoes everything from scratch. Later, it caches smartly for you. If
that’s infeasible, give the developer the knobs they need to do it
themselves.
-
DJ + CW: agree
-
BC: building up a feature for AAA games for this, it took a few
iterations to get it right. Think we’ll need the MVP and a few workloads
running to get this right. Think we should defer it. This is one of the
reasons to get the MVP running faster.
-
JG: agree. This is a solvable thing but kicking it down the road is
fine, and preferred. State that users shouldn’t expect that
shaders will be
cached by the MVP.
-
MM: Question for Microsoft: MSDN has some information about the
cached PSO. Does that work like derivative pipelines, or does it
require an exactly equal pipeline?
-
BC: that’s one of the many corners of the API that I can’t answer
directly. Can get an answer quickly.
-
CW: pretty sure it requires the same pipeline. Saw it somewhere in
the documentation: “The rest of the data in the PSO still needs to be
valid and match the cached PSO or an error is returned.”
<https://msdn.microsoft.com/en-us/library/windows/desktop/dn914407(v=vs.85).aspx>
-
CW: sounds like there is agreement that caching will be prototyped
and done after MVP.
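For illustration of the “let the browser do everything” option, here is a rough TypeScript sketch of an implicit cache keyed on the pipeline descriptor. All names are invented for this sketch and nothing here was decided in the meeting:

    // Hypothetical implicit pipeline cache; names are illustrative only.
    type PipelineDescriptor = { vertexShader: string; fragmentShader: string /* ... */ };
    type NativePipeline = { id: number };

    class ImplicitPipelineCache {
      private cache = new Map<string, NativePipeline>();
      private nextId = 0;

      // The browser could key compiled pipelines on a canonicalized
      // descriptor so that creating the "same" pipeline twice skips
      // recompilation, with no explicit caching API exposed in the MVP.
      getOrCreate(desc: PipelineDescriptor): NativePipeline {
        const key = JSON.stringify(desc); // placeholder for a real canonical hash
        let pipeline = this.cache.get(key);
        if (!pipeline) {
          pipeline = this.compile(desc);
          this.cache.set(key, pipeline);
        }
        return pipeline;
      }

      private compile(desc: PipelineDescriptor): NativePipeline {
        // Stand-in for the underlying driver compilation.
        return { id: this.nextId++ };
      }
    }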
-
CW: primitive restart
-
Metal always enables primitive restart. No way to disable it.
Otherwise you need to parse the index buffer and validate against the
primitive restart index. Seems we have to enable primitive restart on all
APIs. Ben explained on the mailing list that drivers should be
looking at the index buffer format for this.
-
Encode the index buffer format in the pipeline state? How do people
feel about this?
-
DM: would like to defer the decision until we get more information
from Ben. DM provided a test case to Ben, using a WHQL-certified,
Microsoft-issued driver, and the behavior is that the 32-bit cut
index works for a 16-bit buffer. If this works then we don’t need to
provide the index buffer type.
-
MM: what about a 32-bit index buffer with a sentinel value of 0xFFFF?
-
JG: it’s just a value. In WebGL 2.0 we force-enabled primitive
restart because it can’t be disabled in D3D11.
-
BC: will investigate Dzmitry’s test case. The spec says this
shouldn’t work. At the time you provide this you’re also providing
lots of details about your index buffer. Will follow up on why it’s
not behaving as specified.
-
KR: Myles mentioned something about the stencil buffer / stencil
mask. In WebGL we are making a rule that only the stencil bits
present in the FBO are used in the stencil mask. This is a behavior
change that’s important for portability. The primitive restart index
may similarly be being masked to the number of bits in the indices.
-
BC: Based on fairly accurate data, the test checks that 0xFFFFFFFF
should not work for int16 index buffers. The driver Dzmitry tested
on should not have passed this test.
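To make the sentinel-value point concrete, a tiny TypeScript sketch (a hypothetical helper, not from the minutes): the restart value is the all-ones value of the index type, which is why a 0xFFFFFFFF cut value is not expected to apply to 16-bit indices:

    // The restart (strip-cut) sentinel is the all-ones value of the index type.
    type IndexFormat = "uint16" | "uint32";

    function primitiveRestartValue(format: IndexFormat): number {
      return format === "uint16" ? 0xffff : 0xffffffff;
    }

    // An index of 0xFFFF in a uint16 buffer restarts the strip, while the
    // same numeric value in a uint32 buffer is an ordinary index.
    console.log(primitiveRestartValue("uint16").toString(16)); // "ffff"
    console.log(primitiveRestartValue("uint32").toString(16)); // "ffffffff"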
-
BC: Having the pipeline state contain the type of the index buffer
seemed like an easy solution. Why do people think it isn’t a great
one?
-
DM: concerned there will be clients who defer the index buffer
binding and the type of the index buffer until later. Not sure everyone
will know it at pipeline creation time. But if MSFT requires it in D3D12,
then let’s require it.
-
BC: the spec does say this. Will also investigate your test case.
Also have to know the vertex buffer format up front, and that’s even more
complex than the index buffer format.
-
JG: this is not something that we, as the WebGPU driver, can infer.
-
BC: no. In the pipeline state, here are your shaders and root
descriptor, and here are your vertices.
-
JG: index buffer width.
-
CW: that’s known when you bind the index buffer.
-
JG: and you can bind multiple different ones with the same pipeline
state object?
-
CW: yes.
-
CW: agreement to put index buffer format in pipeline state?
-
JG: yes, though it’s a new restriction. If you ever had a situation
where you mostly use U16s and then upgrade to U32s later, you have to
create a new pipeline state object. Maybe not so bad.
-
CW: with caching it should be free.
-
BC: in this case, if you’d written your engine in D3D or Vulkan, when
you upgrade from 16 bit to 32 bit you’d also change this value
and create a
new PSO.
-
JG: unless you’re not using primitive restart. Or skip using that
index.
-
BC: ...yes...could do that. But this is a small thing to cover all 3
APIs. This seems like a small mental cost to pay to get fast
performance on
all 3 APIs.
-
JG: just trying to make sure we understand even the little things
we’re leaving behind
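A minimal TypeScript sketch of the direction just agreed on, with invented names: the index format becomes part of the pipeline descriptor, so moving an engine from 16-bit to 32-bit indices means creating another pipeline object rather than flipping a bind-time switch:

    // Hypothetical API shape; names are illustrative, not a spec.
    type IndexFormat = "uint16" | "uint32";

    interface RenderPipelineDescriptor {
      indexFormat: IndexFormat; // fixed at pipeline creation time
      // ...shaders, vertex layout, blend/depth state, etc.
    }

    interface RenderPipeline {
      readonly descriptor: RenderPipelineDescriptor;
    }

    function createRenderPipeline(desc: RenderPipelineDescriptor): RenderPipeline {
      // Stand-in for the real creation call (an implicit cache could make
      // the second creation cheap, as noted in the caching discussion).
      return { descriptor: desc };
    }

    const pipeline16 = createRenderPipeline({ indexFormat: "uint16" });
    const pipeline32 = createRenderPipeline({ ...pipeline16.descriptor, indexFormat: "uint32" });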
-
CW: wacky idea of putting vertex buffers inside descriptor sets/bind
groups
-
Seems nobody likes that, so let’s not do that
-
Agreed
-
CW: should we explicitly enable independent blending and depth testing?
-
The two are different
-
Independent blending: could figure out that you’re not doing it.
-
Depth testing: if the comparison function is always true, it’s the
same as not enabling depth testing.
-
DM: mentioned we could use a nullable IDL property to handle this.
But it turns out it’s not possible: WebIDL can’t have a nullable
dictionary member inside another dictionary. This implies we should
do what Apple proposed and derive whether the feature is enabled from
the values provided by the user, both for independent blends and
depth testing.
-
BC: to clarify: want to infer that it’s turned off because they used
ALWAYS with no writes?
-
JG: yes. Then if you’re always passing you can disable the depth test.
-
BC: concerned the semantics are different than what we’re telling the
driver
-
JG: agree...but can’t think of a way it’ll be wrong
-
BC: concern from our resident spec expert that as other values get
added to depth testing, the difference between ALWAYS and turning
depth testing on/off will become more apparent, and then you’re
stuck. Having a separate bool enable seems more future-proof.
They’re different in the hardware.
-
JG: counter-proposal: if we extend the depth test options, we could
add back in the separate disabling of the depth test.
-
BC: when implementing this in NXT, felt like the fact that this
didn’t map to how the hardware was talking about it made it hard to build
it and hard to test it properly. Felt I was fighting it. Not a scientific
measure.
-
BC: however, someone doing the Metal backend would have to do the
reverse. It’s sort of a 2 out of 3 vote based on the structure of the
low-level APIs.
-
CW: seems like a more difficult topic than expected. Let’s flesh this
out more on the mailing list.
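To contrast the two options being weighed, a short TypeScript sketch with hypothetical names: an explicit enable flag (as in D3D12 and Vulkan) versus inferring “depth test off” from an always-pass compare function with writes disabled (closer to Metal’s MTLDepthStencilDescriptor, which has no enable flag):

    type CompareFunction =
      "never" | "less" | "equal" | "lessEqual" |
      "greater" | "notEqual" | "greaterEqual" | "always";

    // Option A: explicit enable, mirroring D3D12 and Vulkan.
    interface DepthStateExplicit {
      depthTestEnabled: boolean;
      depthWriteEnabled: boolean;
      depthCompare: CompareFunction;
    }

    // Option B: no boolean; "disabled" is inferred from the values.
    interface DepthStateInferred {
      depthWriteEnabled: boolean;
      depthCompare: CompareFunction;
    }

    function depthTestEffectivelyDisabled(state: DepthStateInferred): boolean {
      // ALWAYS with no writes behaves like "depth test off" today; BC's
      // concern is that future depth features could make the two diverge.
      return state.depthCompare === "always" && !state.depthWriteEnabled;
    }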
Consensus on consensus
Link to Myles’ document. <https://github.com/gpuweb/gpuweb/wiki/Roadmap>
Mailing list thread.
<https://lists.w3.org/Archives/Public/public-gpu/2017Aug/0000.html>
-
CW: high-level object model. There should be at least one type of queue
that can do everything. Everyone agrees?
-
DM: what if the hardware doesn’t support compute?
-
CW: then it doesn’t support GPUWeb. Vulkan requires one queue that
supports everything. D3D12 supports both.
-
MM: we skipped over whether we’re going to support compute.
-
Think we need compute.
-
CW: agree.
-
DM: think we need async compute.
-
CW: we should at least have compute.
-
BC: agree we should at least have compute.
-
CW: this is no longer an open question.
-
CW: the MVP will only allow one instance of Queue, ever, per Device
(instantiation of WebGPU), to keep things simple.
-
JG: don’t love it
-
CW: would be asynchronous w.r.t. the CPU, but since there’s only one
Queue there’s no asynchronous work.
-
JG: could create command buffers in parallel but not submit them in
parallel.
-
CW: yes.
-
JG: don’t love it, it’s an important part of the final design.
-
BC: it’s one of the reasons people move to these low-level APIs.
Understand JavaScript makes concurrency hard. There were situations where
we moved to D3D12 specifically to have multiple queues. Think
these are the
problems people will want to solve. This is one of the designs that will
completely change everything. Understand this makes things difficult
because it reopens the barrier discussion. But think it needs to
be in the
MVP.
-
CW: agree with BC’s statement. If we want to keep this as an open
question then we should talk about it as a second phase in the memory
barrier discussion. Synchronization between queues is a variant of memory
barriers.
-
MM: agree it’s an open question.
-
CW: Queues need to be created at device creation time.
-
JG: Vulkan requires this. Or need to create things in the background.
-
CW: agreed.
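A rough TypeScript sketch of the MVP shape being discussed (all names invented here): the device exposes a single queue, fixed at device creation; command buffers can be recorded in parallel but are submitted through that one queue:

    // Hypothetical MVP object model; names are illustrative only.
    interface CommandBuffer { /* recorded GPU work */ }

    class Queue {
      submit(commandBuffers: CommandBuffer[]): void {
        // Single submission point: work is asynchronous with respect to
        // the CPU, but there is no cross-queue parallelism in this sketch.
      }
    }

    class Device {
      // The queue exists from device creation; the MVP would not allow
      // creating additional queues later.
      readonly queue: Queue = new Queue();
    }

    // Usage: record command buffers (possibly in parallel), submit on the one queue.
    const device = new Device();
    const commands: CommandBuffer = {};
    device.queue.submit([commands]);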
-
CW / JG: Render passes.
-
Metal’s render encoders and Vulkan’s renderpasses will be represented
by the same concept – “RenderPass”.
-
JG: seems correct. Similarity between Metal’s render encoders and
Vulkan’s subpasses.
-
DM: Let’s not associate subpasses with Metal encoders. May encode
multiple subpasses into a single encoder.
-
CW: esp. if we use framebuffer fetch for optimization.
-
MM: The reason why they were marked as the same object is that this
is where you attach textures for rendering.
-
JG: they’re not exactly the same but are similar.
-
CW: not really consensus but more shared knowledge that these API
objects are the same.
-
CW: work should be done in a render pass between Begin/End. Question:
should we say that you can’t do compute at the same time?
-
Because of Metal and Vulkan, can not mix graphics and compute work.
-
Technically, Metal 2 on iPhone X can do it.
-
MM: we want to target more than a single phone.
-
CW: so should we explicitly Begin/End compute passes? Believe we
should. Shows the developer that things are separate.
-
JG: no strong disagreement. Sort of makes sense but hesitate to
commit to it yet
-
DM: makes less sense than graphics passes because compute passes do
not share as much as subpasses in graphics. That’s why the case for
Begin/End on the compute encoder is weaker than that for the graphics
encoder. But, fine with Begin/End on compute.
-
CW: tentative consensus
-
CW: Open question: begin / end blit?
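A minimal TypeScript sketch of the tentative consensus, with invented names: render and compute work are each bracketed by explicit begin/end calls and cannot be interleaved; whether blit/copy work gets the same treatment is the open question above:

    // Hypothetical command recording API; names are illustrative only.
    class RenderPassEncoder {
      draw(vertexCount: number): void { /* record a draw */ }
      end(): void { /* close the render pass */ }
    }

    class ComputePassEncoder {
      dispatch(x: number, y: number, z: number): void { /* record a dispatch */ }
      end(): void { /* close the compute pass */ }
    }

    class CommandBufferBuilder {
      // Graphics work happens only between beginRenderPass() and end().
      beginRenderPass(): RenderPassEncoder { return new RenderPassEncoder(); }
      // Compute work is bracketed the same way; it cannot overlap a render pass.
      beginComputePass(): ComputePassEncoder { return new ComputePassEncoder(); }
      // Open question: an equivalent begin/end scope for blits.
    }

    const builder = new CommandBufferBuilder();
    const render = builder.beginRenderPass();
    render.draw(3);
    render.end();
    const compute = builder.beginComputePass();
    compute.dispatch(64, 1, 1);
    compute.end();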
-
CW: consecutive render passes don’t inherit state. Same between render
and compute passes. Render buffer state between passes is separate.
-
Is there state inherited between subpasses?
-
DM: sure. You can bind descriptor sets between subpasses in Vulkan.
-
CW: in Metal you can not inherit state between render encoders.
-
DM: but Metal’s the only one which doesn’t inherit state. And Metal’s
the only one that doesn’t have subpasses.
-
MM: wish people would go back and read the notes, because it’s clear
we had consensus on this issue.
-
CW: AI: add a link to the notes and talk about this more.
-
CW: you can only change attachments at a renderpass boundary. Think this
is obvious because of Vulkan’s and Metal’s constraints.
-
DM: you mean binding a framebuffer. Yes.
-
CW: open question about whether renderpasses have synchronization.
Should wait until the memory barrier discussion.
-
CW: consensus about using Vulkan’s 3-layered hierarchy of sets of
descriptors, and a small number of descriptor sets to bind things. So
you bind a descriptor to a descriptor set, and support lots of
descriptors, but only a small number of descriptor sets.
-
Can keep open questions about D3D12’s descriptor heaps, and more
questions about Vulkan’s descriptors.
-
RC: so do we have consensus?
-
CW: we don’t have consensus on descriptor set allocation optimizations
-
RC: do we have consensus that there’s a straightforward and
performant mapping?
-
CW: will be performant and easy-to-map if it looks like Vulkan. The
only questions are around allocation of descriptor sets and pooling of
descriptors.
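A rough TypeScript sketch of the binding model described above (every name invented for illustration): many descriptors are grouped into a descriptor set, and only a small number of sets can be bound at a time:

    // Hypothetical binding model; names are illustrative, not a spec.
    type Descriptor =
      | { kind: "uniformBuffer"; buffer: object; offset: number; size: number }
      | { kind: "sampledTexture"; texture: object }
      | { kind: "sampler"; sampler: object };

    // Many descriptors live in one set.
    interface DescriptorSet {
      descriptors: Descriptor[];
    }

    // Only a small, fixed number of set slots is bindable at once
    // (illustrative limit, in the spirit of Vulkan's per-pipeline-layout cap).
    const MAX_BOUND_DESCRIPTOR_SETS = 4;

    class CommandRecorder {
      private bound: (DescriptorSet | null)[] =
        new Array(MAX_BOUND_DESCRIPTOR_SETS).fill(null);

      // Binding attaches whole sets, not individual descriptors.
      setDescriptorSet(slot: number, set: DescriptorSet): void {
        if (slot < 0 || slot >= MAX_BOUND_DESCRIPTOR_SETS) {
          throw new Error("descriptor set slot out of range");
        }
        this.bound[slot] = set;
      }
    }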
-
CW: lots of consensus about pipeline states. Document in the repo covers
this.
-
DM: renderpass information is not included as consensus.
-
MM: will update the document. It was old.
-
CW: Got through most of it. Can leave the rest for Chicago.
Agenda for next meeting
-
Chicago F2F Agenda
-
Discuss shading languages on the afternoon of Friday the 22nd.
-
But the following meeting will also be about shading languages.
-
CW: for Chicago: please add topics to the agenda document.
Received on Friday, 15 September 2017 19:35:39 UTC