- From: Corentin Wallez <cwallez@google.com>
- Date: Tue, 26 Sep 2017 09:54:30 -0400
- To: public-gpu <public-gpu@w3.org>
- Message-ID: <CAGdfWNOxJ6NEg5Amc1GFwjTwr_zDX6bR-fy-LQPbHXrNX=4=-g@mail.gmail.com>
GPU Web 2017-09-22 Chicago F2F Minutes from last meeting <https://docs.google.com/document/d/1seCUVBkzkRPEj0sfcDBymGjwSPndTGPhsQJPkUQscNY> TL;DR of the TL;DR - Better fleshed out target for the MVP, some decisions are still pending investigation - More discussions of implicit vs. simplified vs. explicit memory barriers. Action items to make investigations on example use-cases. - Metal is able to automatically run some encoders asynchronously. Discussion on doing this vs. having application explicitly handle queues. - Fil + Myles showed their prototype language WSL that encodes the constraints of SPIRV logical addressing mode. - Widespread agreement that it is an improvement over current languages. - Discussion on which language the API consumes (SPIRV vs. WSL) - Agreement that we should have a blessed language, but which one? - WebGPU devices will be created from thin air and there is something like “WebGPUSwapchainCanvasRenderingContext”. Also how WebVR would work inside the frame callback. - DOM elements uploads are from image bitmaps or video source, that’s it. TL;DR - Mozilla has a WebGPU prototype <https://github.com/kvark/webgpu-servo> running on D3D12 and Vulkan - MVP Features discussion - MVP should contain structural elements of the API and prototype ergonomics. - MVP doesn’t have to run on all target hardware of v1 - Open question: shading language for the MVP? WASM + JS API or just JS? - Out: tessellation, transform feedback, predicated rendering, sparse resources, pipeline caching, aliasing - Tentatively out: DrawIndirect / DispatchIndirect (need investigation), “glGenerateMipmaps”, queries - Tentatively in: multisampling - In: compute, fragment, vertex, render passes, MRT, upload download, copies, instancing, binding model, command buffers, pipelines, a dummy extension - If we decide on multiple queues: async compute, queue synchronization - If we decide on explicit memory barriers, they are in - Memory barriers - Initial opinions - Apple: simplified API with implementation doing more work. Vulkan version would do more work but a design can make speed good enough - Google: declarative and failsafe API with explicit transitions, should map nicely to all APIs, ensure apps don’t fail to render on Android - Microsoft: D3D team’s experience is that barriers are required to make bindless work because driver doesn’t know which resources will be accessed. - Discussion that Vulkan and Metal aren’t really bindless. - Mozilla: would like to give developer full control and power of underlying APIs - Metal has “explicit barriers” at encoder boundary. Can run some encoders async. - Discussion on whether to allow for barriers inside a subpass and what the usecase for it is in Vulkan. - Discussions about Vulkan render passes, how it requires memory dependencies between subpasses to be specified or it would cause pipeline recompiles. - Agreement that we can’t validate shaders with data races (via UAVs). - Industry building task graph abstractions. Metal provides it as a linear sequence optimized by the driver. Vulkan provides subgraphs at a time with render passes. - Frostbite’s FrameGraph: https://www.ea.com/frostbite/news/framegraph-extensible-rendering-architecture-in-frostbite - Need to gather use cases requiring memory barriers and see how they’d be implemented in each API. - Multiple queues - Metal can push different encoders to different hardware queues automatically. 
That the app can create multiple MTLQueues is just because it didn’t make sense in ObjC to limit creation of only one MTLQueue. - Discussion about automatic async compute on D3D12 / Vulkan. - D3D exposed multiple queues to stop analyzing the command stream in the driver. Maybe not needed for WebGPU. - Explicit queues make app point out parallelizable commands. - Order of submits in a Vulkan app is a good order for encoders in Metal. Will need validation of the correctness of the order in all cases. - Shading languages - Fil presented his and Myles’s work on making a language with constraints of SPIRV logical addressing mode built in. Called WSL here. - Familiar C syntax, generics, operator overloading used to implement vector types for example. - Shader terminates early in case of error. - Special pointer types encoding SPIRV constraints - T^ cannot be cast, cannot be assigned after declaration (content can though), and function returning them can have only one return. - Some slice types for arrays, went into less details for them than for T^ - Goal is to have bisimulation with SPIRV to show they are equivalent. - Follow-up discussion: - Agreement that language improves greatly on GLSL / HLSL - Concern about creating a new language, and asking people to move over - Concern about generic instantiation bloat when people want to keep one code path. - Apple suggest API consumes WSL directly. - Concern WSL doesn’t reduce the number of checks or speed of validation. - Discussion of advantages of WSL over SPIRV and rebuttals: - View source vs. shipping shaderc WASMed only where needed - Security built in vs. logical addressing mode is just that - Flexibility for WebGPU vs. SPIRV execution environment - All APIs require same amount of translation vs. ??? - Request for name change: WSL is Windows Subsystem for Linux - Myles did a demo of WSL <https://cdn.rawgit.com/webkit/webkit/master/Tools/WebGPUShadingLanguageRI/index.html> - AIs on showing equivalence of SPIRV and WSL, and showing validation of SPIRV logical and buffer accesses instrumentation. - Need to talk to Khronos to see what implications of using SPIRV are. - Concern that WSL was quick to dismiss prior art and battle tested toolchains. - Roundtable - Apple: Think WSL meets requirements from the group. Think SPIRV could be ok provided there are more investigations. - Google: WSL is a great investigation, need better defined requirements for shading languages, SPIRV gets us far quickly and our intuition is that it is the right choice. WebGL experience is that native parsers are huge source of bugs. - Microsoft: View source is important, suggest HLSL is the language of choice as there has been a focus on standardizing it. It has the largest amount of content. Think at a low-level it would be better to accept SPIRV than DXIL. - Mozilla: We should push the platform, making a new language would slow us down. - Developer PoV: a textual representation is important for education etc. so blessed high level language is important. - Suggestion to use HLSL as de-facto high-level language and SPIRV as intermediate level. People would want a better spec for HLSL though. - DOM interactions - Agreement that a WebGPU device (root object) is created from outside of a canvas. - Consensus there is a WebGPU device constructor with no arguments - Agreement that there is a canvas rendering context that gives you a “WebGPU swapchain” that hands out texture for rendering. - Consensus that WebVR is a supported use case. 
Will need a way to update buffers synchronously without blocking inside the WebVR frame callback. - In the WebVR frame callback the application will ask the WebVR swapchain for the next textures. - WebGPU might require rendering in a texture array and not side-by-side like is currently allowed in WebGL. - Only one entry-point to upload a 2D DOM element; it takes an ImageBitmap. - Another entry-point to create a texture “video source” from a video element. Tentative Agenda - Morning (9AM - 1PM): - Status updates - MVP features - Memory barriers - Multiple queues - 1PM - 2PM: Lunch - Afternoon (2PM - 6:30PM): - Shading languages - DOM interactions - Others (for extra time)? - Swapchain/presentation - Re: descriptor heaps - Re: index format in pipeline state? Attendance - Apple - Dean Jackson - Filip Pizlo (by phone for shading language) - Myles Maxfield - Google - Brandon Jones - Corentin Wallez - Kai Ninomiya - Ken Russell - Shannon Woods - Zhenyao Mo - Intel - Bryan Bernhart - Yunchao He - Microsoft - Chas Boyd (by phone) - Rafael Cintron - Mozilla - Dzmitry Malyshau - Jeff Gilbert Administrative stuff - DJ: Lawyers for Apple / Google / Microsoft trying to figure out a software license to contribute code to the group. Kind of a new thing for W3C. Mostly on agreement and working on final wording. Will likely look like Apache license. - JG: Mozilla would like a copy of the license. - DJ: the companies want to use this for other projects as well, like LLVM (and maybe ANGLE). - DJ: W3C has an all-groups meeting called TPAC. We don’t have a slot to meet there (Bay Area this time), but could get a couple of hours to meet if we like. Don’t think it’s worth it. Could have a session inside the WebAssembly group where we talk about what we want for an API to call from WebAssembly. (Currently WASM can only call JavaScript.) Graphics will probably be the first external thing called from WebAssembly. - CW: depending on DOM interactions discussed this afternoon there might be more stuff too, like mapping buffers inside the WASM memory space. - DJ: will coordinate with chairs. - CW: also signed up for making a demo for W3C attendees. Will re-show demo from Vancouver F2F showing compute + graphics together. Attendees will be folks who are not GPU experts. Status updates - Apple: - Haven’t done anything recently to existing impl in WebKit - Would like to move it closer to what we’ve already decided in the group, and make it clear that it being “WebGPU” isn’t the decision by this WG - Myles and Filip have been doing an experiment to design a secure shading language. Partial implementation in JavaScript done. - Google - Implementing index format in pipeline state that we talked about last meeting - Works - Writing tests, verifying primitive restart on all backends - Intel - No update - Microsoft - Haven’t been writing any code - Have been talking about shading languages and memory barriers - Mozilla - Made big progress on D3D12 backend - Working on GL backend - Figuring out rough spots of descriptor heaps, resource heaps and pipeline barriers - Think desc + resource heaps can look a lot like Vulkan and be efficient on D3D12 and Metal - WebGPU prototype <https://github.com/kvark/webgpu-servo> running on D3D12 and Vulkan! 
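A minimal sketch of the DOM-interaction consensus summarized in the TL;DR above, before the detailed minutes start. Every name here is a hypothetical placeholder invented for illustration, not agreed API shape:

```js
// Hypothetical names throughout; this only restates the consensus above:
// a device created "from thin air", a canvas swapchain context, and
// ImageBitmap as the single entry point for 2D DOM uploads.
const device = new WebGPUDevice();                        // constructor with no arguments

const canvas = document.querySelector('canvas');
const swapchain = canvas.getContext('webgpu-swapchain');  // hands out textures to render into
swapchain.configure({ device });

const image = document.querySelector('img');
const bitmap = await createImageBitmap(image);            // 2D DOM content enters as an ImageBitmap
const texture = device.createTextureFromImageBitmap(bitmap);

function frame() {
  const backbuffer = swapchain.getNextTexture();          // same pattern a WebVR frame callback would use
  // ... encode and submit commands sampling `texture` into `backbuffer` ...
  requestAnimationFrame(frame);
}
requestAnimationFrame(frame);
```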
MVP Features - CW: There are things on the mailing list which we’ve ruled out of the MVP - Want it to be enticing, but also not hard to get right - DJ: to be clear, this wouldn’t be version 1.0, and could still make breaking changes (hesitantly) - Enough to convince ourselves and the community that it’s the right direction - CW: also get ideas in concrete form to see what works/doesn’t - Want most things in there that will cause structural issues / changes - DJ: it’s important because we don’t have many facts or much experience writing code or content with the ideas we’ve come up with - Metal 1.0 : the way Apple went about it was to start with a small feature set and add it over time, as we got feedback from developers and hardware changed - DM: less interested in getting developers interested, but rather focus on things that will affect the architecture - DJ: more interested in ergonomics and development. If we’re writing content for WebGPU, if we think it’s too difficult / easy, we can adjust - CB: how do we define the feature set? List of things that have to be operational for someone to be interested? What are the features of the common subset of the APIs we’re targeting for initial version? - Expected HW configs - Whether we’re trying to support “Big Compute” - DJ: Ben Constable suggested trying to get to the point where we can draw a triangle on the screen - CW: we do have different prototypes which do this. But as a group we don’t have enough consensus to develop an API which can draw a triangle. - If you just want to render a textured triangle there’s a lot of structural stuff you can ignore. Like how you get back data from the GPU. But this can affect a bunch of parts of the API. - So let’s focus on structural stuff as well as stuff that’s cool for demos. - Don’t need all the pipeline state like all the blend functions. - But if you don’t specify how to give textures to the shader you have a problem. - MM: makes a lot of sense. Rather than working toward one program, let’s work toward a set of programs. - CW: maybe let’s decide what’s not going to be there? Small list in the email to GPUWeb: - Sparse resources - Transform feedback - Tessellation - What’s left: compute and graphics workloads - MM: do we need a blit encoder at the beginning? - CB: copy engine - CW: probably need that, if only from upload buffers to textures - DJ: don’t include: - bundles / secondary command buffers - stream-out / transform feedback - predicated rendering - tessellation - sparse resources - Roadmap is at https://github.com/gpuweb/gpuweb/wiki/Roadmap - CW: workloads: - Rendering - Vertex/Fragment shaders - Multiple attachments (multiple render targets) (?) - Render passes - CB: definition of MVP is that it’s viable in the marketplace - CB: think people will want G-Buffers for deferred shading - CB: around compute: do we have to support asynchronous tasks? - CW: depending on result of today’s later conversation, may need concepts of memory barriers and multiple queues in the MVP. Whatever the decision is (include or don’t), those will or will not be in the MVP - CB: think we should have async compute. Barriers are a separate process we can determine later. - CW: they’re tied into memory synchronization, which ties into queue sync. 
- JG: the idea of fences is different than memory barriers - CW: the idea of both, and whether they’re implicit/explicit, is going to be predicated on the result of this discussion - JG: grouping these disparate topics into “synchronization” is too big a chunk - CW: upload and download - JG: memory model - RC: instancing (group: yes especially since it should be easy) - KR: we don’t have to have *everything* from WebGL 2.0. Even with instancing, there are a bunch of variants (base vertex, etc.) - MM: what about GPU-driven rendering (DrawIndirect?) - The three APIs handle this slightly differently - Could be hard / change things structural - CW / JG: should investigate and see how hard it is - Probably don’t need it for the MVP, but if it might affect the overall API structure, should consider including - MM: have investigated it a bit but not enough to talk about it - CW: binding model, pipelines, command buffers (goes without saying) - DM: resource heaps and how they work - CW: we had an NXT roadmap where we went through all of these items - CW: copies / blits - DJ: mipmapping? - Unclear; Vulkan doesn’t have it. Do it yourself - DJ: Metal does have this in the copy encoder - CW: pipeline buffer update? - In the command buffer, say “update buffer with this data”. Inline buffer updates where the data is an immediate in the command buffer - DM: it is convenient and there’s a way to do it in all the APIs - MM: is this for performance? - CW: nothing you can’t do with a staging buffer, and Metal doesn’t have it - MM: let’s leave it out then - MM: don’t need two ways to do this. Can add it later - CW: multisampling? - JG: yes. We have to handle resolve properly. Don’t trust absence of it - KR: could be a can of worms. We just gave developers multisampled renderbuffers and now they want EXT_multisampled_render_to_texture. - JG: isn’t this transitions too? - CB: could we keep the resolve an opaque operation at this level of the API? A high-level call on the resource? - MM: another way to do it would be to attach another texture to your framebuffer and have it auto-resolve - JG: it’s presented more flexibly in Vulkan at least - CB: and in D3D too - MM: probably need at least facilities for it - JG: there are two levels. - MM: should figure out which level for the MVP - CW: think it should be part of the MVP; we need to figure out this story. - MM: who doesn’t want this to be in the MVP? - KR: could be complicated - CB: once we define resources and copying, it’ll be easier to understand how it works - DM: think that adding multisamples should be easy - CW: it’s different in Vulkan and is done on renderpasses if you want to be friendly to tilers. - JG: don’t want to do it magically - RC: is auto-resolve magical? - JG: yes - JG: we should talk about it for the MVP. If implementing is onerous we can re-discuss it - CW: let’s say it’s tentatively in the MVP, pending analysis - CW: memory barriers? - JG: figuring it out should be in the MVP - CW: queries? - timestamp, occlusion - DM: do we have an investigation of them? - CW: not yet. Metal has very few types of queries. Have occlusion queries, but are a totally different concept. - CW: should investigate and - KR: Can we just say they aren’t in the MVP? In WebGL queries are 1 frame behind and people don’t like them and don’t use them. - DM: Can emulate them with pixel shaders and UAVs (and readback) - KN: they’re a little weird in Metal but shouldn’t be a structural change. Should be a separate part of the API. - CW: tentatively out. 
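As context for leaving inline command-buffer updates out of the MVP (the "update buffer with this data" discussion above), a rough sketch of the staging-buffer path CW and MM prefer. All names are hypothetical WebGPU-style placeholders, and `device`, `queue`, and `data` are assumed to exist:

```js
// Staging-buffer upload: everything an "inline buffer update" command would do,
// expressed as a copy recorded in the command buffer.
const staging = device.createBuffer({ size: data.byteLength, usage: ['transfer-src', 'map-write'] });
const vertexBuffer = device.createBuffer({ size: data.byteLength, usage: ['transfer-dst', 'vertex'] });

staging.setSubData(0, data);                         // CPU-visible write (or a map/unmap pair)

const encoder = device.createCommandEncoder();
encoder.copyBufferToBuffer(staging, 0, vertexBuffer, 0, data.byteLength);
queue.submit([encoder.finish()]);                    // the copy replaces the inline-update command
```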
- JG: shading languages and how you feed them (e.g. vertex attribute marshaling) - CW: pipeline state etc. we all agree should be in the MVP - JG: we’ve already punted on pipeline caching - JG: resource aliasing? - MM: what was the result on heaps? - CW: pending investigation - RC: would say no on aliasing for MVP - CW: that’s my gut feeling too - MM: two ways to use this word. One buffer -> two points in the shader. Or, a texture and buffer pointing at same memory. - CW / JG: we’re talking about the second one. - Meta stuff about the MVP? - CW: Should promise to break it and not enable it by default. - DJ: Helps with security. In Safari TP you can enable / disable features at runtime. - DJ: Hardware we are targeting is essentially anything which runs Metal - For Google most of Android devices which ship Vulkan - For Apple, any Metal 1.0 device (nearly all iPhones ATM) - Some smaller subset of Mac hardware excluded that doesn’t have Metal. - https://developer.apple.com/metal/Metal-Feature-Set-Tables.pdf - For Microsoft: all D3D12 devices - If you have the 12 API, it can be used, so we should target it - CB: lowest-end DX12 capable system will be as tough to target as the lowest-end Android phone. So not clear that D3D12 will be defining the floor - CW: suggest that for MVP, not guaranteeing that it’ll run on all of these systems - DJ: on Android Vulkan’s been supported for two releases, but first release was a bit shaky. - CW: Android still only requires GLES 2.0. However, of the Android devices which ship Vulkan, we’d like to support most of them. - MM: does every device which has Vulkan also have to pass the Vulkan conformance tests? - CW: have to pass the CTS except for some disabled tests. But some of these devices shipped on CTS versions that were incomplete, so there are bugs. Some have iffy Vulkan devices. We’d have to do workarounds. - JG: we’ll try to run on Vulkan machines, but we won’t hold back MVP (or change the API) for broken devices. - CW: Vulkan does have a lot of optional features, including logic ops (!). Might need to ask to remove certain features from the MVP and recast as extensions. But extension story should be a post-MVP thing. - KR: point out that Vulkan’s extension support was structural -- the optional void* extension pointer at the end of each struct - JG: agree, should have one no-op extension to understand how they’ll work - CW: MVP’s content is important. Do we care about the API shape? - JG: yes - CW: ok, then a dummy extension should be there - KN: not clear how easy it will be to agree on an API shape - MM: shading language for MVP should probably be shading language we progress forward with - Should also decide whether we’re going to make a JavaScript API, WebAssembly API, both, or neither. :D - DJ: definitely have to have a JavaScript API. Question is whether we have a C one or not. No way to call C from WebAssembly. - MM: we should have a discussion and there’s one right answer. - MM: have been keeping the roadmap document up-to-date to the best of my ability Memory barriers - CW: How do we get to a resolution - BJ: Trial by combat - DJ: Is there any way we can split it up into a smaller discussion? 
- We’ve already discussed the philosophy previously - Apple’s perspective: we think a much simplified API with the implementation doing more work is the better solution - The other solution says that it’s good to give the developer all this control - What this would mean to a Metal implementation: a bunch of stuff would be no-ops because it’s handled by the implementation. And the no-ops wouldn’t slow things down. - Thesis for a Vulkan implementation: a Vulkan implementation would have to do more work, because the Metal driver’s doing it; but think that we can agree upon an optimal design that will give “good enough” performance. - DM: clarification: we want both source and destination of transitions to be specified. That’s the way Vulkan does it. - CW: we think memory barriers need to be explicit for many reasons, so we should expose them to the app developer. But they should be declarative and failsafe. - I have a resource, treat it as an assembled image, or as a vertex buffer. (This allows barriers to be grouped.) - In other words, specify the destination state. - It’s a D3D12 transition barrier with only the destination stated. - DM: ah, so it’s a bit higher level. - Impl should do whatever is needed to make that happen. - Avoids developer needing to understand what the memory model means. - If the developer does it wrong, they’ll get a validation error. - It’s a simplified model, and will map well to all backends. Easier to validate, easier to learn. - If we validate strongly memory barriers, then WebGPU will work seamlessly across desktop and mobile. - If we *don’t* do this, developers will make things that work on desktop but *not* on mobile. - RC: spoke with members of the D3D team. - D3D11: very explicit. Bind everything ahead of time. - API had all the information it needed to do the barriers for you. - Shaders: indices into arrays had to be constant. - In the new bindless world, you can’t know that. In the shader, your array indices are not constant. You can calculate the index, index into the table, read from here, write there. No way with that model for the runtime to figure out what you’re reading from and writing to. - Had to add memory barriers. - For this reason, we think that version 1 of the API should have memory barriers, to set ourselves up for the new bindless future. - If we have dynamic indices in the shader, don’t think we can figure out what’s read and what’s write. - CW: what does this mean in practice? - RC: this means we need to make barriers explicit. - CW: are the barriers validated for correctness? - RC: think it will be very difficult to do. D3D lead said, if you can figure out a way to validate them, then the API should auto-do it for you. - JG: there are different degrees of validation. Can ensure something’s safe without ensuring it’ll be completely portable. - RC: if they’re auto added for you, is it twice the cost? 3x? - CB: barrier model we have in DX12 is slightly higher level than the one that’s in Vulkan. But lower level than what’s in Metal. Trying to understand how this ties into goals of resource binding model. - DX11: max 128 textures bound to single pipeline, statically indexed at compile time. - New APIs: can compute that index inside the shader, and it can go up to ~1 million textures. - CW: don’t think that’s the case in Vulkan. Vulkan still caters to fixed-function hardware. Bindless isn’t mandatory. Easy to change part of the bindings. Don’t think you can access millions of descriptors like D3D. - CW: Metal is D3D11 style. 
Has bindings, not bindless. Has dynamic indexing, but it’s a texture table. - RC: the D3D team’s conclusion was based on having bindless. If we don’t have this then maybe it is possible to auto-insert barriers. - With advent of multiple queues the runtime can’t insert them for you. - CB: example: CopyEncoder, then texturing from it. Have to signal that the copy is done. - DJ: Do you mean only for async copy / compute case? - CB: yes, if it’s in a separate queue then it’s asynchronous. - DJ: thought we’d agreed there’s only one queue? (CW: no) - DJ: can be potentially asynchronous in Metal as well. Metal has explicit barriers. It’s about the encoder, and inside compute. - CW: probably inside blit as well. Just a bit simpler. - MM: not sure about that. - RC: so for compute, do you believe in explicit barriers? - DJ: yes, we believe in explicit barriers for some cases. - CB: within a particular encoding stream. - MM: render a triangle, then a second triangle, in the same encoder. The second triangle has to appear on top of the first. - JG: other commands and dependencies can happen at different times. (?) - In Vulkan you can have things run in parallel - MM: write to texture in fragment shader, then read from that texture, it’s not defined. Would need to end the encoder. - CW: in Metal barriers are inserted between the encoders. A Vulkan subpass corresponds to one Metal RenderEncoder. - DM: it’s not 1:1 - CW: if you need to do a barrier between Vulkan subpasses, then you’d split the Metal encoder. You don’t have barriers inside subpasses. - DJ: that’s what I meant about explicit barriers in Metal. - MM: what we’re really talking about isn’t whether there should be barriers, but what should the programmer describe when they need synchronization. - CW: think we can agree that we don’t want any form of barrier inside subpasses, because that’s impossible to implement. - JG: subpass self-dependency - CW: limited what you can do in there. Vertex UAV writes -> Fragment UAV reads. - JG: thought this is what could help the tiler - CW: in Vulkan, can have dependencies between subpasses. - CW: in Vulkan, can only push data from vertex to fragment inside a subpass - Don’t know any use case for it. Would be ready to not have that. - JG: Vulkan spec section 6.6.1 about dependencies - CW: think this too hard and niche, and we shouldn’t put it in. (On a tiler GPU, putting barriers between vertex and fragment processing without flushing the tile caches) - RC: so you need to close the pass and open a new one? - CW: yes. Because Metal and Vulkan are catering to tiled GPUs, have to be explicit about when rendering to a certain attachment set is started and ended. If you want to read from the attachment, it’s required that you can’t do so from the same subpass. - CW: it’s more like UAVs where you write to it from the vertex shader and read from it in the fragment shader. Can’t think of a use case for this. - CB: don’t think this works anywhere. - CW: might work on tilers? - CB: read-after-write hazard. Plenty of things people do after a render and they have to switch layouts. Very common hazard. - RC: so, it’s a hazard to write to a UAV in a draw call and read it in a different one. - CW: Vulkan subpass self-dependency. - MM: one thing we can all agree on: shouldn’t be able to write to a UAV from a vertex shader and read from it in the fragment shader, in the same draw call. - CW: to close this topic: we don’t put memory barriers inside subpasses. If you need this for your use case you do a different subpass. 
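A sketch of what the agreement above could look like at the API level: the write-then-read hazard is expressed by splitting the work into two subpasses with the dependency declared up front, never by a barrier inside a subpass. Hypothetical WebGPU-style names, loosely mirroring Vulkan's render pass description:

```js
// Hypothetical render pass description; names are placeholders, not agreed API.
const renderPass = device.createRenderPass({
  attachments: [
    { format: 'rgba8unorm', loadOp: 'clear', storeOp: 'store' },  // 0: written by subpass 0
    { format: 'rgba8unorm', loadOp: 'clear', storeOp: 'store' },  // 1: final color, written by subpass 1
  ],
  subpasses: [
    { colorAttachments: [0] },                          // subpass 0 writes attachment 0
    { inputAttachments: [0], colorAttachments: [1] },   // subpass 1 reads attachment 0, writes 1
  ],
  dependencies: [
    // "shader writes done in subpass 0 must be visible to subpass 1"
    { srcSubpass: 0, dstSubpass: 1, srcAccess: 'color-write', dstAccess: 'input-read' },
  ],
});
```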
- RC: we do agree on some kinds of barriers! - JG: I am curious why this wound up in the Vulkan spec… - MM: until that question’s answered, we should proceed - RC: rendering to a texture and reading from it requires a new subpass? - CW: yes - CW: this would be implemented as OMSetRenderTarget in D3D12 - MM: next topic: during the boundary between one renderpass and the next, what should the programmer say? - CW: in Vulkan, when you have renderpasses with multiple subpasses, for each attachment, you have to say how it is used when. - CW: redundantly-ish, have to say what are the memory barriers between subpasses. - The membars between subpasses can be a superset of transitions - Can say: want buffer writes done in that subpass to be visible in this subpass - In addition to transitions of textures - CW: if we support renderpasses with multiple subpasses – which I think we want because it’s very handy in both Metal and Vulkan on tilers – then when we create the renderpass describing the rendering algorithm, we need to say “I want the shader writes done here to be visible over here”. - CW: otherwise on Vulkan we have to guess, and take the worst-case guess, leading to a pipeline recompile. So we really need them described. - CW: between subpasses, need to encode which memory barriers it might require. - MM: and if the app gets it wrong? - CW: we should find a way to validate that. - MM: the validation compares “expected” vs. “real”? - CW: in renderpass: app says, at this point, i want to be able to have resources that go from “shader writes” to “being sampled”. Each resource, it says it does that. - MM: if you have information about what the app’s doing: then you can retroactively insert barriers immediately? - CW: no. Renderpasses encode memory barrier information. Pipelines are compiled against renderpasses, and can only be used in compatible renderpasses. - DJ: can’t change the renderpass after it’s been “closed” because that would cause recompilation of the pipeline? - CW: yes. - MM: in order to make that work you’d need to wait until the end of the pass - DJ: if you were going to do it automatically: you’d need to record the commands, submit everything, and then … - CW: wait until everything’s done. Build renderpasses. Then recompile pipeline. Then encode command buffer. - JG: why do we have to recompile pipeline? - RC: is the recompile needed on D3D12? - CW: no. would not need to recompile pipelines. It’s not as bad as on Vulkan. - CW: memory barriers between renderpasses change. Renderpasses are compatible if they are the same in everything but the initial layout of resources (framebuffer swizzling, etc.), and load/store operations for different things (in Metal) - CW: if you’re on a tiler and you have to flush the tiler, you want to take advantage of that for register allocation on your tiler. - JG: how would you have different initial and final image layouts? - CW: high-level point: if we have multiple subpass renderpasses, there’s some implicit memory barrier that’ll have to be inserted. - DM: don’t understand how this can be done automatically on Vulkan. user can not communicate to driver what to do. - JG: would have to infer dependency graph from what was submitted - DM: several different ways to do this. Not clear. - JG: in metal you encode things in an order. Things happen in that order. Things that aren’t dependent can happen in arbitrary order. - MM: in Vulkan things aren’t ordered? - KR: it’s a render graph. Some parts can run in parallel. 
- KN: you can insert them in whatever order - CW: it’s like you provide your rendering graph to the driver, and the driver optimizes / schedules it. - KR: engines are representing their frames as graphs internally already. Frostbite: https://www.ea.com/frostbite/news/framegraph-extensible-rendering-architecture-in-frostbite - CW: metal only provides one attachment at a time. - KN: in Metal you have to give things in the right order. In Vulkan you can submit in any order but have to provide the dependencies. - CW: Vulkan’s way of it lets you do register allocation of the tile cache. - JG: don’t see the distinction. All the APIs have dependency graphs. - CW: you want to understand exactly where your data ends up in the tile cache. Metal doesn’t have information about the pipeline when submitting. - CB: suggestion: - given that there’s some diversity of the use of the term “barriers”, might be interesting to look at the top 3 or 4 use cases, see how they’d be implemented in each API, and see what abstractions would work - look at things like RAW, WAW hazards - Merging the APIs without that use case context will be a long tail operation - JG: concerned we might miss use cases - MM: related question: have decided there’s at least one case where the app is wrong. Where if you write to a UAV in vert shader and read from it in fragment shader, that’s undefined. What happens then? How does the browser know that this scenario occurred? - CW: don’t think we can validate that - MM: then we have unportable apps - CW: don’t think we can shield against concurrency bugs when we have read-write buffers - MM: would annotate every buffer. all buffers attached to vert / frag shader. - CB: in DX11, we can validate this, fail and unbind the previous bind to the pipeline - B/C we have indexing in the pixel shader in D3D12, can’t validate. Have a debug layer. Instruments the shader at runtime. Warns the user that that’s an illegal operation. - MM: think this sort of analysis needs to be done for every draw call by the browser - CB: what we’re looking at is a model where we don’t support arbitrary indexing. So we can do the D3D11 validation model. - CW: app allocates a big buffer. Read-write “stuff” in vert shader in one part. Read-write “stuff” using frag shader in another part. - CB: APIs don’t allow this today. Problem you run into is that segments of that buffer have been cached with different granularities in different ways. - JG: swizzling patterns for tile subsections - CB: these are properties of the resource description - KR: that big a limitation to say you need two different buffers for this? - JG: would be different from Vulkan - CW: would be fine from our point of view; slightly limiting - MM: think we should eliminate undefined behavior - DM: you already have this with just a single UAV. fragment execution is unordered. - MM: what if you have only a single thread? - CW: works then. - KR: or if you use atomic ops - CW: we simply can’t verify shaders with data races. - Discussion about serial submission vs. parallel submission - KR: is it the same as topological sort used by compilers to linearize graphs? - DM: don’t think so. better to submit the graph. if we establish the order, then we limit the amount of reordering and rescheduling the driver can do. - JG: it’s sort of about identifying hazards. - CB: I’m a big fan of task graphs. Covers all 3 APIs. Devs used to graphical abstraction can author this almost with a markup language. - CW: aren’t renderpasses that task graph? 
- CB: yes, kind of as a tree. Or sequence of sub-graphs. - JG: it’s an incomplete graph. - CB: a lot of engine companies are looking at a task graph model. The top level of their engine is already a task graph model and they’re looking for a more direct mapping. So a task graph in the API would not preclude using it in AAA content. Or Unity. (ooh, burn) - RC: ? - CB: so if we express things at a graph then we don’t need barriers and we can use the graph to express dependencies - RC: so all the dynamic UAV stuff would have to be inserted into the graph? - CB: not sure we can say that there wouldn’t need to be some kind of “UAV barrier” inside the shader - CW: do we want to minimize undefined behavior? - JG: we have different concepts of that - MM: we as a group shouldn’t pursue eliminating undefined behavior as the only goal of this group - But, it is valuable to limit undefined behavior - It’s not the only goal, or the most important goal. - CW: we should minimize undefined behavior at the API level, while staying at our perf target of 80-90% of native. - KR: but not, say, a factor of two hit. - Discussion about Vulkan’s requirements that: - Renderpasses: get (attachment descriptors, subpass descriptors, subpass dependencies) - Pipeline descriptors get Renderpasses and subpasses - Then BeginRenderPass gets the renderpass and textures - There’s needed compatibility between renderpasses and pipelines - How to make progress on these - Would like to get some use cases and understand how they’d be implemented in Vulkan (and other APIs) - MM: use cases are good. They won’t be comprehensive. - JG: think we made a bunch of progress here Multiple queues - CW: ties in to this topic and will be just as contentious - In the roadmap, we have consensus on queues such that: - There should be one queue type that can do everything on all APIs - Some implementations may support multiple queue types - It’s not clear whether we can have more than one queue per type - Not sure whether we should force all impls to have multiple queue types - MM: Metal doesn’t have synchronization between multiple queues - We agree that we need synchronization between multiple queues - JG: in Metal you can get callbacks when queues are done - MM/all: but that’s round-tripping to the CPU - MM: regardless of once per frame or a few times per frame, you have to round-trip - If you’re going to have multiple queues, you’ll probably require synchronization without round-tripping - Metal doesn’t need this because they only have one queue - If the implicit dependency graph is that the things can run in parallel, and the GPU has facilities, they can run in parallel - CW/JG: discussion about multiple queues and fences - CW: you’re not intended to use multiple queues in Metal, because the synchronization is through the CPU. In Metal, if the driver discovers you can take advantage of parallel hardware queues, it’ll parallelize it. - Async compute happens automatically-ish. - JG: understood. - MM: in metal there’s no reason to use multiple queues. The fact that you can make multiple queues is just a natural thing. But they’re not designed to be used. - CW: there’s device submit. Queue submit is queue.device.submit. - RC: how do you specify the dependency graph? - MM: it’s implicit. As described during the last meeting. - Ex: blur something just drawn. One RenderEncoder which draws the thing. Second ComputeEncoder which lists that the texture you drew into is a readable input. Dependency graph is implicit. 
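A sketch of the implicit-dependency pattern MM describes in the blur example above (hypothetical WebGPU-flavored names, not Metal syntax; `commandBuffer`, `sceneTexture`, and `blurredTexture` are assumed to exist). Attaching the rendered texture as an input to the compute pass is the only thing that tells the implementation to order the two passes:

```js
// Pass 1: draw the scene into `sceneTexture`.
const render = commandBuffer.beginRenderEncoder({ colorAttachments: [sceneTexture] });
render.draw(/* ... */);
render.endEncoding();

// Pass 2: blur it. Listing `sceneTexture` as a shader input both binds it and
// supplies the dependency information; no explicit barrier is recorded.
const compute = commandBuffer.beginComputeEncoder();
compute.setTexture(0, sceneTexture);       // read: implies the render pass must finish first
compute.setTexture(1, blurredTexture);     // write
compute.dispatch(/* ... */);
compute.endEncoding();

queue.submit([commandBuffer]);             // independent encoders may still run in parallel
```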
- JG: do you think we should have multiple queue instances in this API? - CB: basically asking whether the app should say what can run in parallel, or the API should determine what can run in parallel via specification of dependencies - CW: if explicit, then API has to include queue synchronization facilities (on the GPU – no round-trip to the CPU). - DM: Metal backend could say that it only has one queue available so that it doesn’t have to implement synchronization. - KR: can we support async compute in Vulkan without making everything explicit? Like Metal? - CW: you have to declare which queue type things can run on - JG: two types of objects in Vulkan, shared and exclusive. Shared can be used across multiple queues. Exclusive have to be transitions. Can transition sub-parts of objects to run on different queues. - DM: “concurrent” and “exclusive”. - DJ: async in Vulkan: different queue type / instance. - CW: one instance “graphics/compute/blit/present”. another “compute/blit”. Do main rendering on first one. Async compute goes on second one. - CB: motivation in DX12 was: get the drivers out of the business out of analyzing command streams and determining what was parallelizable. But at this level of abstraction that doesn’t seem like that much of an issue. - JG: would be nice to retain the benefits. - CB: if we put this all in a single ref implementation we can all optimize it ourselves. Can provide optimization of Metal behavior in a way we’re all comfortable with. - JG: can a ref impl be good enough that we’re satisfied with doing it automatically? - CB: not sure there’s much value to be added with letting the app do it. It’s just that we’ve seen arbitrary cost in some drivers. But if it’s our ref impl then we can do it. - MM: no one way to do it right? - CB: Metal team seems to have figured it out. - CW: intuition: if we make queues explicit, think apps are more likely to take advantage of them. - RC: so in Metal you have to tell the encoders what your inputs are, so it can figure out that the compute stuff can go in parallel? - CW: they’re provided when you say SetFragmentBuffer - MM: when you create the encoder you don’t say “I’m going to use these resources”. But at the time you list them you’re using them for what you want. - When you describe you’re going to use this texture for this purpose it does 2 things. Attaches texture to shader. And says that synchronization is needed. - RC: and if you say i’m just going to run this, then can run in parallel? - MM: yes, if you have a compute thing with no buffers and textures attached, then the compute thing could run entirely in parallel. - Rendering algorithm with two textures as input, both filled via compute. Those compute things could run in parallel. - DJ: given you have to express the deps up front, why do you need a separate queue? - JG: section 6.2, sync guarantees. Submission on a single queue is implicit - MM: no. first thing finishes before the second thing *finishes*. - CW: also, can’t put compute in render passes in Vulkan. - MM: dean’s question is why this is required. - CW: dumping sub-parts of the graph which are graphics-only. - MM: that’s a bad design. why? - CW: tile cache might be using compute shared memory - JG: this might be a concession about using a single queue without working about command buffer sync - CW: maybe a concession to console developers and they want full control over the hardware. maybe they have a task graph but want explicit control. 
- KR: it might be worth trying to do this automatically - CW: sync between queues has a cost. If you have a tiny compute shader used to generate a DrawIndirect, and it has sync and what not, it's not worth putting it async. We can't know the cost upfront. - MM: there's a cost to marking a compute shader 'expensive', and submitting it to a separate queue - CW: seems easier to have the app tell you to run the thing in parallel. Doesn't necessarily mean that we expose the concept of queue, but the graph needs to be specified up front. - MM: that seems easy to agree to. "This computation could possibly be asynchronous". - KN: not necessarily one compute op. Maybe multiple, and have to be ordered w.r.t each other, but async w.r.t everything else. - MM: the app submits to different queues, and you have your async compute. At end, want to join them and show frame. In Metal you can't do that. - CW: in Metal, you'd have compute and render happening in parallel. RenderEncoder A, ComputeEncoder B. RenderEncoder C, renders to final render target, and implicit dep on both. Submit both in any order, and Metal figures out A and B can run in parallel, and have to join for C. - JG: if you have an active pipeline then you could make the pass-back to the CPU to establish this - MM: if you have things that are sharing the same buffers, then in Vulkan one goes to one queue and one to another. In Metal, could easily get into a place where you deadlock because the ordering is wrong. - CW: yes, RenderEncoder A using a buffer, RenderEncoder B using the same, and they won't run in parallel because they might race. - MM: opposite. App puts one in one queue and one in another. - CW: app can't do that without inserting transitions of resource from one queue to another. Using resource for writing in two different places. Invalid. - DM: exclusive ownership for resources? - CW: yes, that's my view – just a proposal. A resource is either readable or writable as one specific type of thing on one queue. - MM: a bit blocked. Higher level? - CW: at any single point in time, a resource is either readable by the world, or writable by only one queue. (This is just a proposal for eliminating undefined behavior) In the backend, would put in synchronization (in vulkan – in metal would no-op) - MM: in the one Metal queue, you'd first have to submit the commands – the command flow has to follow the resource. - CW: on the app side – resource is used first for render, then compute. Submit render command bufs using resource. Transition rsrc from queue render to queue compute. Now can submit command buffers that use rsrc for compute. In Vulkan, would use a fence. In Metal, each time you do submit, create the encoder, so things are well ordered. (A sketch of this flow appears below.) - KR: is there something sub-optimal for Metal here, where we have to defer things to queue submit time? - CW: when you encode a command buffer you're putting it in the queue. You have to encode things in order. - MM: no, you don't have to encode things in order, but commit them in order. - KN: thought you had to commit one encoder before you got the next one. - MM: drawing use of buffers on different "queues" and different dependencies which would cause deadlock in Metal but not Vulkan. (B/A, A/B) - CB: deciding whether we need explicit parallelism. - MM: suggesting this is impossible. - CW: the Metal driver's doing dependency analysis. That would be really bad in the backend. The driver's signed up to do that, but not the backend.
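The queue-handoff flow CW describes above (exclusive ownership with an explicit queue-to-queue transition) could look roughly like this. Every name is hypothetical, and `buffer`, `renderQueue`, `computeQueue`, `renderCommands`, and `computeCommands` are assumed to exist:

```js
// "At any point in time a resource is readable by the world or writable by one queue."
// Render work writes `buffer` on the render queue first.
renderQueue.submit([renderCommands]);               // command buffers that write `buffer`

// Explicitly hand the resource from the render queue to the compute queue.
// A Vulkan backend could emit a GPU-side semaphore / queue-ownership transfer here;
// a Metal backend could make this a no-op, relying on encoder and commit order.
buffer.transitionQueue({ from: renderQueue, to: computeQueue, usage: 'storage' });

computeQueue.submit([computeCommands]);             // command buffers that read `buffer`
```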
- CW: in Vulkan, when you do queue submit, things have to be transitioned into the right state. - MM: so both vulkan and metal will have to validate this scenario? - CW: that's validated if you have explicit transitions. Shading languages: Fil's presentation - Discussion about various topics - DN: there's a lot of content for GLSL. Let's say you added generics and slicing. slicing looks like the killer app for this. If I were to explain this to someone in the GL world, it's a nicer GLSL with slices and templating. - CB: in HLSL we're working on adding these to the cut-down version of C++. One difference is we've had unions on our plate for a while. Know we're not going to be able to implement all this on the GPU. - DN: OpenCL C++ kernel language has templates etc. For C++ people but removes much of the dynamic stuff. Still keeps the OpenCL C pointer restrictions. - DJ: you wouldn't need logical addressing mode restrictions for OpenCL. - DN: we're all going after the same GPUs. - CB: we're all trying to go after the C++ model, but going after the same hardware, as the hardware evolves. - DN: want to separate programming model concerns from technical concerns. - DN: still not sure what the security model is (bounds checks, etc., at what time do you detect that) - DN: you're creating a new language and asking everyone to move over - CB: but they're not breaking changes. - CW: WSL looks like a subset of HLSL - CB: not into the whole branding a language for the sake of it - FP: WSL is C++ without classes. We gave it a name just to have a name and a directory to put it. It's the kind of language you can tell someone who knows C++ that "here are the rules". You mentioned no clear story of how you handle errors. MM and I came up with a thing that WSL will do: program terminates early. - DN: we hadn't agreed as a group what the criteria are. - DN: the generics you mentioned are template based, so you'd wind up with e.g. 5 copies of the code if you had 5 different instantiations. - FP: Yes but because of inlining you would have 5 different instantiations anyway. - DN: have talked with people who have significant codebases that say if you have that genericity at compile time, you wind up with unacceptable performance. They do dynamic polymorphism. When you access memory you change how the load is done. Have heard this from multiple directions. Might be the kind of thing you say, sorry, frontend has to handle this in some way, even if it causes code bloat, etc. Maybe a concern, maybe not. - FP: the problem is inlining, not generics. If you allow a shader language to have functions then you ultimately have to implement that in the language by inlining. - DN, KN: that's not true. - KN: has to be inline-able. But many platforms will not actually inline it, because you'd end up with too many instructions. - CB: What we are looking at doing is maybe having a link step that does dead code elimination. - DN: the problem is that dynamically, at runtime, you might have 1 of 100 different things - CB: it's very dangerous to make 100 copies of code - DN: High level point is: people who see "I need pointers because XXX" don't want instantiation explosion but just one code path. The model presented for WSL doesn't help for that because it still has the code explosion problem. - FP: understand what you're saying. Valid concern. Data point: people are using templates in Metal and they're happy with them. The reason why it's kind of OK is: the killer app for templates is numeric code.
This is what people use templates for. If you're trying to write OO code using templates, it's hell. - DN: some customer of yours said "I'm using pointers because blah", and you create this solution, but you may take this back to that customer and they'll say "it didn't solve my problem". Maybe you'll go back and do more work in the compiler. - CW: question not related to language design: what's the delivery mechanism of the language to the API? - MM: it can be whatever we come up with. - CW: is it a goal of this language to be faster to type-check and safety-check than SPIR-V? or to be a high-level language to be accepted and lowered to SPIR-V? - DJ: we think it's not going to be significantly slower than type checking - MM: if your Q is "what language does our API accept" then the model is that our API accepts WSL. - DN: so this is what's being proposed to WebGPU. - KN: the only reason to add safety to WSL is because you can add security checks more intelligently than SPIR-V. If our thing ingested a restricted subset of OpenCL C (via clspv) we could type check it. Only reason to add a new language is to more intelligently add safety checks. - DM: have we seen a case where WSL would be safer than SPIR-V? - FP: have the safety checks that are minimally needed to add memory safety to SPIR-V been added so we can check them against WSL? - DN: haven't spec'ed them fully. - CW: seems buffer checks mainly. There are also texture image fetches, and you have the texture size available at the call sites. Feels like the biggest safety feature is buffer checks. - DN: spir-v buffer fetches have been deployed to date on platforms where robust buffer access is present. - MM: need to handle platforms that don't have it. - CB: it gives you better access to safety - KR: point about needing run-time checks at all accesses of slices in WSL - KN: question about doing the checks up front - FP: if there were no rule about checking slices up front then in the pointer case you'd be unsound. If I could create an array slice that's pointing out of bounds then a subsequent checked access might go out of bounds. In logical mode I don't think there's a significant cost of bounds checks here. - DN: you're checking the object you're referencing into. But you'll reference into the slice with a run-time determined value so you will have to check it anyway. Effectively you have a fat pointer that you're passing around and you have to check the index. - FP: are you talking about logical mode or not? - DN: yes - FP: the reason why slice creation has a bounds check at that point is that if you have totally unconstrained pointers, it's like I'm giving you an inductive hypothesis that that slice is valid. But need to check that the slice is valid up front too. - DN: so guaranteeing that base object is valid. Now have an arg index which is some number. Need a bounds check. It's the same bounds check that you'd need with SPIR-V logical mode. - CW: what's the value of the API ingesting this language, vs. ingesting SPIR-V which can be a compilation target of it? - FP: 1. textual format and not a binary format. we think based on feedback from webassembly that future programming formats for the web should be textual so view source works. - FP: 2. this language already has specified type rules for areas that have security implications. it's designed for security from the start - FP: 3.
as we discover how the webgpu spec is supposed to work (constraints it runs into, etc.), having a language that this committee owns that doesn’t require approval by another committee will give us flexibility that we need. - CW: rebuttal: - 1. view-source problem: we don’t need a standardized textual format for this. can view webassembly on the web right now. can view spir-v disassembled right now. no-one writes this. (CB: view source isn’t useful in that case.) - DJ: the feedback we got from teams that write shaders is that they want a human writeable format. - DJ: so you want to ship a spir-v compiler along with your source? - JG: then ship a compiler - CB: compromise: spir-v could well be the underlying implementation of this. but if that’s all you spec, big q about ease-of-use of programming. but you could get a rich diversity of compiler languages, so no sharing of code around the internet, defeating the purpose of a w3c standard. we could do this and make a new platform for people to make new languages in. but best to have a lingua franca with a syntax that is supported on every browser. - FP: i wasn’t describing the lack of view source. i’m describing criticisms from developers about lack of view-source. - KN: understand. they want the originally written source code. - CB: they need to round-trip it. - 2. spir-v logical addressing mode is a feature of spir-v by default and it’s secure. theoretically you’ll be generating valid spir-v. can run spir-v validator on it. it’s great to have a HLL to embed the security properties, but it doesn’t make it better for ingesting by the API. - FP: is there a reference implementation enforcing the security properties you suggest? - DN: the spec says exactly what logical addressing mode operands can be. - FP: doesn’t say what the bounds check behavior is. - DN: are you asking for an implementation which checks validity of program that’s running? or statically, plus a certain number of runtime checks? - Discussion with DN and FP about gluing SPIR-V spec to GL or Vulkan spec - JG: super impressive that you’re creating a new language. disagree on the need for it. we have 95% of a solution in front of us in the form of spir-v logical addressing mode. we already did this for opengl and glsl for webgl. - DJ: how many languages are compiled to SPIR-V logical addressing mode? - HLSL. GLSL. OpenCL C subset. - CW: the reason to take SPIR-V is: high-level shading languages have corner cases. - DJ: but the security researchers came in yesterday and showed us bugs in SPIR-V drivers / compilers. - CB: he’s showing us how to add pointers and templates for limited use cases. would like my ide to guide me along a path where we implement these new features robustly. - CW: perhaps we haven’t stressed enough that this is a great experiment and something we have. we’d like to use this to write shader code. but it’s a question of what the api ingests. - FP: we are arguing that the api should ingest a textual format, and that it should be designed from the ground up to meet the needs of the web, and that this format is something that’s owned by this committee. no matter what we pick there’ll be some friction between the language and what this committee is trying to do. the language has never been used by anybody, no backward compatibility constraints. new language is an asset. also, saying 95% of spir-v is described is not true. - JG: there’s similar prior art with making GLSL secure. - FP: this could be viewed as an extension to GLSL. 
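One way to read the "then ship a compiler" position above, sketched in JavaScript. The module and its export are hypothetical stand-ins (think a WASM build of a WSL- or GLSL-to-SPIR-V compiler with JS glue), and `device` is assumed to exist; the point is that readable source can stay viewable while the API itself ingests SPIR-V:

```js
// Hypothetical module and entry point; nothing here is an existing tool's API.
import { compileToSpirv } from './shader-compiler.js';   // JS glue around a WASM-built compiler

const source = await fetch('shaders/blur.wsl').then(r => r.text());  // human-readable, view-source-able
const spirv = compileToSpirv(source);                    // e.g. a Uint32Array of SPIR-V words
const shaderModule = device.createShaderModule({ code: spirv });     // hypothetical ingestion point
```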
- CB: in this modern world with tons of github repos, it's better to have human-readable. - DJ: another useful feature: making a textual language easily translatable to the lower level languages. all the platforms are in the same spot, requiring translation. and models the webgpu api because we can control e.g. the bindings. we've seen this in metal. - CW: yes it's great to have debugging, and can have that by not stripping all debug info from spir-v. - KR: Big pieces missing: analysis of current shaders and kernels that will be ingested from the system, and, limitations on the underlying shading languages (no 8-bit loads and stores). There are low-level limitations that bubble up to the high-level language. - CB: ??? - KR: Can't come up with something from thin air that isn't grounded in the limitations of all the targets we have. We know we will need to inject bounds checks just like in WSL. Concern we are going to throw out all prior art, and all prior kernels (need HLSL to WSL). - FP: Not true: currently the spec is a JavaScript interpreter. Then a compiler to SPIRV, then SPIRV to WSL. We will prove isomorphism. We think it is very grounded in prior art. - CB: Since it is isomorphic then it doesn't prevent people from reusing their kernels. - KR: - DJ: If we have isomorphism between WSL and SPIRV then HLSL and GLSL work. - JG: Awful lot of work when things already work. - CB: Lot of people interested in pointers and templates. Think improvements to the shading language are part of the features of WebGPU. - KR: - DJ: And without robust buffer access. - FP: Need to go, think there is a lot of info, need to take some time. Provide slides and code. - FP: what it's going to need to secure spir-v will be useful no matter what we end up deciding. - DN: are you going to show your slides to your customers who requested templating and see whether it's what they want? - JG: one more request: could you choose a different name? WSL is already a well-known name on Windows (Windows Subsystem for Linux). - BJ: clarification: you want a textual format for the web. but webgl developers have said they want binary shaders. is your plan to ship spir-v and back-translate to WSL? - DJ: first one. also: why do they want a binary format? refuted by disassembly arguments earlier. - DJ: also we have webcrypto which will really hide their content if they want. - MM: it's designed for the parsing and type system to be easily decidable. - KN: the web developers who say they want view-source are not the same as the people who want to load large amounts of shaders quickly from the web, compile quickly, avoid as much compilation time as possible. think loading performance is better than view-source. - DJ: we agree. and we think WSL will compress well. we think the compilation, parsing, loading time will be fast. the speed here is important to us. if we took spir-v we'd still have to parse, translate, convert to metal. spir-v won't give us an advantage here. - CW: spir-v is designed to be the receiver of many languages so that you can efficiently compile spir-v to native targets. if it's isomorphic to spir-v then why not ship a wasm module that compiles wsl to spir-v. - DJ: would be cool to see someone writing a shading language that has these templates, etc. and compile to SPIR-V. - BJ: during vulkan development, was desire for bytecode languages. devs pushed back because people said people would expect it to load fast but it wouldn't. the existence of spir-v is probably because there was a practical benefit to it.
  Is SPIR-V more quickly consumable?
- DN: speed of loading was a non-goal. SPIR-V is intentionally high level to avoid premature optimization.
- CB: from a practical perspective: every language is a derivative of clang and every IR is a derivative of LLVM IR. LLVM IR is probably one point in the process of this compilation step and the language is likely to be a cut-down version of clang, and the question is how far we cut it down. Would like developers to have the options.
- DN: SPIR-V is deliberately distinct from LLVM. It was a mistake Khronos made (twice), and a mistake multiple teams within Google have made, to tie themselves to LLVM IR. When we designed the original SPIR, we deliberately avoided basing it on LLVM IR.
- CB: SPIR-V, DXIL, LLVM IR, etc. are all pretty similar.
- CB: need to decide whether we have a high-level, low-level, etc. approach.
- WSL / WebGPU Shading Language: https://cdn.rawgit.com/webkit/webkit/master/Tools/WebGPUShadingLanguageRI/index.html
- MM: demo. This is a compiler that creates an AST and evaluates it by visiting it in JavaScript.
- MM: If you are interested in WSL there is a live version of it.
- CW: should split the discussion about shading languages into: ergonomics, security, etc. and have different AIs for different people.
- DN: wanted to ask FP to show the presentation to key customers. Think there may be some resistance because the thing the customer wants (pointers) wasn't resolved by this proposal.
- Depends on the customer.
- DJ: ok, so we need to find out whether customers' requests for pointers and generics were satisfied by the WSL constraints. Might be talking with a set of customers who are different from your (Google's) set of customers.
- DN: I heard a request from Filip about how much work there would be to secure SPIR-V. We should take an AI to enumerate exactly what we mean by secure. Namely, that accesses to buffers and images are checked. Robust buffer access in a software implementation.
- MM: if you want to use the SPIR-V spec you need to look at 2 specs.
- CW: you have to look at the environment spec.
- DJ: if we accept SPIR-V we need an environment spec for WebGPU.
- CW: that can be our (+ Mozilla's) AI.
- DJ: also, going to do a prototype investigation into validating that SPIR-V is secure.
- DN: namely, that you can make sure that ingested SPIR-V validates, and that runtime checks are injected.
- CW: need a validator, and a SPIR-V pass which adds bounds checks to buffer and image fetches. These are super easy.
- DJ: we will take the AI of cross-compiling WSL to and from SPIR-V and MSL and will write down any snags we run into.
- DJ: we were going to go around the room and do a straw poll.
- CW: we should talk with Khronos about adopters' fees for SPIR-V.
- What are the implications of using SPIR-V in WebGPU?
- DJ: seems fairly clear what Apple's position is. We wouldn't be working on it otherwise. To reiterate goals: based on what we thought were the requirements from the group and our own goals, it's an investigation into a solution. We think it's valuable and the right thing to do. If the group is really strongly pushing for SPIR-V, we want to know answers to questions.
- DN: personal opinion: WSL is a great investigation to move the conversation forward. We haven't pinned down enough of our own requirements to recommend in a convincing way what is required by the API. Also, if we're serious about an MVP, SPIR-V gets us a long way along quickly.
- CW: strong intuition that SPIR-V is the right answer.
- KN: aside from everything about security, don't see a benefit of WSL over SPIR-V, but need more investigation.
- DM: we should build and focus on the platform. From that point SPIR-V makes more sense. Like the language, but it would slow us down to build a new high-level language.
- JG: share Corentin's intuition that SPIR-V is an efficient, valuable way forward, esp. given the experience we gained from WebGL 1.0 and 2.0. Not impressed with the motivation for starting something completely new, when we have something that's close to matching what we need. Surprised that this was as contentious as it's been.
- RC: Chas already summarized. Agree with Apple that a textual language is important for the web. It's been a tenet of the web that all you need is a text editor and web browser. Think we should use HLSL as the language. Chas has been open to standardizing it with W3C. Have recently taken contributions from the SPIR-V group to have an HLSL frontend, and are getting SPIR-V folks access to the HLSL repo. In other words, HLSL isn't just controlled by Microsoft. HLSL is used by every Xbox game ever written, etc., and it's been battle-tested on a large body of content. But we also want to see innovation in the language space and think SPIR-V could be something WebGPU could reasonably accept. So at the low level think it would be better to ingest SPIR-V instead of DXIL.
- KR: It was a very nice investigation and great motivation to make better high-level languages. Think it is too early to throw out all previous solutions; felt the presentation was dismissing prior art and core issues that are in the low-level languages. We will have to have WSL running on all platforms before we can choose it.
- CB: Concern about breaking existing code with SPIR-V? WSL is closer to GLSL and HLSL than SPIR-V.
- KR: No concern that WSL would break things; SPIR-V already has an ecosystem to compile from and to HLSL and GLSL (and to MSL). NXT shows that SPIR-V translates well to HLSL, GLSL and MSL. Why not have a WSL-to-SPIR-V translator early and have things running, instead of writing many WSL backends. On our side we should write the "security layer" for SPIR-V. Could put that in NXT then run tests on all platforms.
- CB: so put a SPIR-V validator in all browsers?
- KR: Yes. WSL would be the same, where you would have the compiler + validator + translator in all browsers.
- KR: Think it is premature to choose a new language that hasn't run on any GPU yet. Also, WSL is high level; our experience in WebGL is that native GLSL compilers were all broken. Should go with something more battle-tested, which is the SPIR-V toolchain. Should choose SPIR-V + look at security constraints. Yes, three.js will have to have a compiler to assemble GLSL shaders then translate to SPIR-V; not sure how it will work, but we should standardize an intermediate-level language and maybe a high-level language too in the browser (?)
- ZM: Past couple years a third of our effort was working around compiler bugs. Think an intermediate format would help with this.
- DJ: is the benefit because we think there will be fewer compiler bugs?
- ZM: if we have a high-level language we should have a standard implementation that all browsers adopt.
- BJ:
  - From the PoV of WebGL developers, having a textual representation easily consumable by the browser is super important. Has enabled WebGL to have its reach as people have been able to open dev tools and see the shader code being run.
    So having a blessed high-level language is good, as most shader code online would be in this language (the one for public consumption like three.js, shadertoy etc.). However, don't care about the exact high-level language. GLSL is preexisting and has benefits. The mechanism to ingest it in the browser doesn't matter as long as it is consistent. If people have to bundle a bunch of WASM in a Web page, it isn't as good. Interpreter in the browser?
  - Browser dev hat: no comment on the language itself. Have doubts about the amount of work people can put into making a language. We have access to Khronos through contacts, and W3C doesn't have expertise in graphics. Just defining the API is a big task. Saying we want to invent even more things makes the task even bigger. Concern about adding even more delay to shipping WebGPU. Is that acceptable?
- SW: lack of high-level languages in which to write shaders is not a problem. Hesitant to endorse something that will segment the web further from desktop and mobile graphics development. Also, parsers are nasty complicated things where lots of bugs turn up. The HLSL folks have dealt with it on their side, so have the GLSL folks; don't want to create a whole bunch more parser bugs.
- DJ: mentioned low-level restrictions for some operations. Those wouldn't be encountered by a SPIR-V program?
- DN: Vulkan only permits 32-bit or bigger loads and stores. 16-bit loads/stores are an extension. No 8-bit loads/stores.
- MM: WSL must compile to that. So it will.
- MM: Is there anything that you can't do that isn't listed in the SPIR-V spec?
- DN: You want to look at the SPIR-V spec plus appendix A of the Vulkan spec.
- CW: Appendix A of the Vulkan spec.
- DJ: not important to this group or tech: the current environment (Vulkan/SPIR-V) requires logical addressing mode. There's a variable pointer extension. That use case is more from the OpenCL community, right? Will Vulkan change that environment to remove the restriction to logical addressing mode?
- DN: that's a forward-looking statement.
- CW: doesn't really matter, since we have to run on shipping hardware.
- MM: but in 20 years?
- DN: Vulkan was made in an environment with no new hardware features except that which runs current OpenGL. And SPIR-V was the way of specifying shaders in this environment, so it'll evolve.
- CB: Want to point out I agree with Ken, and mess with him :P SPIR-V is the de facto low-level spec, HLSL the de facto high-level spec. Want some amount of standardization and advancement of both.
- CB: DXIL is an open-source GitHub project. If there are things to change in the language which could make it more web-friendly, we are happy to talk.
- DN: concerns about HLSL: lack of a spec, and being based on the behavior of the previous reference implementation. Know CB is going to address this. Hope the situation is improved. We need that as well in the web context.
- CB: you have the source. To some extent that's less ambiguous than anything written in the English language.
- DN: you also get unintended behavior. Some things done in HLSL shaders in the wild only get pinned down once you compile to a low-level representation and do a bunch of optimizations. Kind of a moving target. My team's hitting that as well as others. We're all agreed that this needs to be improved. Reference implementations have a lot of good properties but they also have bugs.
- DJ: five companies. One with no preference. Google/Mozilla are saying "accept SPIR-V after security analysis", with a slight web-developer hat saying "source code is preferable".
  MSFT/Apple say we want a human-readable text format; the difference is that Apple is coming with a different proposal than MSFT.
- DM: think there's still space for high-level language innovation like Rust did, like enforcing aliasing rules at compile time. Would be happy to do this as extensions later.
DOM Interactions
- CW: how do we:
  - Put stuff on the canvas
  - Interactions with WebVR
  - Let's not do workers. Dependent on what WebAssembly does (i.e., let's not do multithreading)
  - How to upload DOM elements
- MM: why is this different from WebGL?
- CW: one complaint: WebGL can only render to 1 canvas. If you wanted to render to two, you have to go through contortions.
- DJ: TL;DR: there are ways to do this in the web platform already. But since we can present the render buffer in multiple places we can build a better solution.
- DJ: canvas.getContext("") works with one canvas. So we could make WebGPU work with >1, or 0, canvases.
- MM: think that's a hard requirement: to get one of these things without a canvas.
- KR: Some interactions with the JavaScript interaction model; not different from WebGL so we can defer that. People complain that WebGL is its own thing outside of the rest of the DOM. Want to upload arbitrary DOM elements.
- JG: let's focus on uploading same-origin DOM media elements.
- MM: so, for now, no arbitrary DOM elements, and let's see what WebGL does.
- CW: let's start with Canvas and go right into WebVR.
- DJ: one way to do this: make an instance of a WebGPU device. With getContext you pass in that device. Then you're not really talking to the CanvasRenderingContext but something else.
- JG: reminds me of ImageBitmapRenderingContext. SwapChainRenderingContext? It's a destination, but not the only way to get a WebGPU context.
- Discussion about this
- CW: if you allow putting any texture into a canvas, it's unclear what the browser does to put textures on the screen. Need to declare how you're going to use the texture. Could get complicated depending on whether the canvas is in its own layer or not, etc. Maybe ask the canvas "give me a texture to render into"?
- KN: with WebGL we render *into* the IOSurface.
- DJ: keep the great ergonomics WebGL gave you so you don't have a lot of setup. Don't need to allocate the depth buffer, etc.
- BJ: agree, one main advantage of WebGL is "getContext" and start drawing. Not requesting tons of pixel formats, etc. Same for media elements; don't need to allocate your own JPEG decoder, etc.
- JG: those have value. Like the approach where you look at the physical devices / adapters, and see which one you want to use. Forces the developer to make some choice, but the worst thing about creating a new context is "ChoosePixelFormat".
- CW: in all 3 APIs you don't choose a pixel format. Maybe the canvas tells you "here's the format; deal with it".
- DJ: create a WebGPU Device. Then Canvas.getContext(). Gives you back a CanvasRenderingContext. That's the thing that gives you the SwapChain; attach it to the device, and go.
- JG: instead of a WebGPU context, have a SwapChainRenderingContext.
- KN/JG: more discussion about this
- MM: the device is not actually a device in what Dean said. (Doesn't refer to a particular adapter in the system.) Somehow you'll need to get the root object for the API.
  - Should be able to get that root object with no parameters.
- KN: new WebGPUDevice().
- JG: sure.
- MM: agree that there's some constructor that takes no arguments. Other constraints too, but not for today.
- DM: would need to pass in the queue created even earlier.
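To make the flow discussed above concrete, here is a minimal editorial sketch: a device obtained with a no-argument constructor, then a swapchain-style canvas context attached to it. Nothing here was decided in the meeting; every name (WebGPUDevice, the "webgpu-swapchain" context id, configure, getNextTexture) is hypothetical.

```typescript
// All names below are hypothetical; nothing here was decided in the meeting.
declare class WebGPUDevice {
  constructor(); // "root object" obtainable with no parameters
}
interface WebGPUSwapChainRenderingContext {
  configure(options: { device: WebGPUDevice }): void;
  getNextTexture(): unknown; // "give me a texture to render into"
}

const device = new WebGPUDevice();

const canvas = document.querySelector("canvas") as HTMLCanvasElement;
// A SwapChainRenderingContext-style object: a presentation destination
// attached to an existing device, not the only way to use WebGPU
// (a device could drive several canvases, or none).
const swapChain = canvas.getContext(
  "webgpu-swapchain"
) as unknown as WebGPUSwapChainRenderingContext;
swapChain.configure({ device });

function frame(): void {
  const backbuffer = swapChain.getNextTexture();
  // ... record and submit command buffers that render into `backbuffer` ...
  requestAnimationFrame(frame);
}
requestAnimationFrame(frame);
```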
- BJ: thinking through some feedback: question about SwapChains. Why can't you have a SwapChain create ImageBitmaps?
- KN: don't want to incur a copy from ImageBitmap to the screen. Want to render into the top-level thing given to DirectComposition, CoreAnimation, etc.
- WebVR
- DJ: does this mesh with how WebVR works?
- BJ: WebVR does not explicitly require WebGLRenderingContext. In the upcoming API, you create different layer types. There's a WebGLLayer. Create it by passing a WebGLRenderingContext. You attach this to the session and say "start presenting". Would pass in a WebGPU context (or, correction, SwapChain).
  - Intent: with WebGLLayer, you ask it for a framebuffer to render into every frame, so it's effectively a SwapChain. Lets the underlying native API provide the surface you render into.
  - Either that layer should act as a SwapChain, or point to a SwapChain and provide the "next" surface to render into.
  - Need to make sure that the SwapChain could potentially be populated by surfaces coming from the native VR APIs.
- MM: want VR to be a supported use case.
- (All agree.)
- BJ: WebVR's designed in a way so that you're expected to have completed your rendering by the end of the callback that gave you the pose. Given the nature of the WebGPU API where there's a lot of asynchrony, unlike WebGL, it'll make things more difficult for developers.
  - But if they can maintain a double-buffer of resources and prep everything before the next callback, you can get everything done.
- DJ: some of this will be educating developers.
- BJ: there are patterns from WebGL that wouldn't work.
- DJ: we could have an explicit "PresentSwapChain" API.
- DJ: could ensure in our API that nothing's going to block and take a long time. The developer has to be aware things will be asynchronous. Will have to set things up in advance.
- BJ: think we won't have an explicit "Submit" or "Present" API. Asked web platform leads, was shot down.
- BJ: we also have an explicit "requestFrame". Can do all the prep, wait for fences/barriers, then call requestFrame.
- BJ: requestFrame syncs with the headset's sync loop. 90 Hz instead of 60 Hz.
- BJ: feel pretty comfortable it will work; will require a mindset change.
- KN: how are we going to upload the pose? Need a synchronous upload of the pose data.
- BJ: array of view matrices + array of projection matrices. Usually 1 or 2 of each. Maybe more for lightfield displays.
- BJ: if I can take 64 floats and make them available inline before the draw call that would be sufficient.
- CW: there will be a way to do uploads. But for sure there's a way to update a uniform buffer with data. Don't worry. We don't know the exact mechanism yet, but it will exist.
- BJ: good. It's a hard requirement that we can communicate the pose synchronously with respect to the current frame.
- MM: so we need it to be communicated before the draw call is done.
- MM: doesn't need to flush. No round-trip.
- CW: without blocking, there's a way to provide data to the GPU.
- KN: staging buffer or similar.
- CW: WebVR ideally gives you a texture array and you render to one layer and then the other.
- BJ: yes, ideally. If support's there consistently, then if you use WebGPU you *always* render to a texture array.
- CW: all APIs do support texture arrays, so we can require that be the mechanism.
- BJ: won't affect many people. Will make rendering more efficient. Don't have to carry over the current limitations of WebGL interacting with WebVR.
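A rough editorial sketch of the frame pattern described above: resources are prepared ahead of time, the pose (a couple of view/projection matrices, roughly 64 floats) is written into a uniform buffer without blocking, the pre-built work is submitted, and the next frame is requested in sync with the headset. All API names here (VRSession, requestFrame, WebGPUBuffer.setSubData) are placeholders, not agreed API.

```typescript
// Placeholder types; the real WebVR / WebGPU interfaces were not settled here.
interface WebGPUBuffer {
  // Some non-blocking way to update a uniform buffer was promised to exist
  // ("don't worry"); modeled here as a hypothetical setSubData.
  setSubData(offsetInBytes: number, data: Float32Array): void;
}
interface VRView {
  viewMatrix: Float32Array;        // 16 floats
  projectionMatrix: Float32Array;  // 16 floats
}
interface VRFrame {
  views: VRView[];                 // usually 1 or 2 views
}
interface VRSession {
  requestFrame(callback: (frame: VRFrame) => void): void;
}

function startRenderLoop(session: VRSession, poseUniforms: WebGPUBuffer): void {
  const onFrame = (frame: VRFrame): void => {
    // Pack view + projection matrices (~64 floats for two views) and hand them
    // to the GPU before the draws that consume them; no flush, no round-trip.
    const matrices = new Float32Array(frame.views.length * 32);
    frame.views.forEach((view, i) => {
      matrices.set(view.viewMatrix, i * 32);
      matrices.set(view.projectionMatrix, i * 32 + 16);
    });
    poseUniforms.setSubData(0, matrices);

    // ... submit command buffers prepared before this callback (double-buffered
    // resources), rendering into the layer / swapchain surface for this frame ...

    session.requestFrame(onFrame); // synced to the headset's loop (e.g. 90 Hz)
  };
  session.requestFrame(onFrame);
}
```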
- KN: is it possible that the swapchain of the native system will be designed for side-by-side rendering?
- BJ: that's the best way to interface with Daydream right now. But by the time WebGPU comes out it'll probably have moved forward. Also we can probably do a blit at the very end of the pipeline. And if that puts it at a disadvantage then we should fix Daydream.
- Upload from DOM media elements
- KR: Let's learn from our mistakes. In WebGL it turns out there are some sync operations that happen in some cases. For HTMLImageElement a synchronous decode needs to happen. For HTMLVideoElement the HW and SW paths are conflated, which prevents some zero-copy. HTMLCanvasElement needs GPU-to-GPU copies. ImageBitmap from HTMLImageElement gives you the data ready for consumption by the GPU.
- KR: Suggest we forgo uploading from HTMLImageElement and require ImageBitmap instead.
- DJ: Could use the decode() function on HTMLImageElement.
- KR: does that take extra arguments like flipY, unmultiplyAlpha, etc.? Don't think the image element has them.
- KR: For WebGPU there is no "pixel store" state. Not sure about flipping Y. Will need to deal with this stuff in WebGPU and figure out how things will interact. Suggest ImageBitmap is the only way to upload images. For video elements suggest we do something like the "live update" mechanism LG is working on for WebGL.
- DJ: What if I want to keep the frame while the video is playing?
- KN: LG's thing is zero-copy. You can make a copy if you want a fixed image?
- KR: Need to support HW and SW paths. HW is like a texture source. SW gives data; copy into a buffer then upload to a texture? Basically we need to try to get the video decode path with as few copies as possible.
- DM: How important is video?
- All: very important.
- KR: HTML canvas, maybe do an ImageBitmap from it?
- DJ: Like only 2 entry points: image bitmaps and video. How do image bitmaps work with compressed textures?
- RC: You can't make one from an image or a video; you need to do an upload.
- DJ: So we need a way to upload from ArrayBuffer?
- DJ: thinking more of the WebGL case, where the only way you can upload a DOM element is via TexImage2D. That's why you need the ArrayBuffer entry point. But in WebGPU you're going to upload to buffers that aren't necessarily images.
- RC: asking can we upload an image to a vertex buffer?
- CW: upload to a buffer or to a compressed format?
- DJ/MM: want to upload raw compressed bytes (ETC, DXT, etc.)
- CW: you'd need a query mechanism to know the supported compressed texture formats.
- Some confusion about how WebGL handles compressed textures.
- MM: so, two entry points for uploading to textures from the DOM.
  - One accepts ImageBitmap.
  - The other accepts HTMLVideoElement.
- KR: we need to separately consider the software and hardware cases for HTMLVideoElement.
- CW/JG: if you have raw data, you MapBuffer / copy data into the buffer / UnmapBuffer.
- DJ: it's a bit more code; creating an ImageBitmap returns a Promise.
- MM/DJ: the "one line" in current WebGL samples hides synchronous blocking.
- DJ: you don't want a wait in your WebVR rendering callback.
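An editorial sketch of the ImageBitmap-only image upload path suggested above. createImageBitmap() and its flipY / premultiplyAlpha options are existing platform APIs; WebGPUTexture and uploadFromImageBitmap are hypothetical stand-ins for whatever entry point WebGPU ends up specifying.

```typescript
// createImageBitmap() is a real platform API; WebGPUTexture and
// uploadFromImageBitmap are hypothetical placeholders.
interface WebGPUTexture {
  uploadFromImageBitmap(bitmap: ImageBitmap): void;
}

async function uploadImage(url: string, texture: WebGPUTexture): Promise<void> {
  const blob = await (await fetch(url)).blob();
  // Decoding happens here, behind a Promise, instead of inside a synchronous
  // texImage2D-style call that could stall a WebVR frame callback.
  const bitmap = await createImageBitmap(blob, {
    imageOrientation: "flipY", // the flipY / premultiplyAlpha knobs live here,
    premultiplyAlpha: "none",  // not in per-upload "pixel store" state
  });
  texture.uploadFromImageBitmap(bitmap);
  bitmap.close();
}
```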
Received on Tuesday, 26 September 2017 13:55:29 UTC