Minutes for the 2017-10-25 meeting from Corentin Wallez on 2017-10-30 (public-gpu@w3.org from October 2017)

From: Corentin Wallez <cwallez@google.com>
Date: Mon, 30 Oct 2017 15:46:58 -0400
To: public-gpu <public-gpu@w3.org>
Message-ID: <CAGdfWNNSLcD-=8xSh3yUQSxCD-1seQZRZv-5kef4xc60zSopUA@mail.gmail.com>

GPU Web 2017-10-25

Chair: Corentin

Scribe: Ken

Location: Google Hangout
Minutes from last meeting
<https://docs.google.com/document/d/1N_chcF7HscK_ZaiNEGqzYR9JUgRCaxtsbE0p3wXa4Jw/>
TL;DR

Status updates
-

Apple: to ideas, making a prototype library on top of Vulkan, soon
Metal
-

Google: SPIR-V UB investigation, shaderc in WASM and updating
nxt-chromium
-

Mozilla: Refactoring internals of gfx-rs
-

SPIR-V undefined behaviors (https://github.com/gpuweb/gpuweb/issues/34):
-

Important to distinguish undefined behavior and undefined values.
-

SPIR-V has few undefined behaviors, a bit more undefined values in
math builtins.
-

Concern about mismatched typed loads: logical addressing mode forbids
them
-

Trap behavior for shader (https://github.com/gpuweb/gpuweb/issues/35)
-

Benchmarks show trap and clamping are close
-

Concern that trap would be more expensive in a deeply nested call
stack. Concern that trapping doesn’t make use of hardware robust resource
access.
-

Devs would like a trapping mechanism for debugging UB.
-

Consensus that deciding exact behavior should be deferred post-MVP.
-

Binary vs. text (or low vs. high level)
-

View source: discussion of whether view-source is important today
with WASM and JS minifiers, and how readable SPIR-V can be.
-

Optimization opportunities: discussion whether high-level would allow
more optimizations and that SPIR-V is still high level.
-

Interoperability:
-

Discussion that a problem with GLSL/HLSL is that it is hard to
write interoperable implementations (ANGLE doesn’t help).
-

Suggestion that SPIR-V helps because it is much simpler and
tighter.
-

Discussion that having a monoculture on a library would help
interoperability (like ANGLE) but is bad for the Web in general.

Tentative agenda

Administrative stuff (if any)

Individual design and prototype status

SPIR-V UB
-

Shader “trap” mechanism
-

Binary vs. text (or low vs. high level)
-

Agenda for next meeting

Attendance

Apple

Dean Jackson

JF Bastien
-

Myles C. Maxfield
-

Google
-

Corentin Wallez
-

David Neto
-

Kai Ninomiya
-

Ken Russell
-

Ricardo Cabello
-

Microsoft
-

Chas Boyd
-

Rafael Cintron
-

Mozilla
-

Dzmitry Malyshau
-

Jeff Gilbert
-

Yandex
-

Kirill Dmitrenko
-

ZSpace
-

Doug Twilleager
-

Joshua Groves
-

Markus Siglreithmaier
-

Tyler Larson

Administrative items

DJ: Haven’t heard back from the W3C about waiving the registration fees
for a TPAC meeting

Individual design and prototype status

Apple:
-

MM: haven’t done any work on the shading language impl. Have been
working on a demonstration of an API that would have the form we’re most
comfortable with.
-

Automatic barrier insertion, …
-

At a place comfortable showing it to the world
-

MM: going to upload it to the WebKit repo today
-

Just needed to write it to make sure it’s implementable on Vulkan
-

CW: is it a standalone API?
-

MM: it’s a standalone library. C++ API, not JavaScript API. Not a
proposal for what to ship; just a demonstration.
-

Implemented on top of Vulkan. Will implement on top of Metal in the
coming weeks. Going to work on shading language next.
-

Google:
-

CW: investigated undefined behavior in SPIR-V.
-

Kai has compiled shaderc to WebAssembly
-

KN: for example, can turn GLSL into SPIR-V from the web
-

Corentin has worked on NXT and Chromium prototype integration
-

Microsoft
-

RC: no updates
-

Mozilla:
-

DM: internal refactoring about queue factories and interaction with
command pools
-

Changed how it binds to Metal C APIs; autorelease, etc.
-

Updated our WebGPU prototype to newest gfx-rs hardware abstraction
layer (HAL): links to various repos already posted.

SPIR-V UB

https://github.com/gpuweb/gpuweb/issues/34

CW: think we understand the constraints of robust buffer access
-

Not looking into data races in shader execution
-

Not looking into mathematical precision of various operations
-

CW: aside from that there are a few examples of undefined behavior in
SPIR-V
-

There’s an operator to create an “unspecified result” of an operation
-

JG: important to distinguish undefined behavior and undefined values.
-

JG: think it’s not safe. Usually talking about lack of safety of
undefined behavior, up to and including program termination. Don’t conflate
with taking a differing path through the program due to differing values
being computed.
-

CW: correct. This is an undefined value, think we should remove it
because we can. But not as important as other kinds.
-

CW: Kinds of undefined behaviors in SPIR-V:
-

Create a variable without an initializer. Can forbid this.
-

Indexing vectors with arbitrary indices; e.g. vec4 with arbitrary
indices. These could compromise security. Solution: provide robust
resource kinds of guarantees.
-

Some kinds of math operations, in particular, that return undefined
values for certain inputs.
-

Can choose to do whatever. More undefined value than undefined
behavior.
-

CW: conclusion: not a lot to be done to fix up these undefined behaviors.
-

JFB: question about things like division by zero. Allowed to terminate
the program?
-

CW: cannot terminate the program. The value is undefined.
-

JFB: group needs to decide whether undefined values are OK, or terminate
the program. Need to define whether it’s OK to only sometimes terminate.
-

CW: undefined behavior we need to shield the user from for sure.
Undefined values it’s not 100% clear.
-

MM: couldn’t find description in the SPIR-V spec of what happens if you
store one type and load a different type.
-

KN: pointers are typed and you can’t do that statically.
-

CW/DN: if your memory was typed as vec3 arrays and you tried to get a
pointer to the fourth component, that would be out of bounds and should
have been clamped by the robust buffer access.
-

MM: talking about single Store/Load instructions.
-

DN: can’t change the pointee type.
-

CW: part of the logical addressing mode. Can only point to smaller,
but still whole, elements of a structure.
-

DN: logical addressing mode states how pointers can be created.
Pointer-cast operator. Load/store talk about matching pointee type to
value type.
-

MM: can you link to validation rules?
-

DN: sure.
-

Logical addressing mode rules:
https://www.khronos.org/registry/spir-v/specs/1.2/SPIRV.html#_universal_validation_rules
-

Specifically it bans (by omission) a cast between pointer types.
-

Also, in the GLSL or Simple memory model (OpMemoryModel
instruction), aliasing is disallowed by default. Reference
https://www.khronos.org/registry/spir-v/specs/1.2/SPIRV.html#_a_id_aliasingsection_a_aliasing
-

Therefore all OpVariables reference memory which is not
aliased. I.e. storage backing variables is disjoint. This is arranged
by the implementation.
-

So memory is strongly typed.
-

The OpLoad instruction:
https://www.khronos.org/registry/spir-v/specs/1.2/SPIRV.html#OpLoad
has the validation rule “Pointer is the pointer to load through. Its type
must be an OpTypePointer whose Type operand is the same as Result Type.”
-

The OpStore instruction:
https://www.khronos.org/registry/spir-v/specs/1.2/SPIRV.html#OpStore
has the validation rule “Pointer is the pointer to store through. Its type
must be an OpTypePointer whose Type operand is the same as the type of
Object.”
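
The two quoted rules can be sketched as a small type check. This is only an illustration in Python, not code from any SPIR-V validator: the PointerType class and the validate_load / validate_store helpers are hypothetical names, and types are modelled as plain strings rather than SPIR-V result ids.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PointerType:
    """Hypothetical stand-in for an OpTypePointer declaration."""
    storage_class: str  # e.g. "Uniform", "Function"
    pointee: str        # the Type operand, modelled as a string for brevity


def validate_load(pointer_type: PointerType, result_type: str) -> bool:
    # OpLoad rule: the pointer's Type operand must be the same as Result Type.
    return pointer_type.pointee == result_type


def validate_store(pointer_type: PointerType, object_type: str) -> bool:
    # OpStore rule: the pointer's Type operand must be the same as the
    # type of Object.
    return pointer_type.pointee == object_type
```

Under this model, storing a vec4 and then loading a float through the same pointer fails validation, because the pointee type is fixed by the pointer type and there is no cast between pointer types.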
-

RC: in one previous conversation Myles brought up an issue. What’s the
status?
-

CW: that was resolved to be a spec bug and has been fixed. Will be
pushed out.

Shader “trap” mechanism

https://github.com/gpuweb/gpuweb/issues/35

MM: would like to clarify some points:
-

Not wedded to any particular solution.
-

Think we can all agree portability is valuable. Perhaps we have
differing degrees of how far we want to go.
-

If we can get both portability and performance, hopefully we can all
agree on it.
-

(Either clamping or trapping.)
-

If in some cases one is better and in some the other is better, can
have the discussion of whether one or the other is allowed.
-

Tried running some little benchmark programs.
-

Ran on slowest piece of hardware I (Myles) could find.
-

In every case we tried the trap solution, (basically an if-statement and
return), that ended up being faster.
-

If the tests are not representative, please suggest how they could be
improved.
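
As a rough illustration of the two strategies being compared, the sketch below models a shader loop in Python. It is not the benchmark code, which is not included in these minutes. The function names, the use of None as the result of a trapped invocation, and the assumption that the buffer is non-empty are all made up for this example.

```python
def sum_clamped(buffer, indices):
    """Clamping: every index is forced into range, so all iterations run."""
    total = 0
    last = len(buffer) - 1  # assumes a non-empty buffer
    for i in indices:
        total += buffer[min(max(i, 0), last)]
    return total


def sum_trapping(buffer, indices):
    """Trapping: an if-statement and an early return on the first bad index."""
    total = 0
    for i in indices:
        if i < 0 or i >= len(buffer):
            return None  # assumed trap result: the invocation just stops
        total += buffer[i]
    return total
```

With valid indices both functions do the same work apart from the check itself, which is the case the benchmark measured. They only diverge on out-of-bounds input, where the clamped version keeps iterating and the trapping version exits.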
-

JG: are you testing valid or invalid data?
-

MM: valid.
-

CW: we don’t care so much about invalid operations.
-

CW: looking at the numbers, clamping vs. trapping seems fairly close.
-

KD: let’s look at the code first. Trap should be significantly faster.
First counter that goes out of bounds, shader returns. Clamping will run
all iterations. Why are the numbers even close?
-

MM: was only testing valid data. None of the early returns were ever
hit.
-

KD: so for example all of the buffers were big enough?
-

CW: clamp and trap look very close for a trap that’s inside a top-level
function. If you trap inside a deeply nested call stack how do you return
from it?
-

MM / CW: check flags that are set.
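
One way to read "check flags that are set" is sketched below in Python. This is an assumption about the shape of such a scheme, not something specified in the meeting: ShaderState, load, inner and shader_main are hypothetical names, and the placeholder value 0 is arbitrary.

```python
class ShaderState:
    """Hypothetical per-invocation state carrying a trap flag."""

    def __init__(self):
        self.trapped = False


def load(state, buffer, index):
    # Innermost helper: record the trap instead of unwinding the stack.
    if index < 0 or index >= len(buffer):
        state.trapped = True
        return 0  # placeholder; callers must check the flag before using it
    return buffer[index]


def inner(state, buffer, index):
    value = load(state, buffer, index)
    if state.trapped:  # every caller re-checks the flag after each call
        return 0
    return value * 2


def shader_main(buffer, index):
    state = ShaderState()
    result = inner(state, buffer, index)
    if state.trapped:
        return None  # top level finally returns early
    return result + 1
```

Each level of the call stack pays for an extra flag check after every call that may trap, which is the cost raised in the concern about deeply nested call stacks.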
-

CW: while benchmark shows that trapping is faster than clamping
-

JG: if we allow either clamping or trapping, we can choose which one we
want.
-

MM: if we can find that one solution is always or almost always better,
we can get some portability for free.
-

CW: trapping on all undefined behaviors would indeed be a portability
win.
-

JG: we don’t necessarily need the data in order to move forward, unless
we want to standardize on one solution.
-

CW: if we can settle on only one behavior, it’s a portability win. If
not, allow either.
-

JG: portability win for bad programs. Not as important as maintaining
correctness.
-

KD: trap mechanism would probably also allow us to record where the trap
occurred and return that information to the developer in debug mode.
-

CW: yes. But that could be a debug mode where the shaders are
instrumented. Probably should not be done in production.
-

DM: trap solution would not allow us to use robust buffer access on
other APIs. Would be nice to get better performance.
-

RC: tend to agree with Dzmitry on this. “The fastest code is the code
which doesn’t run.”
-

MM: need to benchmark against hardware that has robust buffer access.
-

CW: will be interesting to choose one or the other for V1. May be a
rabbit hole for MVP. Don’t want to send Myles off on a fruitless
benchmarking exercise. Defer post MVP?
-

MM: the issue Apple is concerned with is portability in general. Think
this is important for MVP. Still, happy to defer this one point.
-

CW: Apple’s work on shading languages might be more important at this
point.
-

KD: benchmarking results might be very different on different hardware.
Also, in ~few years maybe all hardware will support robust buffer access.
-

MM: quick rebuttal: we can’t design APIs for hardware that doesn’t exist.
-

DM: can’t disable robust resource access on D3D12 for example.
-

CW: true for vertex/index buffers. Not sure about UAVs.
-

RC: what’s not checked is accesses off the end of the root descriptor
tables. We’d need to check those in WebGPU. Prefer to let the API do the
checks for you.
-

CB: plan is to loosen up these restrictions in future shading languages,
but agree that for web standards it’s important to rely on the hardware’s
support.

Binary vs. text (or low vs. high level)

CW / KN: let’s go through Kai’s email and go through the things we can
agree upon.
-

KN: probably a few points of agreement:
-

Compatibility with current content/examples using HLSL/GLSL
-

Probably don’t want to spend the time writing an entirely new thing
before we can do anything else
-

However, perhaps modifications to HLSL can be done quickly
-

MM: full backwards compatibility is not a requirement for either SPIR-V
or HLSL.
-

KN: agree. But most math that people copy/paste from web examples should
work fine.
-

KN: not sure what else we all agree on :)
-

KN: start with view-source?
-

WebAssembly folks have given the feedback that a lot of vocal web
developers complain you can’t view the source of a WebAssembly program.
-

Thinks maybe this is an issue with WebAssembly.
-

Web graphics developers are more concerned with performance
-

So if there is a tradeoff, should probably choose performance.
-

CW: does anyone expect to be able to download a WebAssembly module
and see the original sources?
-

KN: no, but they “prefer” JavaScript because they can see the sources.
-

DJ: developers of WebAssembly found they needed a human readable
source even just for tests.
-

DJ: “more concerned with performance” – does that also mean
downloading the shader?
-

KN: yes. Download, compile, etc. If there’s a tradeoff there then
think we have to take the performant option.
-

DJ: what about if it’s much faster on one platform? for example one
that ships an HLSL compiler?
-

CW: different topic than view-source.
-

DJ: talking about performance.
-

KN: think we basically mean, on all platforms.
-

CB: two of these options might have different characteristics. For
example, first frame performance vs. higher framerate.
-

CW: not sure why one would have a higher frame rate.
-

CB: sometimes have a platform-specific compiler optimization.
-

DJ: we’ve found that sometimes we can optimize JavaScript better than
the lower-level IR. Can do some optimizations because we understand more
about the program.
-

CB: we hear this from driver vendors too.
-

CW: is it a goal for the NVIDIA driver to optimize Unreal Engine’s
shaders on the web?
-

CW: let’s talk about optimization opportunities.
-

SPIR-V vs. HLSL: HLSL is higher-level, therefore more optimization
opportunities in backends.
-

CW disagrees.
-

KN: my second point was going to be: SPIR-V is not that low level.
-

1. It’s more view-source-able than WebAssembly
-

2. As far as we know it’s not any less optimizable than a high-level
language.
-

Would be beneficial for some people who are arguing against low-level
to take another look at SPIR-V.
-

Also would be valuable to show decompiling from SPIR-V to GLSL/HLSL.
It’s intended for semantics-preserving decompilation assuming the debug
info is there.
-

KN’s two options in the email were IR vs. SL. Argument is that they’re
pretty much both at the same “level” i.e., “high-level”.
-

Tool does this conversion pretty well. Even if no debug info.
Preserves structure like loops, function calls, etc. Modulo inlining.
-

JG: that’s the sort of code you’ll ship anyway. We effectively don’t
have debug info for JavaScript on the web today because everyone minifies
their code. All names, context etc. is stripped. So SPIR-V would be able to
reconstruct at the same basic level as JavaScript view-source on the web
today.
-

CW/KN: agree
-

Example tool is SPIRV-Cross, with outputs in GLSL, HLSL,
(experimental) MSL
-

https://github.com/KhronosGroup/SPIRV-Cross
-

Example outputs are in its “reference” test set:
https://github.com/KhronosGroup/SPIRV-Cross/tree/master/reference
-

E.g. GLSL input
https://github.com/KhronosGroup/SPIRV-Cross/blob/master/shaders/vert/ocean.vert
-

Becomes GLSL output
https://github.com/KhronosGroup/SPIRV-Cross/blob/master/reference/shaders/vert/ocean.vert
-

MM: saying that it doesn’t work in JavaScript doesn’t mean that
view-source is useless.
-

JG/KN: agree.
-

But there’s no strong pressure to do better than we already have.
-

DJ: you could argue that if it’s what people are shipping and what
people want then we should just ship that format.
-

CW: that’s fair, but that’s not necessarily what people want.
WebAssembly is a tradeoff based on size/performance/etc. Argue that IR is
better. Every GLSL compiler has bugs. Half of Corentin’s job has been to
write compiler passes to work around GLSL compiler bugs. And this is a
language that has a working group, conformance tests, and reference
implementation.
-

SPIR-V is well defined, clear spec. Easy to make interoperable
implementations from it. And that’s what the web is about:
interoperability.
-

DJ: it’s a great point. One of the points of accepting SPIR-V would
be accepting the SPIR-V compiler itself.
-

JG: SPIR-V being a binary format and having a well-defined spec makes
it a better ingestion target. The funnel of what we ingest will be
smaller.
-

CB: you’re going to take the SPIR-V source and compile it into all
browsers?
-

JG: yes.
-

CB: so the interoperable behavior will be enforced by compiling the
same SPIR-V compiler into browsers?
-

JG: would like to avoid having a monoculture.
-

MM: why won’t SPIR-V have the same problems as high-level languages?
-

CW: because things are better defined in the binary format. Things like
initializer expressions (or lack thereof) in HLSL. Scoping rules in if and
switch statements.
-

JG: Some things are easier to parse than others. For example C has a
monstrous grammar and it is extremely easy to make errors in.
-

DJ: it’s not our job just to make our lives easier. It’s our job to make
web developers’ lives easier.
-

JG: agree. But it’ll be easier to follow through on our promises to
developers if we choose an easier compilation target.
-

KN: agree. We’ll reach something sufficiently stable sooner, and maybe,
ever. If the project’s simpler, it’s more likely that we’ll get it right.
-

CB: it seems that the way people are getting uniform behavior is that
everyone uses ANGLE.
-

CW: ANGLE has tons of bugs too. Have been fuzzing it.
-

JG: not having a monoculture in WebGL – multiple implementations – has
pushed everyone forward. Has materially pushed ANGLE forward too.
-

CB: but what if we’ve found and fixed all of the bugs in ANGLE?
-

CW: what about multiple versions of ANGLE in different browsers?
-

CB: not sure what expected release cadence is.
-

CW: based on the releases of the browsers.
-

JG: working on 6-week development cycle.
-

DJ: we should follow up on Kai’s email. Also a Github issue about
accepting HLSL as a proposal.
-

CW: putting the cart before the horse.
-

JG: we’ll discuss on the mailing list.
-

DJ: would like to see what developers would prefer. This is as important
to me as view-source.
-

CW: let’s discuss on the mailing list.

Agenda for next meeting

Been stuck on memory barriers for a while?
-

JG / MM: “anything else” than this.
-

MM: one interesting bit when Myles publishes Apple’s API today to
WebKit: how to do buffer and texture updates. Has to be some way to
schedule a buffer update (from CPU to GPU) so that it doesn’t get updated
while it’s in use by the GPU.
-

CW: let’s talk about data upload/download. Something we talked about a
while ago. Keep it light.
-

RC: think we should continue this discussion too in the future though.
Had some things to say.
-

MM: agree, can’t ship anything until this is resolved.
-

CW: memory barriers and shading languages are two big topics. Can’t ship
without resolution.

Received on Monday, 30 October 2017 19:47:50 UTC