Minutes for the 2017-10-25 meeting

GPU Web 2017-10-25

Chair: Corentin

Scribe: Ken

Location: Google Hangout

Minutes from last meeting
<https://docs.google.com/document/d/1N_chcF7HscK_ZaiNEGqzYR9JUgRCaxtsbE0p3wXa4Jw/>

TL;DR

   -

   Status updates
   -

      Apple: to test ideas, making a prototype library on top of Vulkan, soon
      Metal
      -

      Google: SPIR-V UB investigation, shaderc in WASM and updating
      nxt-chromium
      -

      Mozilla: Refactoring internals of gfx-rs
      -

   SPIR-V undefined behaviors (https://github.com/gpuweb/gpuweb/issues/34):
   -

      Important to distinguish undefined behavior and undefined values.
      -

      SPIR-V has few undefined behaviors, a bit more undefined values in
      math builtins.
      -

      Concern about mismatched typed loads: logical addressing mode forbids
      them
      -

   Trap behavior for shader (https://github.com/gpuweb/gpuweb/issues/35)
   -

      Benchmarks show trap and clamping are close
      -

      Concern that trap would be more expensive in a deeply nested call
      stack. Concern that trapping doesn’t make use of hardware robust resource
      access.
      -

      Devs would like a trapping mechanism for debugging UB.
      -

      Consensus that deciding exact behavior should be deferred post-MVP.
      -

   Binary vs. text (or low vs. high level)
   -

      View source: discussion of whether view-source is important today
      with WASM and JS minifiers, and how readable SPIR-V can be.
      -

      Optimization opportunities: discussion whether high-level would allow
      more optimizations and that SPIR-V is still high level.
      -

      Interoperability:
      -

         Discussion that a problem with GLSL/HLSL is that it is hard to
         write interoperable implementations (ANGLE doesn’t help).
         -

         Suggestion that SPIR-V helps because it is much simpler and
         tighter.
         -

         Discussion that having a monoculture on a library would help
         interoperability (like ANGLE) but is bad for the Web in general.

Tentative agenda

   -

   Administrative stuff (if any)


   -

   Individual design and prototype status


   -

   SPIR-V UB
   -

   Shader “trap” mechanism
   -

   Binary vs. text (or low vs. high level)
   -

   Agenda for next meeting

Attendance

   -

   Apple


   -

   Dean Jackson


   -

   JF Bastien
   -

   Myles C. Maxfield
   -

   Google
   -

      Corentin Wallez
      -

      David Neto
      -

      Kai Ninomiya
      -

      Ken Russell
      -

      Ricardo Cabello
      -

   Microsoft
   -

      Chas Boyd
      -

      Rafael Cintron
      -

   Mozilla
   -

      Dzmitry Malyshau
      -

      Jeff Gilbert
      -

   Yandex
   -

      Kirill Dmitrenko
      -

   ZSpace
   -

      Doug Twilleager
      -

   Joshua Groves
   -

   Markus Siglreithmaier
   -

   Tyler Larson

Administrative items

   -

   DJ: Haven’t heard back from the W3C about waiving the registration fees
   for a TPAC meeting

Individual design and prototype status

   -

   Apple:
   -

      MM: haven’t done any work on the shading language impl. Have been
      working on a demonstration of an API that would have the form we’re most
      comfortable with.
      -

      Automatic barrier insertion, …
      -

      At a place comfortable showing it to the world
      -

      MM: going to upload it to the WebKit repo today
      -

      Just needed to write it to make sure it’s implementable on Vulkan
      -

      CW: is it a standalone API?
      -

      MM: it’s a standalone library. C++ API, not JavaScript API. Not a
      proposal for what to ship; just a demonstration.
      -

      Implemented on top of Vulkan. Will implement on top of Metal in the
      coming weeks. Going to work on shading language next.
      -

   Google:
   -

      CW: investigated undefined behavior in SPIR-V.
      -

      Kai has compiled shaderc to WebAssembly
      -

      KN: for example, can turn GLSL into SPIR-V from the web
      -

      Corentin has worked on NXT and Chromium prototype integration
      -

   Microsoft
   -

      RC: no updates
      -

   Mozilla:
   -

      DM: internal refactoring about queue factories and interaction with
      command pools
      -

      Changed how gfx-rs binds to Metal C APIs; autorelease, etc.
      -

      Updated our WebGPU prototype to newest gfx-rs hardware abstraction
      layer (HAL): links to various repos already posted.

SPIR-V UB

https://github.com/gpuweb/gpuweb/issues/34

   -

   CW: think we understand the constraints of robust buffer access
   -

      Not looking into data races in shader execution
      -

      Not looking into mathematical precision of various operations
      -

   CW: aside from that there are a few examples of undefined behavior in
   SPIR-V
   -

      There’s an operator to create an “unspecified result” of an operation
      -

   JG: important to distinguish undefined behavior and undefined values.
   -

   JG: think it’s not safe. Usually talking about lack of safety of
   undefined behavior, up to and including program termination. Don’t conflate
   with taking a differing path through the program due to differing values
   being computed.
   -

   CW: correct. This is an undefined value, think we should remove it
   because we can. But not as important as other kinds.
   -

   CW: Kinds of undefined behaviors in SPIR-V:
   -

      Create a variable without an initializer. Can forbid this.
      -

      Indexing vectors with arbitrary indices; e.g. vec4 with arbitrary
      indices. These could compromise security. Solution: provide robust
      resource kinds of guarantees.
      -

      Some kinds of math operations, in particular, that return undefined
      values for certain inputs.
      -

         Can choose to do whatever. More undefined value than undefined
         behavior.
         -

   CW: conclusion: not a lot to be done to fix up these undefined behaviors.
   -

   JFB: question about things like division by zero. Allowed to terminate
   the program?
   -

      CW: cannot terminate the program. The value is undefined.
      -

   JFB: group needs to decide whether undefined values are OK, or terminate
   the program. Need to define whether it’s OK to only sometimes terminate.
   -

   CW: undefined behavior we need to shield the user from for sure.
   Undefined values it’s not 100% clear.
   -

   MM: couldn’t find description in the SPIR-V spec of what happens if you
   store one type and load a different type.
   -

      KN: pointers are typed and you can’t do that statically.
      -

      CW/DN: if your memory was typed as vec3 arrays and you tried to get a
      pointer to the fourth component, that would be out of bounds and should
      have been clamped by the robust buffer access.
      -

      MM: talking about single Store/Load instructions.
      -

      DN: can’t change the pointee type.
      -

      CW: part of the logical addressing mode. Can only point to smaller,
      but still whole, elements of a structure.
      -

      DN: logical addressing mode states how pointers can be created.
      Pointer-cast operator. Load/store talk about matching pointee type to
      value type.
      -

      MM: can you link to validation rules?
      -

      DN: sure.
      -

         Logical addressing mode rules:
         https://www.khronos.org/registry/spir-v/specs/1.2/SPIRV.html#_universal_validation_rules
         -

            Specifically it bans (by omission) a cast between pointer types.
            -

            Also, in the GLSL or Simple memory model (OpMemoryModel
            instruction), aliasing is disallowed by default.  Reference
            https://www.khronos.org/registry/spir-v/specs/1.2/SPIRV.html#_a_id_aliasingsection_a_aliasing
            -

               Therefore all OpVariables reference memory which is not
               aliased.  I.e. storage backing variables is disjoint. This is
               arranged by the implementation.
               -

            So memory is strongly typed.
            -

         The OpLoad instruction:
         https://www.khronos.org/registry/spir-v/specs/1.2/SPIRV.html#OpLoad
         has the validation rule: “Pointer is the pointer to load through. Its
         type must be an OpTypePointer whose Type operand is the same as Result
         Type.”
         -

         The OpStore instruction:
         https://www.khronos.org/registry/spir-v/specs/1.2/SPIRV.html#OpStore
         has the validation rule: “Pointer is the pointer to store through. Its
         type must be an OpTypePointer whose Type operand is the same as the
         type of Object.”
         -

   RC: in one previous conversation Myles brought up an issue. What’s the
   status?
   -

      CW: that was resolved to be a spec bug and has been fixed. Will be
      pushed out.
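
Illustration of the typed load/store discussion above: a minimal sketch in
Python of why, under the rules DN linked, a mismatched typed load is rejected
statically. This is a toy model, not SPIR-V tooling; PointerType,
validate_load and validate_store are hypothetical names made up for this
example, and types are compared by name only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PointerType:
    """Toy stand-in for OpTypePointer: records the pointee type by name."""
    pointee: str


def validate_load(pointer_type: PointerType, result_type: str) -> bool:
    """OpLoad rule: the pointer's Type operand must equal Result Type."""
    return pointer_type.pointee == result_type


def validate_store(pointer_type: PointerType, object_type: str) -> bool:
    """OpStore rule: the pointer's Type operand must equal the type of Object."""
    return pointer_type.pointee == object_type


ptr_to_float = PointerType("float")

# Storing a float and then loading a float through the same pointer validates.
store_ok = validate_store(ptr_to_float, "float")
load_ok = validate_load(ptr_to_float, "float")

# Loading an int through a pointer-to-float fails validation before the shader
# ever runs. With no pointer cast available in the logical addressing mode,
# this toy model offers no way to retype ptr_to_float either.
load_mismatch = validate_load(ptr_to_float, "int")
```

In this model the only way to "store one type and load a different type" would
be to change the pointee type of an existing pointer, which is the operation
the logical addressing mode omits.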

Shader “trap” mechanism

https://github.com/gpuweb/gpuweb/issues/35

   -

   MM: would like to clarify some points:
   -

      Not wedded to any particular solution.
      -

      Think we can all agree portability is valuable. Perhaps we have
      differing degrees of how far we want to go.
      -

      If we can get both portability and performance, hopefully we can all
      agree on it.
      -

      (Either clamping or trapping.)
      -

      If in some cases one is better and in some the other is better, can
      have the discussion of whether one or the other is allowed.
      -

   Tried running some little benchmark programs.
   -

   Ran on slowest piece of hardware I (Myles) could find.
   -

   In every case we tried, the trap solution (basically an if-statement and
   return) ended up being faster.
   -

   If the tests are not representative, please suggest how they could be
   improved.
   -

   JG: are you testing valid or invalid data?
   -

      MM: valid.
      -

      CW: we don’t care so much about invalid operations.
      -

   CW: looking at the numbers, clamping vs. trapping seems fairly close.
   -

   KD: let’s look at the code first. Trap should be significantly faster.
   First counter that goes out of bounds, shader returns. Clamping will run
   all iterations. Why are the numbers even close?
   -

      MM: was only testing valid data. None of the early returns were ever
      hit.
      -

      KD: so for example all of the buffers were big enough?
      -

   CW: clamp and trap look very close for a trap that’s inside a top-level
   function. If you trap inside a deeply nested call stack how do you return
   from it?
   -

      MM / CW: check flags that are set.
      -

      CW: while benchmark shows that trapping is faster than clamping
      -

   JG: if we allow either clamping or trapping, we can choose which one we
   want.
   -

   MM: if we can find that one solution is always or almost always better,
   we can get some portability for free.
   -

   CW: trapping on all undefined behaviors would indeed be a portability
   win.
   -

   JG: we don’t necessarily need the data in order to move forward, unless
   we want to standardize on one solution.
   -

   CW: if we can settle on only one behavior, it’s a portability win. If
   not, allow either.
   -

   JG: portability win for bad programs. Not as important as maintaining
   correctness.
   -

   KD: trap mechanism would probably also allow us to record where the trap
   occurred and return that information to the developer in debug mode.
   -

   CW: yes. But that could be a debug mode where the shaders are
   instrumented. Probably should not be done in production.
   -

   DM: trap solution would not allow us to use robust buffer access on
   other APIs. Would be nice to get better performance.
   -

   RC: tend to agree with Dzmitry on this. “The fastest code is the code
   which doesn’t run.”
   -

   MM: need to benchmark against hardware that has robust buffer access.
   -

   CW: will be interesting to choose one or the other for V1. May be a
   rabbit hole for MVP. Don’t want to send Myles off on a fruitless
   benchmarking exercise. Defer post MVP?
   -

   MM: the issue Apple is concerned with is portability in general. Think
   this is important for MVP. Still, happy to defer this one point.
   -

   CW: Apple’s work on shading languages might be more important at this
   point.
   -

   KD: benchmarking results might be very different on different hardware.
   Also, in a few years maybe all hardware will support robust buffer access.
   -

   MM: quick rebuttal: we can’t design APIs for hardware that doesn’t exist.
   -

   DM: can’t disable robust resource access on D3D12 for example.
   -

   CW: true for vertex/index buffers. Not sure about UAVs.
   -

      RC: what’s not checked is accesses off the end of the root descriptor
      tables. We’d need to check those in WebGPU. Prefer to let the API do the
      checks for you.
      -

   CB: plan is to loosen up these restrictions in future shading languages,
   but agree that for web standards it’s important to rely on the hardware’s
   support.
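
Illustration of the two strategies discussed above: a minimal sketch in
Python (standing in for shader code) of clamping versus trapping on an
out-of-bounds buffer index, including the "check flags that are set" approach
for unwinding from a nested call. All names (load_clamped, load_or_trap,
ShaderState, helper, shader_main) are hypothetical, and the sketch assumes a
non-empty buffer. It says nothing about real-world performance or about how
hardware robust buffer access behaves.

```python
from typing import List, Optional


def load_clamped(buffer: List[float], index: int) -> float:
    """Clamping: force an out-of-bounds index into [0, len(buffer) - 1]."""
    clamped = max(0, min(index, len(buffer) - 1))
    return buffer[clamped]


class ShaderState:
    """Hypothetical per-invocation state that carries the trap flag."""

    def __init__(self) -> None:
        self.trapped = False


def load_or_trap(state: ShaderState, buffer: List[float], index: int) -> float:
    """Trapping: on a bad index, set the flag and return a dummy value."""
    if index < 0 or index >= len(buffer):
        state.trapped = True
        return 0.0
    return buffer[index]


def helper(state: ShaderState, buffer: List[float], index: int) -> float:
    """A nested call: it must check the flag after every guarded access."""
    value = load_or_trap(state, buffer, index)
    if state.trapped:
        return 0.0  # unwind immediately; the result is discarded by the caller
    return value * 2.0


def shader_main(buffer: List[float], index: int) -> Optional[float]:
    """Top-level entry point: returns None when the invocation trapped."""
    state = ShaderState()
    result = helper(state, buffer, index)
    if state.trapped:
        return None  # the invocation ends early
    return result + 1.0


data = [1.0, 2.0, 3.0]
in_bounds = shader_main(data, 1)       # helper doubles 2.0, main adds 1.0
out_of_bounds = shader_main(data, 7)   # trap flag propagates up to main
clamped_high = load_clamped(data, 7)   # index 7 is clamped to index 2
```

The flag check after each call is the extra cost CW raises for deeply nested
call stacks; with valid data, as in MM's benchmarks, none of the early returns
are taken.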

Binary vs. text (or low vs. high level)

   -

   CW / KN: let’s go through Kai’s email and go through the things we can
   agree upon.
   -

   KN: probably a few points of agreement:
   -

      Compatibility with current content/examples using HLSL/GLSL
      -

      Probably don’t want to spend the time writing an entirely new thing
      before we can do anything else
      -

      However, perhaps modifications to HLSL can be done quickly
      -

   MM: full backwards compatibility is not a requirement for either SPIR-V
   or HLSL.
   -

   KN: agree. But most math that people copy/paste from web examples should
   work fine.
   -

   KN: not sure what else we all agree on :)
   -

   KN: start with view-source?
   -

      WebAssembly folks have given the feedback that a lot of vocal web
      developers complain you can’t view the source of a WebAssembly program.
      -

      Thinks maybe this is an issue with WebAssembly.
      -

      Web graphics developers are more concerned with performance
      -

      So if there is a tradeoff, should probably choose performance.
      -

      CW: does anyone expect to be able to download a WebAssembly module
      and see the original sources?
      -

      KN: no, but they “prefer” JavaScript because they can see the sources.
      -

      DJ: developers of WebAssembly found they needed a human readable
      source even just for tests.
      -

      DJ: “more concerned with performance” – does that also mean
      downloading the shader?
      -

         KN: yes. Download, compile, etc. If there’s a tradeoff there then
         think we have to take the performant option.
         -

      DJ: what about if it’s much faster on one platform? for example one
      that ships an HLSL compiler?
      -

      CW: different topic than view-source.
      -

      DJ: talking about performance.
      -

      KN: think we basically mean, on all platforms.
      -

      CB: two of these options might have different characteristics. For
      example, first frame performance vs. higher framerate.
      -

      CW: not sure why one would have a higher frame rate.
      -

      CB: sometimes have a platform-specific compiler optimization.
      -

      DJ: we’ve found that sometimes we can optimize JavaScript better than
      the lower-level IR. Can do some optimizations because we understand more
      about the program.
      -

      CB: we hear this from driver vendors too.
      -

      CW: is it a goal for the NVIDIA driver to optimize Unreal Engine’s
      shaders on the web?
      -

   CW: let’s talk about optimization opportunities.
   -

      SPIR-V vs. HLSL: HLSL is higher-level, therefore more optimization
      opportunities in backends.
      -

      CW disagrees.
      -

   KN: my second point was going to be: SPIR-V is not that low level.
   -

      1. It’s more view-source-able than WebAssembly
      -

      2. As far as we know it’s not any less optimizable than a high-level
      language.
      -

      Would be beneficial for some people who are arguing against low-level
      to take another look at SPIR-V.
      -

      Also would be valuable to show decompiling from SPIR-V to GLSL/HLSL.
      It’s intended for semantics-preserving decompilation assuming the debug
      info is there.
      -

   KN’s two options in the email were IR vs. SL. Argument is that they’re
   pretty much both at the same “level” i.e., “high-level”.
   -

      Tool does this conversion pretty well. Even if no debug info.
      Preserves structure like loops, function calls, etc. Modulo inlining.
      -

   JG: that’s the sort of code you’ll ship anyway. We effectively don’t
   have debug info for JavaScript on the web today because everyone minifies
   their code. All names, context etc. are stripped. So SPIR-V would be able to
   reconstruct at the same basic level as JavaScript view-source on the web
   today.
   -

      CW/KN: agree
      -

      Example tool is SPIRV-Cross, with outputs in GLSL, HLSL,
      (experimental) MSL
      -

         https://github.com/KhronosGroup/SPIRV-Cross
         -

         Example outputs are in its “reference” test set:
         https://github.com/KhronosGroup/SPIRV-Cross/tree/master/reference
         -

            E.g. GLSL input
            https://github.com/KhronosGroup/SPIRV-Cross/blob/master/shaders/vert/ocean.vert
            -

            Becomes GLSL output
            https://github.com/KhronosGroup/SPIRV-Cross/blob/master/reference/shaders/vert/ocean.vert
            -

   MM: saying that it doesn’t work in JavaScript doesn’t mean that
   view-source is useless.
   -

   JG/KN: agree.
   -

      But there’s no strong pressure to do better than we already have.
      -

      DJ: you could argue that if it’s what people are shipping and what
      people want then we should just ship that format.
      -

   CW: that’s fair, but that’s not necessarily what people want.
   WebAssembly is a tradeoff based on size/performance/etc. Argue that IR is
   better. Every GLSL compiler has bugs. Half of Corentin’s job has been to
   write compiler passes to work around GLSL compiler bugs. And this is a
   language that has a working group, conformance tests, and reference
   implementation.
   -

      SPIR-V is well defined, clear spec. Easy to make interoperable
      implementations from it. And that’s what the web is about:
      interoperability.
      -

      DJ: it’s a great point. One of the points of accepting SPIR-V would
      be accepting the SPIR-V compiler itself.
      -

      JG: SPIR-V being a binary format and having a well-defined spec makes
      it a better ingestion target. The funnel of what we ingest will be
      smaller.
      -

      CB: you’re going to take the SPIR-V source and compile it into all
      browsers?
      -

      JG: yes.
      -

      CB: so the interoperable behavior will be enforced by compiling the
      same SPIR-V compiler into browsers?
      -

      JG: would like to avoid having a monoculture.
      -

   MM: why won’t SPIR-V have the same problems as high-level languages?
   -

   CW: because things are better defined in the binary format. Things like
   initializer expressions (or lack thereof) in HLSL. Scoping rules in if and
   switch statements.
   -

   JG: Some things are easier to parse than others. For example C has a
   monstrous grammar and it is extremely easy to make errors in.
   -

   DJ: it’s not our job just to make our lives easier. It’s our job to make
   web developers’ lives easier.
   -

   JG: agree. But it’ll be easier to follow through on our promises to
   developers if we choose an easier compilation target.
   -

   KN: agree. We’ll reach something sufficiently stable sooner, and maybe,
   ever. If the project’s simpler, it’s more likely that we’ll get it right.
   -

   CB: it seems that the way people are getting uniform behavior is that
   everyone uses ANGLE.
   -

   CW: ANGLE has tons of bugs too. Have been fuzzing it.
   -

   JG: not having a monoculture in WebGL – multiple implementations – has
   pushed everyone forward. Has materially pushed ANGLE forward too.
   -

   CB: but what if we’ve found and fixed all of the bugs in ANGLE?
   -

   CW: what about multiple versions of ANGLE in different browsers?
   -

   CB: not sure what expected release cadence is.
   -

   CW: based on the releases of the browsers.
   -

   JG: working on 6-week development cycle.
   -

   DJ: we should follow up on Kai’s email. Also a Github issue about
   accepting HLSL as a proposal.
   -

   CW: putting the cart before the horse.
   -

   JG: we’ll discuss on the mailing list.
   -

   DJ: would like to see what developers would prefer. This is as important
   to me as view-source.
   -

   CW: let’s discuss on the mailing list.

Agenda for next meeting

   -

   Been stuck on memory barriers for a while?
   -

   JG / MM: “anything else” than this.
   -

   MM: one interesting bit when Myles publishes Apple’s API today to
   WebKit: how to do buffer and texture updates. Has to be some way to
   schedule a buffer update (from CPU to GPU) so that it doesn’t get updated
   while it’s in use by the GPU.
   -

   CW: let’s talk about data upload/download. Something we talked about a
   while ago. Keep it light.
   -

   RC: think we should continue this discussion too in the future though.
   Had some things to say.
   -

   MM: agree, can’t ship anything until this is resolved.
   -

   CW: memory barriers and shading languages are two big topics. Can’t ship
   without resolution.

Received on Monday, 30 October 2017 19:47:50 UTC