- From: Corentin Wallez <cwallez@google.com>
- Date: Mon, 6 Nov 2017 17:22:08 -0500
- To: public-gpu <public-gpu@w3.org>
- Message-ID: <CAGdfWNOqefgn3YwbotsHZDOzZp_=kQ4s6NOgeafacWd=2Dr1kg@mail.gmail.com>
GPU Web 2017-11-01
Chair: Dean
Scribe: Dean, Ken
Location: Google Hangout
Minutes from last meeting
<https://docs.google.com/document/d/1HToUA9xTVU8LASHe4PIMmg1n7txvFgX42w_Qt-7iBwI/>
TL;DR
- Meeting with WebAssembly at TPAC.
  - Guests attending in person will have to pay a discounted fee; email w3t-tpregister@w3.org. There should be a WebEx for attending remotely.
  - Will make a short presentation on the status of WebGPU, then start discussions on several subjects concerning the interaction of WASM and WebGPU.
  - On Wednesday, will present a demo at the TPAC demo session.
- Status updates:
  - Apple: working on the shading language, and open-sourced a prototype structure for WebGPU <https://trac.webkit.org/browser/webkit/trunk/Tools/WebGPUAPIStructure>
  - Google: working on the TPAC demo; have a GLSL -> SPIR-V compiler in WASM.
- Resource updates
  - The API should prevent data races between CPU and GPU.
  - Suggestion to have a “host access pass” similar to compute / render passes.
    - Concern that the approach can’t reach the minimal number of copies.
    - Concern that this doesn’t map to D3D12 / Vulkan concepts (mostly resolved)
    - Problem is similar to WebGL’s getBufferSubDataAsync.
  - Discussion whether different memory heaps should be exposed to the application.
    - Suggestion to make all buffers writable by the CPU and hide staging buffers. Would eliminate the difference between discrete GPU and UMA.
      - Concern that this would prevent apps from having optimal performance.
Tentative agenda
- Administrative stuff (if any)
- Individual design and prototype status
- Resource updates (data upload / download)
- Agenda for next meeting
Attendance
- Apple
  - Dean Jackson
  - Myles C. Maxfield
  - Theresa O'Connor
- Google
  - Corentin Wallez
  - David Neto
  - John Kessenich
  - Kai Ninomiya
  - Ken Russell
- Microsoft
  - Chas Boyd
  - Rafael Cintron
- Mozilla
  - Dzmitry Malyshau
  - Jeff Gilbert
- Yandex
  - Kirill Dmitrenko
- Joshua Groves
Administrative items
- TPAC (next week)
  - Meeting with WebASM - Tuesday 7th of November from 2PM to 3:30PM
    - Vague topic: what we want out of WebASM, and what they want as an API they can import into their environment
    - Requires registration with W3C, but can get a discount for a single day.
      - Registration form was closed the same day the email was sent.
      - Email address: w3t-tpregister@w3.org
      - They know about the joint meeting and know about the special rate.
    - Apple, Google, Mozilla at least in attendance
    - Hopefully will have a dial-in; will send out info ASAP. Assume WebEx.
  - Corentin will be giving a demo on the Wednesday, in an exhibition-style session, basically showing the content he gave at the Khronos meeting in January
    - Apple is happy with Google’s demo being the one shown (i.e. Apple doesn’t need a spot)
    - Mozilla’s Patrick Walton might be able to show their demo.
- No feedback on suitability of the license yet, from either Apple’s or Mozilla’s legal teams.
  - Microsoft heard it would be ready “soon”.
Web Assembly Meeting
- Corentin has emailed a list of topics:
  - Fast bindings for Web platform APIs (call overhead and GC-less render loop)
  - How to make the API interoperable between JS and WASM
  - Is the WASM host mapping buffers into the WASM module’s memory space a thing? (glMapBuffer equivalent)
  - Multithreading story for APIs
  - API extensibility in WASM (can’t just add a dictionary entry like in JS)
  - Any questions you might have!
- DM: Bitflags and use of enums
Individual design and prototype status
- Apple
  - MM: working on the shading language implementation. Hoping to have finished SPIR-V codegen; not quite done. Have the rewriter piece (take a logical-mode program and rewrite it to not have pointers or array references). Good progress overall.
  - MM: also checked in a demo API to WebKit, and an implementation on top of Vulkan! https://trac.webkit.org/browser/webkit/trunk/Tools/WebGPUAPIStructure
  - DJ: apologize for posting one link to GitHub which turned into a beast of an issue. Going to split it into 4 issues regarding shading languages.
  - CW: can we have a document from Apple about the security features, etc. of Secure HLSL?
    - DJ: yes, Apple is working on a position paper
- Google
  - CW: progress on the TPAC demo; using the WebAssembly shader compiler in the webpage demo, so no binary numbers in JavaScript
  - DJ: so this is a native app cross-compiled to WASM?
    - KN: yes. A compiler and library for compiling GLSL to SPIR-V.
  - DJ: cool, how big is the executable?
    - KN: about 500K for a functional binary.
    - CW: talked with David Neto. Think we can make it smaller. GLSLang is large for no reason.
  - DJ: shaderc is the name of the Khronos reference compiler?
    - CW: no. It’s a wrapper around GLSLang. Helpers for shader input, and more.
    - KN: only exposed 4 entry points. Maybe exposing more will make it bigger. Aggressive dead code elimination.
  - DJ: think this is really good progress.
  - KN: shaderc builds without intervention!
    - It’s checked into Kai’s private GitHub shaderc fork right now. Planning to make it less hacky before upstreaming. [Kai to provide link]
- Microsoft
  - RC: responding to Dean’s mega-issue. No progress on code.
  - CB: looking at designs for how to implement some of Apple’s proposals in HLSL (in the DXIL toolchain)
- Mozilla
  - DM: lots of internal work. Pipeline states, synchronization guarantees. Nothing public-facing.
- Others?
Resource updates
- DJ: how to get a buffer from the app into the GPU. Talked with MM a bit earlier.
- CW: sent on the mailing list earlier today a design for mapping buffers, etc. in NXT. Kai can cover it.
  - MM: unfortunately didn’t see CW’s email before now.
  - CW (offline): that’s OK, Kai can cover it
- MM: how to schedule updates so they’re not modified by the CPU while the GPU’s using them
  - One solution: any buffer update has the same pass-type structure as a render or compute pass
  - 2 properties:
    - 1. The host program isn’t able to read/write the resource at an arbitrary point of its choosing. Updates go in at a particular point in the pipeline, and can only happen when the GPU isn’t using the resource.
    - 2. Passes already have sync points before and after them. These are natural points where CPU/GPU sync occurs. The sync is already handled from an API point of view. (Explicit vs. not is not being discussed right now.)
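The host-access-pass idea above can be sketched as follows. This is a hypothetical illustration only; `Pass`, `Queue`, and every other name here is invented, not anyone's actual API.

```typescript
// Hypothetical sketch of the "host access pass" idea: CPU reads/writes
// are recorded as a pass, so they enter the queue at a well-defined
// point and inherit the synchronization that already exists between
// passes. All type and method names are invented.
type Pass =
  | { kind: "render"; work: () => void }
  | { kind: "compute"; work: () => void }
  | { kind: "hostAccess"; work: (mapped: Uint8Array) => void };

class Queue {
  private passes: Pass[] = [];
  constructor(private buffer: Uint8Array) {}

  submit(pass: Pass): void {
    this.passes.push(pass);
  }

  // Passes run strictly in order; the boundary between two passes is
  // the natural CPU/GPU sync point, so a host-access pass can never
  // race with GPU work touching the same buffer.
  flush(): void {
    for (const pass of this.passes) {
      if (pass.kind === "hostAccess") {
        pass.work(this.buffer); // the CPU touches the buffer only here
      } else {
        pass.work();
      }
    }
    this.passes = [];
  }
}
```

Because the CPU only ever sees the buffer inside its own pass, the data-race prevention falls out of ordinary pass ordering rather than a separate locking mechanism.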
- RC: how many copies of the ArrayBuffer will be made when they call the update function?
  - MM: envisioning two things, uploads and downloads.
    - Uploads: make an initial copy, or a handoff.
      - KR: you can transfer ArrayBuffers, or you could use SharedArrayBuffers.
    - MM: the API has access to the resource. Map, copy, unmap.
    - Downloads: the caller would provide a handle to the resource, an offset, and a range. Return a Promise, and the Promise would give access to an ArrayBuffer.
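A minimal sketch of that download shape, with the GPU simulated by an in-memory array; `MockDevice` and `downloadBuffer` are invented names, not a proposed API.

```typescript
// Hypothetical download API: the caller provides an offset and a size,
// and gets back a Promise that resolves with an ArrayBuffer holding the
// data. A real implementation would resolve only after the GPU fence
// for the enqueued copy signals; here the "GPU" is just an array.
class MockDevice {
  private gpuData = new Uint8Array([1, 2, 3, 4, 5, 6, 7, 8]); // simulated GPU-side storage

  downloadBuffer(offset: number, size: number): Promise<ArrayBuffer> {
    // slice() copies, so later GPU writes cannot race with the reader.
    const copy = this.gpuData.slice(offset, offset + size);
    return Promise.resolve(copy.buffer as ArrayBuffer);
  }
}
```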
- JG: we are coincidentally discussing this exact topic in the WebGL WG. Maybe some cross-pollination is useful. Not resolved in that WG yet, but have some ideas.
  - For downloads from the GPU: want to enqueue in the command stream a copy into system memory. Need some way to wait on this, or poll it, to see when it’s done.
  - Minimum number of copies. Downloading into shmem.
  - Couple of possibilities for sync: using a FenceSync, or using a Query to which the readback is attached.
- MM: no strong opinion on whether it should be callback- or polling-based. Key issue is that at the point where you tell the API you want a download to occur, that request is associated with a particular point in the queue. Operations before that point in the queue will be visible; operations after it will not be.
  - JG: harder in Vulkan concepts. Easy in OpenGL-style concepts where the command stream is more serial.
    - In Vulkan, there’s a looser guarantee of when things happen.
  - MM: maybe the earlier statement was too strong. At the point you enqueue an upload/download request, it has the same sync properties we’ve been discussing. If it’s encapsulated in a pass, then it all works together.
- JG: thinking of staging buffers from D3D9. Always using copy mechanisms, even GPU-to-GPU. A resource type “CPU Resource”, and you enqueue a copy from the GPU to the CPU resource. End up with local memory instead of GPU memory.
  - MM: probably something we can agree to. Whether the source is an ArrayBuffer or some API object that was created from one doesn’t really matter to Apple.
  - RC: think this is similar to what Corentin proposed. He didn’t call it “staging”, but it’s a buffer you fill, and there are 3 states. Once it’s in the command stream, you can’t touch it. There was an alternate proposal supporting different ranges in the buffer.
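The three-state buffer RC mentions might look roughly like the sketch below. The state names and methods are guesses for illustration, not the actual NXT proposal.

```typescript
// Sketch of a buffer with three states: the CPU may fill it, then it is
// handed to the command stream (untouchable by the CPU), then the GPU
// signals completion. All names are invented.
type BufferState = "cpu-writable" | "in-command-stream" | "complete";

class StagedBuffer {
  state: BufferState = "cpu-writable";
  data = new Uint8Array(256);

  write(offset: number, bytes: Uint8Array): void {
    if (this.state !== "cpu-writable") {
      throw new Error("buffer is in the command stream; CPU access would be a data race");
    }
    this.data.set(bytes, offset);
  }

  enqueue(): void {
    this.state = "in-command-stream"; // handed off; CPU must not touch it
  }

  signalComplete(): void {
    this.state = "complete"; // e.g. the fence for the consuming pass signaled
  }
}
```

The point of the state machine is that illegal CPU access becomes a synchronous API error instead of a silent race with the GPU.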
- MM: sounds like we’re fairly in agreement; some details to be worked out.
- MM: next topic: in Vulkan (and D3D?) there’s a concept of regions of memory (heaps?). A resource created in one region might be CPU-visible but slow. Another region might not be CPU-visible, but fast.
  - A texture might be in tiled form or linear form. Maybe in tiled form you can’t read/write it from the CPU, but some GPUs might not be able to sample from a linear texture.
  - Common in both APIs for a data upload/download to be done in 2 phases. Phase 1 is to upload into a resource that’s CPU-accessible / slow / linear. Phase 2 is to issue a GPU command to copy from the slow resource to the fast resource.
- DM: in Vulkan it’s device memory. Some implementations may introduce memory types that are fast and visible to the CPU.
- MM: the question is: should these sorts of staging resources be explicitly described by the API?
  - Thinking is: if all uploads and downloads will take this “hop” approach, and it’s possible for the web author to do this wrong, and possible for an implementation to manage these buffer locations itself, then the API should not have this distinction of buffer type.
  - Instead, the API should expose a single buffer which has 2 or 3 buffers behind it.
  - Might be a couple of buffer copies behind the scenes.
- RC: so you don’t want to have staging resources? Or have it all handled for you?
  - MM: the latter. The web developer doesn’t say anything about staging buffers. All buffers are readable/writable from the CPU, and sample-able from a pixel shader. But that might not be true of the hardware; the API is responsible for marshalling data between platform textures.
- KN: so talking about hiding the differences between unified memory architectures and discrete memory architectures?
  - MM: yes. On UMA there is a texture type that supports fast CPU access. But on a discrete GPU the marshalling will happen. The upside is that the web author’s code doesn’t care about the distinction. Code will execute optimally.
- DM: are you sure it’s optimal? If you’re reading something back to the CPU every frame, and you know you have UMA, you can schedule the readback earlier. If we do it automatically, we can’t do it as efficiently.
  - Example: queue up something and read it back to the CPU each frame. Fill on the GPU. Enqueue a transfer from the GPU to CPU-local memory. Later, map the buffer and read it. MM is suggesting combining the last two phases, so there’s no opportunity to schedule these things more optimally.
  - MM: if this happens every frame, the browser can figure it out. Or, the buffer structure can take a hint parameter about its usage. (“Try to keep close to the CPU”)
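MM's hint idea might look something like the sketch below; the descriptor fields, hint values, and return strings are all invented for illustration.

```typescript
// A buffer descriptor with an optional locality hint. On UMA, one
// shared allocation serves both processors; on a discrete GPU the
// implementation silently adds a hidden copy, and the hint only steers
// where the "primary" allocation lives.
interface BufferDescriptor {
  size: number;
  hint?: "prefer-cpu-access" | "prefer-gpu-access";
}

function chooseAllocation(desc: BufferDescriptor, isUMA: boolean): string {
  if (isUMA) {
    return "single shared allocation"; // no marshalling needed
  }
  return desc.hint === "prefer-cpu-access"
    ? "host-visible primary + hidden device-local copy" // e.g. read back every frame
    : "device-local primary + hidden staging copy";     // default: keep it fast for the GPU
}
```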
- KN: sounds good for uploads, but I don’t see it working well for downloads.
- JG: have some concerns about this.
- DJ: next step: Myles writes an issue for this.
Agenda for next meeting
- Let’s skip next week because of TPAC
- Let’s continue the buffer upload/mapping discussion in the meeting after that. Maybe discuss shading languages after that.
Received on Monday, 6 November 2017 22:22:57 UTC