Re: Initial thoughts porting a decently complex WebGL2 application to WebGPU

On Wed, Nov 6, 2019 at 5:22 PM Kai Ninomiya <kainino@google.com> wrote:

> Thank you for all this input!
>
> On Wed, Nov 6, 2019 at 9:50 AM Jasper St. Pierre <jstpierre@mecheye.net>
> wrote:
>
>> I'm in the early stages of porting an application I make on the side for
>> fun, https://noclip.website , from WebGL2 to WebGPU. I am also a
>> platform developer at a games studio, and have experience porting games to
>> different graphics architectures.
>>
>> Last night, after a large number of difficulties, I finally managed to
>> get a simple version rendering. No textures are used yet, which is
>> something I will get into later, so only one scene is supported. I
>> understand that this is very early on in development, but I figured my
>> feedback would be helpful in developing the API in the future.
>>
>> If you would like to try it yourself, the only scene that I have tested
>> on is this one:
>>
>> https://noclip.website/#dksiv/dks1
>>
>> To turn on WebGPU mode, run `window.localStorage.setItem('webgpu', 1)` in
>> devtools console and then refresh the page.
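The toggle is an ordinary `localStorage` flag that is read at startup. Below is a minimal sketch of such a check. `chooseBackend` and the `KeyValueStorage` interface are hypothetical, not noclip's actual code. The interface is just the slice of the Web Storage API that the check needs.

```typescript
// Minimal subset of the Web Storage API, so the sketch does not depend on
// the DOM typings.
interface KeyValueStorage {
    getItem(key: string): string | null;
}

type Backend = "webgl2" | "webgpu";

// Hypothetical helper: use WebGPU only when the flag is present and the
// browser actually exposes WebGPU; otherwise fall back to WebGL 2.
function chooseBackend(storage: KeyValueStorage, hasWebGPU: boolean): Backend {
    // localStorage stores strings, so setItem('webgpu', 1) stores "1".
    const flag = storage.getItem("webgpu");
    return flag !== null && hasWebGPU ? "webgpu" : "webgl2";
}
```

In a page this could be called as `chooseBackend(window.localStorage, "gpu" in navigator)`, assuming `navigator.gpu` as the feature-detection point. In this sketch any stored value enables WebGPU, so `localStorage.removeItem('webgpu')` is how you would turn it back off.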
>>
>> # General overview
>>
>> A few years ago, I invested in building a WebGPU-style platform layer
>> which I run on top of WebGL2. I am happy to say that my investment here
>> made for a relatively easy port, though the WebGPU layer is taking several
>> shortcuts and is currently incomplete in functionality.
>>
>> If you are interested in the details, the high-level interface that all
>> rendering goes through:
>>
>> https://github.com/magcius/noclip.website/blob/master/src/gfx/platform/GfxPlatform.ts
>>
>> The implementations for WebGL2 and WebGPU:
>>
>> https://github.com/magcius/noclip.website/blob/master/src/gfx/platform/GfxPlatformWebGL2.ts
>>
>>
>> https://github.com/magcius/noclip.website/blob/master/src/gfx/platform/GfxPlatformWebGPU.ts
>>
>> I use overly simplified interfaces in some places due to the need for
>> portability, especially for things like buffer management and binding
>> models.
>>
>> # Shaders
>>
>> This is where I expected most of the trouble to be, and indeed, it was.
>> Most of my shaders were hand-written in GLSL, a choice made out of
>> necessity for WebGL more than a desire. GLSL is a very unfortunate language
>> to develop in, and the variations between profiles mean it's honestly just
>> as annoying as a clean break to a new shading language. To list some points:
>>
>> * This is not a problem with GLSL per se, but I had to try multiple
>> different versions of glslang before I found a combination that worked.
>> This is partially due to the immaturity of JS build tooling and WebAssembly
>> together, but I also found lots of buggy versions in the wild, including the
>> version in "@webgpu/glslang". I ended up vendoring my own glslang, based on
>> the version shipped in BabylonJS.
>>
> That one should be a copy of one of the published versions on npm, but I
> don't know which one.
>
Oh, it could also be due to the nocompute builds not supporting separate
samplers/textures (i.e. not supporting texturing at all). This is fixed
upstream, I just need to build and publish it.

... and... done. Please try out the new 0.0.9 release.


>> This still has some minor issues (compileGLSLZeroCopy returns an object
>> whose "free" method has been renamed to "fa", probably as a result of
>> minifying the code), but was workable for my purposes.
>>
> Oops. I'll fix this.
>
Fixed in 0.0.9.

>> * In WebGL 2 / GLES 3.0, binding points are decided by the shader compiler,
>> requiring synchronous compilation upon calling getUniformLocation. Explicit
>> binding points are not allowed. In contrast, WebGPU requires explicit
>> binding points. I believe this is a positive, but it makes building a
>> common binding model between the two difficult. I was able to adapt my
>> simplified binding model to GLSL using some ugly regular expression
>> preprocessing.
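One way to bridge the two models is to keep a single GLSL source without binding qualifiers and inject explicit bindings only for the WebGPU (Vulkan-flavored GLSL) path. Below is a minimal sketch of such a pass. It assumes that uniform blocks are declared at the start of a line as `uniform ub_Name {` and carry no existing layout qualifier. The function name and the numbering scheme (set 0, bindings in declaration order) are hypothetical. The pass is as brittle as any other regex-based approach.

```typescript
// Hypothetical preprocessing pass for the WebGPU path: prefix every uniform
// block declaration with an explicit layout qualifier, numbering bindings in
// the order the blocks appear in the source.
function assignUniformBlockBindings(source: string, set: number = 0): string {
    let binding = 0;
    // Matches e.g. "uniform ub_SceneParams {" at the start of a line. Lines
    // such as "uniform sampler2D u_Texture[4];" do not match.
    return source.replace(/^([ \t]*)uniform\s+(\w+)\s*\{/gm, (_match: string, indent: string, name: string) => {
        const qualifier = `layout(std140, set = ${set}, binding = ${binding++})`;
        return `${indent}${qualifier} uniform ${name} {`;
    });
}
```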
>>
>> * The GL story around sampler binding has always been... bizarre to say
>> the least, requiring a strange set of indirections to map sampler uniforms
>> to fixed-function sampler indices. To simplify this, I used the convention
>> of specifying an array of sampler2D objects, e.g. `uniform sampler2D
>> u_Texture[4]` in most of my shaders, and some basic reflection pulls this
>> out and calls `gl.uniform1iv(gl.getUniformLocation(program, "u_Texture"),
>> [0, 1, 2, 3])` right after compilation [0], which makes sampler management a tad
>> easier.
>>
>> This clashes very badly with the WebGPU implementations today. In SPIR-V,
>> an array of sampler resources like this is its own type, and requires
>> special handling in Vulkan. I am unsure how SPIRV-Cross maps this to HLSL.
>> Dawn, currently, hardcodes the descriptor count to "1" in the Vulkan
>> backend [1], which means that there is no way for a WebGPU client to upload
>> an array of sampler or texture resources.
>>
>> I don't know what a good solution to this looks like. I am probably going
>> to have to do some heavy preprocessing to change how textures and samplers
>> are accessed.
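Dawn's Vulkan backend currently gives every binding a descriptor count of 1. One possible shape for that preprocessing is to unroll each sampler array into scalar texture and sampler bindings on the WebGPU path only. The WebGL 2 path would keep the array and its unit-per-element `uniform1iv` setup. Below is a rough sketch. It assumes that every use indexes the array with an integer literal. The `T_`/`S_` names and the even/odd binding layout are hypothetical choices, not anything Dawn or glslang prescribes.

```typescript
// Hypothetical pass for the WebGPU path: unroll "uniform sampler2D name[N];"
// into N separate texture/sampler pairs, each with its own explicit binding,
// and rewrite constant-index uses "name[i]" into a sampler2D constructed at
// the point of use. `name` is assumed to be a plain identifier; dynamic
// (non-literal) indices are left untouched.
function unrollSamplerArray(source: string, name: string, set: number, firstBinding: number): string {
    const decl = new RegExp(`uniform\\s+sampler2D\\s+${name}\\s*\\[\\s*(\\d+)\\s*\\]\\s*;`);
    const m = decl.exec(source);
    if (m === null)
        return source;

    const count = parseInt(m[1], 10);
    const decls: string[] = [];
    for (let i = 0; i < count; i++) {
        // Texture at an even binding, its sampler at the following one.
        const binding = firstBinding + 2 * i;
        decls.push(`layout(set = ${set}, binding = ${binding}) uniform texture2D T_${name}_${i};`);
        decls.push(`layout(set = ${set}, binding = ${binding + 1}) uniform sampler S_${name}_${i};`);
    }
    const unrolled = source.replace(decl, decls.join("\n"));

    // "name[2]" -> "sampler2D(T_name_2, S_name_2)"
    const use = new RegExp(`\\b${name}\\s*\\[\\s*(\\d+)\\s*\\]`, "g");
    return unrolled.replace(use, (_s: string, idx: string) => `sampler2D(T_${name}_${idx}, S_${name}_${idx})`);
}
```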
>>
>> * WebGL 2 requires combined texture/sampler binding points. WebGPU does not
>> have combined texture/samplers and requires two separate resources. In GLSL, to
>> use separate sampler/texture objects, they must be combined into a
>> "combined texture/sampler" object at point of use, e.g.
>> `texture(sampler2D(u_Texture[0], u_Sampler[0]), v_TexCoord)`. GLSL
>> *specifically* requires that sampler2D is constructed at-point-of-use. I
>> was not aware of this restriction at the time, and have written code that
>> takes sampler2D as an argument [2]. This is illegal in this flavor of GLSL,
>> and I will have to do more work to come up with a model that works in both
>> flavors of GLSL.
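One way to get a single model that works in both flavors is to stop passing `sampler2D` values around. Instead, shader code calls a macro that each flavor expands differently, and the Vulkan-flavored expansion constructs the `sampler2D` at the point of use. Below is a minimal sketch of a prelude generator. It assumes `u_Texture` (and, for the WebGPU path, `u_Sampler`) are declared elsewhere. `glslPrelude` and `SAMPLE_TEX_2D` are hypothetical names.

```typescript
type GLSLFlavor = "webgl2" | "vulkan";

// Hypothetical prelude generator: shader bodies call SAMPLE_TEX_2D(i, uv)
// instead of taking sampler2D arguments, and each flavor expands the macro
// in the form that flavor allows.
function glslPrelude(flavor: GLSLFlavor): string {
    if (flavor === "webgl2") {
        return [
            "#version 300 es",
            "precision mediump float;",
            // Combined texture/sampler array, as WebGL 2 requires.
            "#define SAMPLE_TEX_2D(i, uv) texture(u_Texture[i], (uv))",
        ].join("\n");
    }
    return [
        "#version 450",
        // Separate texture and sampler, combined at the point of use.
        "#define SAMPLE_TEX_2D(i, uv) texture(sampler2D(u_Texture[i], u_Sampler[i]), (uv))",
    ].join("\n");
}
```

GLSL ES 3.00 only allows arrays of samplers to be indexed with constant integral expressions. For that reason, `i` should be a literal at each call site, and a helper that previously took a `sampler2D` argument would also become a macro.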
>>
>> "GLSL" is not one language but instead a vague idea of one, with
>> arbitrary restrictions necessitated by simple compiler implementations,
>> without any concern for developer ergonomics. This is something
>> decently-well-understood in the porting community, but not as much by the
>> graphics systems community at large. I cannot emphasize enough how difficult it is
>> to work with multiple profiles of GLSL without going insane. For now, you
>> can see most of my hacks here as regular expression preprocessing on the
>> text [3]. It is ugly and brittle. I do not like it. A proper shader
>> compilation pipeline is desperately needed.
>>
> Do you mean that you'd like to have one shading language that you can
> compile for both WebGPU and WebGL?
>
>
>> # Other notes
>>
>> These are minor things that bit me during the port.
>>
>> * An error message along the lines of "Pipeline attachments do not
>> match". It turned out that I was missing sampleCount in my pipeline
>> state. I did not realize it was actually a parameter in the pipeline state.
>> This should probably be marked required, as otherwise it is easy to not
>> realize it is there.
>>
> If the error message was good (e.g. 'pipeline attachment sampleCount
> doesn't match pipeline layout') do you think this would still be necessary?
> Sorry, our error messages for these giant objects are really bad right now.
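Until the messages are more specific, a client-side check before pipeline creation can catch this mismatch. Below is a minimal sketch. `PipelineDesc` and `AttachmentDesc` are hypothetical simplified shapes rather than the WebGPU IDL types. An omitted sampleCount is treated as 1, the API's default.

```typescript
// Hypothetical simplified shapes; only the fields the check needs.
interface AttachmentDesc {
    label: string;
    sampleCount?: number; // sample count of the attached texture, default 1
}
interface PipelineDesc {
    sampleCount?: number; // pipeline sample count, default 1
}

// Returns one readable message per mismatching attachment (empty if all agree).
function sampleCountMismatches(pipeline: PipelineDesc, attachments: AttachmentDesc[]): string[] {
    const expected = pipeline.sampleCount !== undefined ? pipeline.sampleCount : 1;
    const errors: string[] = [];
    for (const a of attachments) {
        const actual = a.sampleCount !== undefined ? a.sampleCount : 1;
        if (actual !== expected)
            errors.push(`attachment "${a.label}" has sampleCount ${actual}, but the pipeline expects ${expected}`);
    }
    return errors;
}
```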
>
>> * "Row pitch is not a multiple of 256". There is currently no wording in
>> the spec about limits on rowPitch, but a row pitch alignment of 256 seems
>> large to me, especially when we get down to the mipmaps smaller than that.
>> 128, 64, 32, 16, 8, 4, 2, and 1 will all require padding, which is
>> expensive.
>>
> This is a D3D12 restriction: D3D12_TEXTURE_DATA_PITCH_ALIGNMENT
> We've talked about emulating it, but would that be the right approach?
> I don't know how D3D12 apps deal with this problem.
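Until the copy path emulates this, the client can do the padding itself. Below is a minimal sketch that assumes tightly packed source rows of an uncompressed format. `padRows` is a hypothetical helper. The only value taken from the API is the 256-byte alignment, which is the value of `D3D12_TEXTURE_DATA_PITCH_ALIGNMENT`.

```typescript
// D3D12_TEXTURE_DATA_PITCH_ALIGNMENT is 256 bytes.
const ROW_PITCH_ALIGNMENT = 256;

function alignUp(n: number, alignment: number): number {
    return Math.ceil(n / alignment) * alignment;
}

// Hypothetical helper: copy tightly packed rows into a staging array whose
// rows start on multiples of the required pitch. Returns the staged bytes
// and the pitch to pass as rowPitch.
function padRows(src: Uint8Array, width: number, height: number, bytesPerTexel: number): { data: Uint8Array; rowPitch: number } {
    const packedRow = width * bytesPerTexel;
    const rowPitch = alignUp(packedRow, ROW_PITCH_ALIGNMENT);
    const data = new Uint8Array(rowPitch * height); // padding bytes stay zero
    for (let y = 0; y < height; y++)
        data.set(src.subarray(y * packedRow, (y + 1) * packedRow), y * rowPitch);
    return { data, rowPitch };
}
```

For an RGBA8 mip level 16 texels wide, the packed row is 64 bytes and is padded to 256. Three quarters of each staged row is therefore padding, which is the cost described above.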
>
>> * The biggest thing that caused me pain during development was that I had
>> a vertex attribute with a byteStride of 0. This is supported in WebGL 2,
>> where it means to use the packed size of the attribute, but this does not
>> work in WebGPU. There was no validation around this. I don't think it
>> should be legal to have a byteStride of 0, so it might make sense to add a
>> validation error for this.
>>
> Hm, I'm not sure whether we intended for this to work, actually. I would
> guess not, which means updating this test
> <https://gpuweb.github.io/cts/?q=cts:validation/vertex_input:a_stride_of_0_is_valid=>.
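A port that wants to keep the WebGL 2 meaning can resolve a stride of 0 in the translation layer before building the WebGPU vertex state. Below is a minimal sketch. It assumes each vertex buffer holds a single attribute, because WebGL 2 applies the stride-0 rule per attribute. The format names follow current WebGPU spelling, and the size table is a small illustrative subset.

```typescript
// Byte sizes for a small, illustrative subset of vertex formats.
const FORMAT_BYTE_SIZE: { [format: string]: number } = {
    "float32": 4,
    "float32x2": 8,
    "float32x3": 12,
    "float32x4": 16,
    "unorm8x4": 4,
};

// WebGL 2 semantics: a stride of 0 means elements are tightly packed, i.e.
// consecutive elements are exactly one attribute-size apart.
function resolveStride(byteStride: number, format: string): number {
    if (byteStride !== 0)
        return byteStride;
    const size = FORMAT_BYTE_SIZE[format];
    if (size === undefined)
        throw new Error(`unknown vertex format "${format}"`);
    return size;
}
```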
>
>> * I currently use { alpha: false } for my compositing in WebGL 2. I
>> probably missed it, but I didn't see an equivalent in WebGPU. A BGRX
>> swapchain format would be nice. This also caused me some confusion during
>> development, as it looked like my clear color did not apply. Plenty of
>> games that I emulate have no strong regard for the contents of the alpha
>> channel of the final buffer, and being able to composite opaquely is in
>> many ways a correctness thing.
>>
> There will definitely be an equivalent, but I don't think we've figured
> out how it looks.
> I suppose it ought to be part of the canvas context creation, not the swap
> chain, so that reconfiguring the swapchain can't change whether the canvas
> has transparency.
>
>> * Lack of synchronization around buffer access. I understand the design
>> here for uploads and downloads is still ongoing, but given my experience I
>> expected to see more explicit synchronization, including user-space ring
>> buffering. I am hopeful about forward progress on these proposals.
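For reference, the kind of user-space ring buffering mentioned above can be sketched independently of how WebGPU ends up exposing mapping. Below is a hypothetical allocator that only hands out byte offsets into one large upload buffer. It assumes that the caller waits on a fence (or an equivalent) for frame N before starting frame N + framesInFlight. That wait is the kind of explicit synchronization point the API would need to provide.

```typescript
// Hypothetical user-space ring allocator: one upload area split into
// `framesInFlight` equal regions. Each frame sub-allocates linearly from its
// own region, so a region is only reused `framesInFlight` frames later.
class UploadRing {
    private readonly regionSize: number;
    private readonly framesInFlight: number;
    private readonly alignment: number;
    private region = 0;
    private cursor = 0;

    constructor(totalSize: number, framesInFlight: number, alignment: number = 256) {
        this.framesInFlight = framesInFlight;
        this.alignment = alignment;
        // Round each region down to the alignment so region bases stay aligned.
        this.regionSize = Math.floor(totalSize / framesInFlight / alignment) * alignment;
    }

    // Call at the start of a frame, after the GPU is known to be done with
    // the region this frame index maps to.
    beginFrame(frameIndex: number): void {
        this.region = frameIndex % this.framesInFlight;
        this.cursor = 0;
    }

    // Returns an aligned byte offset into the shared upload buffer.
    allocate(size: number): number {
        const aligned = Math.ceil(this.cursor / this.alignment) * this.alignment;
        if (aligned + size > this.regionSize)
            throw new Error("upload ring region exhausted for this frame");
        this.cursor = aligned + size;
        return this.region * this.regionSize + aligned;
    }
}
```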
>>
>> * Trying to run WebGPU under RenderDoc on Windows showed that Chrome was
>> using both D3D11 and D3D12, with D3D12 marked as "not presenting". I don't
>> expect too much yet, but debuggability is something I feel strongly about.
>>
> This is probably because the D3D12 context is just producing textures for
> D3D11 to present. Does RenderDoc need the D3D12 context to be presenting in
> order to debug it at all? Can we artificially insert a frame boundary that
> RenderDoc can detect?
>
>> * I did not adjust my projection matrix for WebGPU and yet I seem to be
>> getting OK results. Perhaps I'm missing something, but I thought that the
>> output clip-space z range should be 0...1, rather than the -1...1 generated
>> by gl-matrix. Getting RenderDoc up and running would help me get my
>> bearings.
>>
> I believe the output clip space z should be 0..1. Depending on how small
> your near-z is, it might not look significantly different. Or maybe we just
> have a bug.
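If the output does need to change, the usual fix is to remap clip-space z from [-w, w] to [0, w] by premultiplying the projection matrix, so that z' = 0.5z + 0.5w. Below is a minimal sketch for the column-major layout that gl-matrix uses. The function name is hypothetical.

```typescript
// Remap a column-major (gl-matrix layout) projection matrix from OpenGL's
// clip-space z range [-w, w] to [0, w] by premultiplying with
//   [1 0 0   0  ]
//   [0 1 0   0  ]
//   [0 0 0.5 0.5]
//   [0 0 0   1  ]
// Only row 2 changes: row2' = 0.5 * row2 + 0.5 * row3. Modifies `m` in place.
function remapClipZToZeroOne(m: { [index: number]: number }): void {
    for (let col = 0; col < 4; col++) {
        const z = m[col * 4 + 2];
        const w = m[col * 4 + 3];
        m[col * 4 + 2] = 0.5 * z + 0.5 * w;
    }
}
```

With this applied, a point on the near plane lands at depth 0 and a point on the far plane at depth 1. The unmodified gl-matrix matrix would give -1 and 1.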
>
>> * The result is still a bit slow and crash-y. I expect this is simply
>> because the backend is not ready yet, but it is possible I am doing something
>> wrong. Do let me know.
>>
> I doubt you're doing anything wrong. The performance in Chrome is
> expected to be better on Metal because there are a lot of totally
> unoptimized parts of the Vulkan and D3D12 backends.
> Unfortunately, I can't try it on Metal because we're passing something
> invalid down to the driver:
> > stencilAttachmentPixelFormat MTLPixelFormatDepth32Float is not stencil
> > renderable.
> so we'll have to look into that.
>
> There is also an assert in my local build of Chromium because we don't
> handle canvas resizes correctly.
>
>
>> # Final thoughts
>>
>> This was a very simple API for me to port to, but I also have a lot of
>> experience with more modern graphics APIs, and also some ability to "see
>> the future" when I built my graphics portability layer. Porting from raw
>> WebGL 2 to my Gfx layer took months of continued effort, and I can imagine
>> other libraries not equipped for the transition having a harder time.
>>
>> Still, having been involved in D3D12 -> Vulkan ports, getting a scene up
>> and running in a few nights is fantastic turnaround. Huge round of applause
>> to the whole team for the design of the API so far. Looking forward to
>> what's next.
>>
> I'm very glad to hear it. Thank you for all of your contributions to the
> design as well!
>
>
>> [0] https://github.com/magcius/noclip.website/blob/213466e4f7c975b7bb6cee9ecd0b5fdcc3f04ed9/src/gfx/platform/GfxPlatformWebGL2.ts#L1589-L1598
>> [1] https://dawn.googlesource.com/dawn/+/refs/heads/master/src/dawn_native/vulkan/BindGroupVk.cpp#48
>> [2] https://github.com/magcius/noclip.website/blob/213466e4f7c975b7bb6cee9ecd0b5fdcc3f04ed9/src/BanjoKazooie/render.ts#L102-L121
>> [3] https://github.com/magcius/noclip.website/blob/213466e4f7c975b7bb6cee9ecd0b5fdcc3f04ed9/src/Program.ts#L68-L90
>>
>> --
>>   Jasper
>>
>>

Received on Thursday, 7 November 2019 03:03:37 UTC