Re: Initial thoughts porting a decently complex WebGL2 application to WebGPU

On Thu, Nov 7, 2019 at 6:20 AM Corentin Wallez <cwallez@google.com> wrote:

> Thanks for all the input, it's extremely valuable!
>
> # Shaders
>
> Agreed with your interpretation that GLSL is a family of languages that
> can have important differences. Both Babylon.js and other projects have
> hit the issue around the binding model and combined textures and samplers.
> They used preprocessor macros to annotate every single binding and produce
> different code for WebGL and WebGPU. Unfortunately WebGL's binding model
> doesn't match the consensus of explicit APIs there, so it's a complexity
> all developers porting to WebGPU will have to pay. A blog post explaining
> how someone dealt with this issue could definitely help in the long run :)
> (though I'd wait until the shading language discussions are resolved first).
>
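To make the macro approach concrete, here's a minimal host-side sketch of
what that kind of annotation could look like (the SAMPLER_2D macro and all
names here are made up for illustration, not actual Babylon.js code):

```javascript
// One shared shader source, annotated with a hypothetical SAMPLER_2D(slot,
// name) macro; the host expands it differently per backend.
const sharedSource = `
SAMPLER_2D(0, u_Texture)
void main() { /* ... */ }
`;

function specialize(source, backend) {
    return source.replace(/SAMPLER_2D\((\d+), (\w+)\)/g, (m, slot, name) => {
        if (backend === 'webgl') {
            // WebGL: one combined texture/sampler object per binding.
            return `uniform sampler2D ${name};`;
        }
        // WebGPU-flavored GLSL: separate texture and sampler bindings,
        // recombined at point of use via a define.
        const b = Number(slot) * 2; // texture at b, sampler at b + 1
        return `layout(set = 0, binding = ${b}) uniform texture2D ${name}_tex;\n` +
               `layout(set = 0, binding = ${b + 1}) uniform sampler ${name}_smp;\n` +
               `#define ${name} sampler2D(${name}_tex, ${name}_smp)`;
    });
}
```

As I note below, the `#define` trick in the WebGPU branch only covers direct
uses of the name, which is exactly where it falls short for helper functions.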

> This clashes very badly with the WebGPU implementations today. In SPIR-V,
>> an array of sampler resources like this is its own type, and requires
>> special handling in Vulkan. I am unsure how SPIRV-Cross maps this to HLSL.
>> Dawn, currently, hardcodes the descriptor count to "1" in the Vulkan
>> backend [1], which means that there is no way for a WebGPU client to upload
>> an array of sampler or texture resources.
>
>
> In WebGPU we decided some time ago to not have arrays of textures in the
> binding model to simplify things. I'd assume we'd want to treat arrays as
> consecutive bindings but it's not something that we discussed (and Dawn
> doesn't implement it).
>

Ergonomics-wise, it might make sense for GLSLang to have an option to
expand to multiple consecutive bindings, a la DXC's
-fspv-flatten-resource-arrays. This prevents true dynamic indexing, which
is fine for my case. But it might not be appropriate for all use cases,
e.g. dynamically indexed arrays of resources, which can be used to emulate
"bindless", if WebGPU ever wants to go down that road. Fun :)
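A text-level illustration of what such flattening amounts to (DXC's
-fspv-flatten-resource-arrays works on the IR, not on source text, so this
is only a sketch; it can only handle constant indices, and names are
hypothetical):

```javascript
// Expand `uniform sampler2D name[count];` into `count` consecutive
// bindings, and rewrite constant-indexed uses name[i] -> name_i.
// True dynamic indexing cannot survive this transform.
function flattenSamplerArray(source, name, count, firstBinding) {
    const decls = [];
    for (let i = 0; i < count; i++)
        decls.push(`layout(binding = ${firstBinding + i}) uniform sampler2D ${name}_${i};`);
    // Replace the array declaration first so the use-rewrite below
    // doesn't touch it.
    source = source.replace(
        new RegExp(`uniform sampler2D ${name}\\[${count}\\];`),
        decls.join('\n'));
    return source.replace(new RegExp(`${name}\\[(\\d+)\\]`, 'g'), `${name}_$1`);
}
```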


> * WebGL 2 requires combined texture/sampler points. WebGPU does not have
>> combined texture/sampler and requires two different resources. In GLSL, to
>> use separate sampler/texture objects, they must be combined into a
>> "combined texture/sampler" object at point of use, e.g.
>> `texture(sampler2D(u_Texture[0], u_Sampler[0]), v_TexCoord)`. GLSL
>> *specifically* requires that sampler2D is constructed at-point-of-use. I
>> was not aware of this restriction at the time, and have written code that
>> takes sampler2D as an argument [2]. This is illegal in this flavor of GLSL,
>> and I will have to do more work to come up with a model that works in both
>> flavors of GLSL.
>
>
> Something that could help is putting a define that replaces the name of
> the texture with sampler2D(texture, sampler).
>

That won't work if I have GLSL code like:

    vec4 sampleTextureBlurred(in sampler2D tex, vec2 coord) {
        return textureOffset(tex, coord, ivec2(-1, -1)) * (1.0/9.0) +
               textureOffset(tex, coord, ivec2( 0, -1)) * (1.0/9.0) +
               textureOffset(tex, coord, ivec2( 1, -1)) * (1.0/9.0) +
           ... you get the idea
    }

But yes, some amount of processing from a "shared GLSL" is required,
whether implemented through the GLSLang preprocessor or text munging by the
host.

My point here is that the compatibility story isn't as easy as "just use
GLSL"; the differences in binding models do make it difficult. If we intend
to target existing WebGL users (something I'm skeptical of, given the very
big differences in basic API semantics), we might want to investigate a
stronger compatibility story.

# Other notes
>
> * "Row pitch is not a multiple of 256". There is currently no wording in
>> the spec about limits of rowPitch, but a row-pitch of 256 seems like a
>> large number to me, especially when we get down into the mipmaps below that
>> size. 128, 64, 32, 16, 8, 4, 2, 1 will all require padding which is
>> expensive.
>
>
> This is shaping up to be another annoying part of the WebGPU API, but getting
> of it would require emulating these copies on D3D12 which we decided a
> while ago we shouldn't do.
>

This sort of comes down to D3D12 and C++ having sane and very fast ways of
manipulating memory, e.g. memcpy. The approach for D3D12 is to have a large
upload buffer used for all assets, and then you copy your resources into
it, and then schedule copies from the staging space into the texture. I
honestly did not know that D3D12_TEXTURE_DATA_PITCH_ALIGNMENT was 256,
since I've traditionally used GetResourceTiling which fills in the row
pitch for me.
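The host-side padding is mechanical but not free; roughly, every upload path
has to do something like the following (assuming the D3D12-style 256-byte
alignment; WebGPU's final constant may differ):

```javascript
// Round n up to the next multiple of alignment.
function alignUp(n, alignment) {
    return Math.ceil(n / alignment) * alignment;
}

// Re-pack tightly-packed texel rows into a buffer whose row pitch is
// aligned, so it can be used as the source of a buffer-to-texture copy.
function padRows(src, bytesPerRow, rowCount, alignment = 256) {
    const rowPitch = alignUp(bytesPerRow, alignment);
    const dst = new Uint8Array(rowPitch * rowCount);
    for (let y = 0; y < rowCount; y++)
        dst.set(src.subarray(y * bytesPerRow, (y + 1) * bytesPerRow),
                y * rowPitch);
    return { data: dst, rowPitch };
}
```

For a 1x1 RGBA8 mip that means copying 4 meaningful bytes inside a 256-byte
row, which is the waste I was getting at.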


> * The biggest thing that caused me pain during development was that I had
>> a vertex attribute with a byteStride of 0. This is supported in WebGL 2 and
>> it just means to use the packed size of the component, but this does not
>> work in WebGPU. There was no validation around this. I don't think it
>> should be legal to have a byteStride of 0, so it might make sense to add a
>> validation error for this.
>
>
> The intent was to allow this for "constant" vertex attributes. But
> D3D12 doesn't support that so we should probably disallow byteStride = 0.
>

I currently use per-instance vertex attributes for the "constant" vertex
attributes, which is a hack. If it's supported by Metal and Vulkan, we
might be able to pressure Microsoft into allowing it in future D3D12
versions.
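For reference, resolving WebGL 2's stride-0 semantics on the application
side is simple but easy to forget; a sketch, with a hypothetical component
size table:

```javascript
// Bytes per component for a few WebGL-style component types
// (table is illustrative, not exhaustive).
const BYTES_PER_COMPONENT = { FLOAT: 4, UNSIGNED_SHORT: 2, UNSIGNED_BYTE: 1 };

// WebGL 2 semantics: byteStride of 0 means "tightly packed", i.e. the
// packed size of the attribute itself. WebGPU needs the explicit value.
function resolveByteStride(byteStride, componentType, componentCount) {
    if (byteStride !== 0)
        return byteStride;
    return BYTES_PER_COMPONENT[componentType] * componentCount;
}
```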


> * I currently use { alpha: false } for my compositing in WebGL 2. I
>> probably missed it, but I didn't see an equivalent in WebGPU. A BGRX
>> swapchain format would be nice. This also caused me some confusion during
>> development, as it looked like my clear color did not apply. Plenty of
>> games that I emulate have no strong regard for the contents of the alpha
>> channel of the final buffer, and being able to composite opaquely is in
>> many ways a correctness thing.
>
>
> We discussed this at the Mountain view F2F and decided to defer having a
> BGRX format. Chromium tries to have all WebGPU content opaque at the
> moment, but it can fail when the canvas is promoted to an overlay (the
> Windows compositor doesn't allow opaque for example), see this TODO
> <https://cs.chromium.org/chromium/src/third_party/blink/renderer/platform/graphics/gpu/webgpu_swap_buffer_provider.cc?sq=package:chromium&g=0&l=46>
> .
>

I don't have the clearest example of Chrome's rendering, but looking at it
in RenderDoc showed the WebGPU surface being applied in three/four big
stripes with D3D11. Is this evidence of being rendered by the compositor,
even if it's not the Windows compositor?


> * Lack of synchronization around buffer access. I understand the design
>> here for uploads and downloads is still ongoing, but given my background I
>> see more explicit synchronization, including user-space ring buffering. I
>> am hopeful for the forward progress on these proposals.
>
>
> Could you expand on this? I don't think I understand what is the lack of
> synchronization.
>

If I am rendering frame 1 on the GPU and preparing frame 2 on the CPU, I
don't want to stomp on the buffer's contents. This can be done with a
staging buffer and copy on the queue (not great for performance), or
explicit ring-buffering and handoff (like the buffer partitioning
proposal). setSubData's behavior here is currently underspecified.
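By ring-buffering I mean something like the following, sketched with a
made-up API shape and an assumed frame latency of 3:

```javascript
// Keep N copies of each dynamic buffer and rotate through them, so the
// CPU never writes a buffer the GPU may still be reading. The frame count
// and makeBuffer factory are assumptions, not a real WebGPU API.
class FrameRing {
    constructor(frameCount, makeBuffer) {
        this.buffers = Array.from({ length: frameCount }, makeBuffer);
        this.frameIndex = 0;
    }
    // Call once per frame, before writing any dynamic data.
    beginFrame() {
        this.frameIndex = (this.frameIndex + 1) % this.buffers.length;
    }
    // The buffer the CPU may safely write this frame.
    current() {
        return this.buffers[this.frameIndex];
    }
}
```

Whether this lives in user space or behind something like the buffer
partitioning proposal is exactly the open question.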


> # Final thoughts
>> This was a very simple API for me to port to, but I also have a lot of
>> experience with more modern graphics APIs, and also some ability to "see
>> the future" when I built my graphics portability layer. Porting from raw
>> WebGL 2 to my Gfx layer took months of continued effort, and I can imagine
>> other libraries not equipped for the transition having a harder time.
>> Still, having been involved in D3D12 -> Vulkan ports, getting a scene up
>> and running in a few nights is fantastic turnaround. Huge round of applause
>> to the whole team for the design of the API so far. Looking forward to
>> what's next.
>
>
> Thanks a lot, I'm glad this group is able to achieve an API with a good
> balance of explicitness and usability!
>
> @Kai
>
>>
>> Unfortunately, I can't try it on Metal because we're passing something
>> invalid down to the driver:
>> > stencilAttachmentPixelFormat MTLPixelFormatDepth32Float is not stencil
>> renderable.
>> so we'll have to look into that.
>
>
> You even have a TODO
> <https://cs.chromium.org/chromium/src/third_party/dawn/src/dawn_native/metal/RenderPipelineMTL.mm?sq=package:chromium&q=renderpipelineMTL&g=0&l=343>
> for that :)
>

I changed my application to use stencil8. Hopefully this helps you test it.
https://github.com/magcius/noclip.website/commit/179bcc40cd2fa49df6bbdd129139dabab1c0d79f


-- 
  Jasper

Received on Thursday, 7 November 2019 17:33:37 UTC