Re: Some Feature requests.

Hi,

For the case of the advanced blend equations, all we are talking about is
adding the enumerations to the current list and a query to ask if they are
available. I would expect the initial implementations to just report NOT
supporting those blend modes, but by having them *there* they are more
likely to get implemented and used. On the subject of extensions, one awful
pain point for me has been that the W3C extensions for WebGL1 and WebGL2
lagged terribly, terribly behind the extensions available for GLES2 and
GLES3. The worst case for GLES2 (and there is still no WebGL1 extension) is
non-power-of-two texture mipmapping, in spite of a GLES2 extension being
available and widely supported across hardware. A fair number of good GLES3
extensions also never made their way to WebGL2, in spite of wide hardware
support. Given that extension history, I would much rather have a query for
"easy-things-to-specify" than an extension.

By placing features that are not too difficult to add into the spec,
together with a query mechanism, those features are far more likely to be
used. The issue with extensions is often chicken-and-egg: if the extension
is not supported, a work-around is made (typically at a performance cost)
and the developer moves on, leaving performance on the floor while waiting
for the extension to be ratified and, in turn, adopted.

From my point of view, the advanced blend equations and clip-distance
features fall into the category of being easy to specify and easy to add a
query for. The idea of framebuffer-fetch (which I freely admit is mostly
targeted at mobile) is also quite easy to specify and query (in my
opinion), but I can see some areas where people may be uneasy about such a
feature. Fragment shader interlock is also not very complicated to state,
but I can see that, if not spec'd carefully, it could be quite challenging
to translate to the native shaders supporting that feature.
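
For clip distances in particular, the whole surface area could be as small
as a single limit that is allowed to be zero. A rough sketch (again, the
limit name is hypothetical and not from any draft):

    // Purely hypothetical sketch: "maxClipDistances" is a made-up limit
    // name, used only to show that reporting 0 is an acceptable answer.
    interface LimitsWithClipDistances {
      maxClipDistances: number; // 0 means no user-defined clip planes
    }

    function chooseClippingStrategy(limits: LimitsWithClipDistances): string {
      if (limits.maxClipDistances >= 1) {
        // Hardware clip planes are available: clip in fixed function, with
        // no extra passes and no per-fragment discard.
        return "hw-clip-planes";
      }
      // No clip planes reported: restructure the rendering instead of
      // emulating them with discard (the worst-performing option).
      return "restructured-rendering";
    }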

I also advocate a query mechanism that describes the GPU. That is far
easier than having an application go through the work of testing the GPU
at app startup. FWIW, I need only a few seconds (burning battery though)
to figure out if a GPU is tile-based or not, along with whether it is
PowerVR, Mali or Qualcomm (I am not familiar enough with the differences
of Apple's GPUs to tell them apart, because I have not worked with them
much). With longer testing, I can also likely figure out if a GPU is
NVIDIA, Intel or AMD, along with estimates of its FLOPS and bandwidth.
Better to just add a query that lets the WebGPU application get some of
this info instead of benchmarking the GPU with pointless work. The query
would be essentially akin to GL's GL_RENDERER and GL_VENDOR strings (there
are analogues in Vulkan as well), along with whether it is going through
D3D12, Vulkan, Metal or something else. Knowing the GPU can be important
for getting the most out of compute (the ideal sizes of compute work
groups and such are GPU-specific; indeed, a number of benchmarks even do a
little tuning before launching to get a better compute shader dispatch
pattern).
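
As a sketch of the kind of query I mean (all names invented for
illustration; the idea is just an analogue of GL_VENDOR/GL_RENDERER plus
the backing API and a tile-based flag):

    // Purely hypothetical sketch: AdapterDescription and its fields are
    // invented for illustration and are not part of any draft.
    interface AdapterDescription {
      vendor: string;     // e.g. "NVIDIA", "Arm", "Qualcomm"
      renderer: string;   // e.g. the GPU's marketing name
      backend: "d3d12" | "vulkan" | "metal" | "other"; // native API underneath
      tileBased: boolean; // tile-based renderer or not
    }

    // Pick a compute workgroup size from the description instead of burning
    // battery benchmarking at app startup; the numbers are illustrative only.
    function pickWorkgroupSize(desc: AdapterDescription): number {
      if (desc.tileBased || desc.vendor === "Qualcomm" || desc.vendor === "Arm") {
        return 64;  // a conservative choice on typical mobile GPUs
      }
      return 256;   // a reasonable default on desktop-class GPUs
    }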

The beauty of WebGPU is that getting native performance out of the GPU
only requires the ability to send the GPU commands (since the GPU is doing
the work) and the ability to know the GPU for platform-specific
optimizations (which are important often enough).

-Kevin

On Tue, Aug 6, 2019 at 9:16 PM Kai Ninomiya <kainino@google.com> wrote:

> > Leaving it for a later extension is essentially pushing it further down
> the road, with which comes a higher chance it never sees the light of day.
> The only part that is really affected is just adding those blend modes to
> the current list along with a query to report whether they are supported.
>
> There is a trade-off here, though. The more we pack into the original
> spec, the longer it is going to take to finish and the more behind native
> APIs we will be by the time we ship. I am also confident that we will be
> able to get this enabled as an extension after we ship MVP/1.0, as long as
> we have an issue open about it and there are still customers who need it.
>
> On Tue, Aug 6, 2019 at 10:14 AM Dzmitry Malyshau <dmalyshau@mozilla.com>
> wrote:
>
>> Hi Kevin,
>>
>> Thanks for correcting me on the clip distance! Apparently, it's not so
>> much that clip-distance is outdated as that my knowledge about it is :)
>>
>> The supporting links you have provided are good material for a future
>> investigation issue.
>>
>> Regards,
>>
>> Dzmitry
>>
>>
>> On 8/6/19 12:48 PM, Kevin Rogovin wrote:
>>
>> Hi,
>>
>>  Thank you for the fast response. I will file issues separately, but I
>> will share my thoughts on the reply.
>>
>> Firstly, Vulkan does support HW clip planes; indeed,
>> VkPhysicalDeviceFeatures has fields for both clipping and culling
>> (shaderClipDistance and shaderCullDistance), and VkPhysicalDeviceLimits
>> reports how many via its maxClipDistances and maxCullDistances fields.
>> In addition, Metal also supports clip-distance in its shading language, see
>> 5.2.3.3 Vertex Function Output Attributes of the Metal 2.2 shading spec,
>> https://developer.apple.com/metal/Metal-Shading-Language-Specification.pdf.
>> D3D12 also supports clip-distance, see the enumeration D3D12_CLIP_OR_CULL_DISTANCE_COUNT
>> at https://docs.microsoft.com/en-us/windows/win32/direct3d12/constants .
>> Also, saying that clip planes are a thing of the past is quite unrealistic,
>> as there are a significant number of rendering algorithms that I wish to
>> employ that use them. I do however advocate that it be OK to report that
>> no clip-distance values are supported, since it is acceptable in
>> Vulkan to not support clip-distance. Many GPUs dedicate a non-trivial
>> amount of silicon to implement these user-defined clip planes, and not
>> making them available when present seems far from ideal. Emulating HW clip
>> planes through compute is quite icky though (and typically involves
>> atomic ops in the compute shader), and emulating them through discard is
>> the worst possible option.
>>
>> Secondly, on the subject of advanced blend equations, I would rather the
>> feature were part of the spec from day 0, with the ability to query
>> whether it is supported. For UI rendering these blend modes avoid a large
>> number of terrible, poorly performing alternatives. These blend equations
>> are already available as extensions in Vulkan, see
>> https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#VK_EXT_blend_operation_advanced.
>> Leaving it for a later extension is essentially pushing it further down
>> the road, with which comes a higher chance it never sees the light of day.
>> The only part that is really affected is just adding those blend modes to
>> the current list along with a query to report whether they are supported.
>>
>> Next, issues (6) and (4) are VERY different. Issue (6) gives an
>> application the ability to read the current value from the framebuffer at
>> the fragment location; not all hardware supports this, but most tile-based
>> architectures can or do (indeed a number of them do not have dedicated
>> blending units and perform blending via a shader epilogue). This feature
>> can be done, with effort, for Metal on iOS (but not macOS), but I have not
>> seen an extension for Vulkan yet. In contrast, (4) is about declaring that
>> a value never needs to be sent back to memory (mostly an optimization for
>> tile-based renderers); one cannot read back any values in the shader,
>> instead it applies after a "sort-of-render-target-change". Feature (4) is
>> just an optimization for tile-based renderers.
>>
>> Item (7), fragment shader interlock, is available on some hardware in
>> Vulkan via VK_EXT_fragment_shader_interlock.
>>
>>
>> Lastly, a good GPU application needs to have some understanding of the
>> GPU:
>>  - is it a tile-based renderer or not?
>>  - what optional features are available?
>>
>> For example, on a non-tile-based renderer, reading from the current
>> render target is nowhere near as heavy an operation as it is on a
>> tile-based renderer. I advocate that exposing these elements will allow
>> applications to get more performance from the GPU, which is much of the
>> reason for WebGPU. I am all for making code portable, but
>> GPU-performance-intensive applications (the purpose of WebGPU) need this
>> to get that; otherwise the gap between native and web will be quite large.
>>
>> At any rate, I will file each of these as separate issues, but I would
>> like to have a discussion about them on the mailing list (or in the
>> issues), out in the open. Ideally, we would hear input not just from the
>> implementors' point of view, but also from the developers' point of view.
>>
>> Best Regards,
>>  -Kevin Rogovin
>>
>>
>>
>>
>> On Tue, Aug 6, 2019 at 6:39 PM Dzmitry Malyshau <dmalyshau@mozilla.com>
>> wrote:
>>
>>> Hi Kevin,
>>>
>>> Thank you for writing down your (employer's) use cases!
>>>
>>> Ideally, these would need to be filed as issues on
>>> https://github.com/gpuweb/gpuweb/issues .
>>>
>>> 1. Needs an investigation to be done (see others -
>>> https://github.com/gpuweb/gpuweb/labels/investigation). Roughly
>>> speaking, this is very useful and IIRC widely supported; we should have
>>> it in the API.
>>>
>>> 2. User clip planes are a thing of the past, found in none of our
>>> target APIs (Vulkan, D3D12, Metal). Therefore, I don't think this
>>> feature should influence the WebGPU spec.
>>>
>>> 3. This appears to only be supported in Vulkan (of the 3 APIs we target)
>>> and provides only a minor benefit (unless you have numbers to show
>>> otherwise?). Perhaps, this would work as a small extension, but it
>>> doesn't seem necessary for MVP or V1 of the API.
>>>
>>> 4 and 6. These are similar (in the sense that both are addressed by
>>> Vulkan sub-passes). Finding a good model of the API that would be
>>> portable is difficult. There needs to be a solid investigation followed
>>> by one or more proposals before we can have this.
>>>
>>> 5. I don't think any of our target APIs support this, so this feature
>>> shouldn't influence the WebGPU spec.
>>>
>>> 7. Haven't looked into it. Needs an investigation done.
>>>
>>>
>>> You are welcome to file issues and help us with investigations/proposals!
>>>
>>> Note that in general we are trying to not have a lot of variation in the
>>> exposed device "geometry". These extra flags and capabilities make the
>>> application take different code paths on different platforms, which
>>> hurts the portability property of the API and makes fingerprinting
>>> easier.
>>>
>>> Thank you,
>>>
>>> Dzmitry
>>>
>>>
>>> On 8/6/19 3:55 AM, Kevin Rogovin wrote:
>>> > Hi,
>>> >
>>> >  I have a number of feature requests which are quite important for my
>>> > employer's use cases.
>>> >
>>> > First the easiest ones:
>>> >
>>> > 1. Dual source blending, i.e. add the blend modes: "src1-color",
>>> > "one-minus-src1-color", "src1-alpha", "one-minus-src1-alpha",
>>> > "src1-alpha-saturated". Each of these has a direct analogue in Vulkan,
>>> > Metal and Direct3D12.
>>> >
>>> > 2. Add HW clip planes, where a query states how many hardware
>>> > clip planes are supported. It is OK if the return value is 0. In
>>> > particular, if the GPU does not support HW clip planes in its API,
>>> > it should return 0. I have quite a few cases where knowing whether
>>> > HW clip planes are available can change my rendering strategy and
>>> > improve GPU efficiency significantly. Lastly, using discard to
>>> > emulate HW clip planes can have a large, negative performance impact
>>> > and is something I (and others) should avoid.
>>> >
>>> > 3. Derived pipeline state objects. Not all of the targeted APIs have
>>> > this feature, but for those that do, like Vulkan, it can help. The
>>> > main use case is again that if two PSOs are quite similar, then a
>>> > driver can upload only the parts that are different and compute those
>>> > differing parts in advance.
>>> >
>>> > Now the tricky ones which require significant thought to properly do:
>>> >
>>> > 4. Render passes with local storage. This was something that was
>>> > non-trivial in Vulkan, I admit, but the potential usefulness is
>>> > significant. The basic idea is the ability to declare a value in the
>>> > frag-shader as intermediate, to be read from the exact same pixel
>>> > location in a later rendering pass. The big use case is for tile-based
>>> > renderers, so that temporary data is never sent out to memory. This
>>> > gives a large performance and power-saving boost for deferred
>>> > rendering strategies.
>>> >
>>> > And lastly, features that not all GPU's can do, but are game changers:
>>> >
>>> > 5. To *optionally* support the blend modes of
>>> > KHR_blend_equation_advanced. I just want the API to have a query to
>>> > ask if it is there and, as extensions roll out for Vulkan or where it
>>> > can be emulated with Metal on iOS, to use this feature if the GPU
>>> > supports it. On the desktop, two of the three major GPU providers have
>>> > hardware support for this feature. Of the mobile GPUs, I think most
>>> > have this in their GLES implementations.
>>> >
>>> > 6. For tile-based renderers, the ability to read the "last" value of
>>> > the framebuffer at the fragment, something akin to
>>> > GL_EXT_shader_framebuffer_fetch. Again, not to require this feature,
>>> > but the ability to query for it. Most tile-based renderers can support
>>> > this on some level and, on the desktop, two of the three vendors can
>>> > either do or emulate this feature. For a variety of situations, this
>>> > can be a game changer to improve performance as well. On mobile, I
>>> > know that at least 3 of the GPU lines out there support or can support
>>> > this feature.
>>> >
>>> > 7. Another useful feature is an analogue of
>>> > GL_ARB_fragment_shader_interlock; again, two of the three desktop GPU
>>> > vendors have HW support for this feature. For a variety of situations,
>>> > this can be a game changer to improve performance as well.
>>> >
>>> > I would like to participate in the discussions, not just drop the
>>> > above wish list. I.e. I want to help make any, or all, of the above
>>> > land in WebGPU.
>>> >
>>> > Best Regards,
>>> >  -Kevin Rogovin
>>>
>>>

Received on Tuesday, 6 August 2019 20:22:31 UTC