Re: Some Feature requests. from Kevin Rogovin on 2019-08-06 (public-gpu@w3.org from August 2019)

From: Kevin Rogovin <kevinrogovin@invisionapp.com>
Date: Tue, 6 Aug 2019 19:48:48 +0300
To: Dzmitry Malyshau <dmalyshau@mozilla.com>
Cc: public-gpu@w3.org
Message-ID: <CALKNkvFUZ6UZwk=a0CHO=v5=9yKFwUCSnpoHvNYOm800Td7oLQ@mail.gmail.com>
Hi,

 Thank you for the fast response. I will file issues separately, but I will
share my thoughts on the reply.

Firstly, Vulkan does support HW clip planes:
VkPhysicalDeviceFeatures has fields for both clipping and culling
(shaderClipDistance and shaderCullDistance), and VkPhysicalDeviceLimits
reports how many are available in the fields maxClipDistances and
maxCullDistances.
In addition, Metal also supports clip distance in its shading language, see
5.2.3.3 Vertex Function Output Attributes of the Metal 2.2 shading spec,
https://developer.apple.com/metal/Metal-Shading-Language-Specification.pdf.
D3D12 also supports clip distance, see the constant
D3D12_CLIP_OR_CULL_DISTANCE_COUNT
at https://docs.microsoft.com/en-us/windows/win32/direct3d12/constants .
Also, saying that clip planes are a thing of the past is quite unrealistic, as
there are a significant number of rendering algorithms that I wish to
employ that use them. I do, however, advocate that it be OK to report that
no clip-distance values are supported, since it is acceptable in
Vulkan to not support clip distance. Many GPUs dedicate a non-trivial
amount of silicon to implementing these user-defined clip planes, and not
making them available when present seems far from ideal. Emulating HW clip
planes through compute is quite icky (it typically involves
atomic ops in the compute shader), and emulating them through discard is the
worst possible option.
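
For readers less familiar with the mechanism: a user-defined clip plane
amounts to one signed distance per vertex, and the hardware clips the
primitive where the interpolated distance goes negative. The following is a
minimal CPU-side sketch in TypeScript of the value a vertex shader would
write to its clip-distance output. The type and function names are
illustrative assumptions, not part of Vulkan, Metal, D3D12 or WebGPU.

```typescript
// Illustrative only: models the value a vertex shader writes to a
// clip-distance output. All names here are hypothetical.
type Vec3 = [number, number, number];

// Plane stored as (a, b, c, d); points with a*x + b*y + c*z + d >= 0 are kept.
type Plane = [number, number, number, number];

// Signed distance-like value for one vertex against one plane.
function clipDistance(plane: Plane, p: Vec3): number {
  const [a, b, c, d] = plane;
  return a * p[0] + b * p[1] + c * p[2] + d;
}

// Position along an edge (0 at the first vertex, 1 at the second) where the
// linearly interpolated distance crosses zero, or null if both ends lie on
// the same side of the plane.
function edgeCrossing(d0: number, d1: number): number | null {
  if ((d0 >= 0) === (d1 >= 0)) return null;
  return d0 / (d0 - d1);
}
```

With hardware support the clipper does the edge-crossing step itself, so no
fragments are generated on the rejected side; a discard-based emulation
still rasterizes and shades those fragments before throwing them away.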

Secondly, on the subject of advanced blend equations, I would rather that
the feature be part of the spec from day 0, with the ability to query whether
it is supported. For UI rendering these blend modes avoid a large number of
terrible, poorly performing alternatives. These blend equations are already
available as an extension in Vulkan, see
https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#VK_EXT_blend_operation_advanced.
Leaving it for a later extension essentially pushes it further
away, and with that comes a higher chance that it never sees the light of day.
The only part of the spec that is really affected is the list of blend modes:
those modes get added to it, along with a query to report whether they are
supported.
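
To make concrete what these blend modes compute, here is a small TypeScript
sketch of three of them (multiply, screen, overlay), using the per-channel
functions as they are commonly given for KHR_blend_equation_advanced. It is
a CPU-side illustration under the assumption of non-premultiplied inputs and
a premultiplied result; the function names are hypothetical and it is not a
statement of how any driver implements the feature.

```typescript
// Per-channel blend function f(Cs, Cd) on non-premultiplied colors in [0, 1].
type BlendFn = (cs: number, cd: number) => number;

const multiply: BlendFn = (cs: number, cd: number): number => cs * cd;

const screen: BlendFn = (cs: number, cd: number): number => cs + cd - cs * cd;

const overlay: BlendFn = (cs: number, cd: number): number =>
  cd <= 0.5 ? 2 * cs * cd : 1 - 2 * (1 - cs) * (1 - cd);

// Combines f with the coverage weights: p0 is the area covered by both source
// and destination, p1 by the source only, p2 by the destination only.
// Returns a premultiplied channel value.
function blendChannel(
  f: BlendFn,
  cs: number,
  srcAlpha: number,
  cd: number,
  dstAlpha: number
): number {
  const p0 = srcAlpha * dstAlpha;
  const p1 = srcAlpha * (1 - dstAlpha);
  const p2 = dstAlpha * (1 - srcAlpha);
  return f(cs, cd) * p0 + cs * p1 + cd * p2;
}
```

None of these can be expressed with the usual fixed-function blend factors,
because f depends on the product (or a comparison) of source and destination
color; that is why the alternatives without the feature are so costly.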

Next, issues (6) and (4) are VERY different. Issue (6) gives an application
the ability to read the current value from the framebuffer at the fragment
location. Not all hardware supports this, but most tile-based architectures
can or do (indeed, a number of them do not have dedicated blending units and
perform blending by adding an epilogue to the shader). This feature
can be done, with effort, for Metal on iOS (but not macOS), but I have not
seen an extension for Vulkan yet. In contrast, (4) is about declaring that a
value never needs to be sent back to memory. There, the shader cannot read
back any values within the same pass; instead the values become readable
after a "sort-of-render-target-change". Feature (4) is
just an optimization for tile-based renderers.
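
The difference can be sketched on the CPU. In the TypeScript below, which is
only a model with hypothetical names, a fragment shader with framebuffer
fetch receives the current pixel value as an input, and ordinary source-over
blending is written as the kind of epilogue such hardware could append to a
shader. Colors are assumed to be premultiplied RGBA.

```typescript
// Premultiplied RGBA color, each component in [0, 1].
type RGBA = [number, number, number, number];

// Model of a fragment shader with framebuffer fetch: it is handed the value
// currently stored at its pixel and returns the new value.
type FetchShader = (dst: RGBA) => RGBA;

// Source-over blending expressed as a shader epilogue instead of a
// fixed-function blend unit.
function sourceOverEpilogue(src: RGBA, dst: RGBA): RGBA {
  const k = 1 - src[3];
  return [
    src[0] + dst[0] * k,
    src[1] + dst[1] * k,
    src[2] + dst[2] * k,
    src[3] + dst[3] * k,
  ];
}

// Runs one fragment against on-chip tile memory (modelled as an array).
function runFragment(tile: RGBA[], index: number, shader: FetchShader): void {
  const dst = tile[index];
  if (dst === undefined) throw new RangeError("pixel index outside tile");
  tile[index] = shader(dst);
}
```

With (6), the application may replace the epilogue with arbitrary code that
uses the fetched value. With (4), no such read happens during the pass; the
values simply stay in tile memory until a later pass consumes them.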

Item (7), fragment shader interlock, is available on some hardware in
Vulkan through VK_EXT_fragment_shader_interlock.


Lastly, a good GPU application needs to have some understanding of the GPU:
 - is it a tile-based renderer or not?
 - what optional features are possible?

For example, on a non-tile-based renderer, reading from the current render
target is nowhere near as heavy an operation as it is on a tile-based
renderer. I advocate that exposing these elements will allow applications
to get more performance from the GPU, which is much of the reason for WebGPU. I
am all for making code portable, but GPU-performance-intensive applications
(the purpose of WebGPU) need this information to get that performance;
otherwise the gap between native and web will be quite large.
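
As an illustration of the kind of decision such information enables, here is
a TypeScript sketch. Everything in it is an assumption made for the example:
the capability record, its field names and the fallback order are
hypothetical and do not exist in WebGPU.

```typescript
// Hypothetical capability record; none of these fields exist in WebGPU.
interface GpuCaps {
  tileBased: boolean;
  maxClipDistances: number; // 0 means no hardware clip planes
  framebufferFetch: boolean;
  advancedBlend: boolean;
}

type BlendPath = "advanced-blend" | "framebuffer-fetch" | "copy-render-target";

interface RenderPlan {
  useHardwareClip: boolean;
  blendPath: BlendPath;
  // True when the chosen blend path forces tile contents out to memory.
  flushesTiles: boolean;
}

// Picks a rendering strategy from the reported capabilities.
function planRendering(caps: GpuCaps, clipPlanesNeeded: number): RenderPlan {
  const useHardwareClip =
    clipPlanesNeeded > 0 && caps.maxClipDistances >= clipPlanesNeeded;

  let blendPath: BlendPath;
  if (caps.advancedBlend) {
    blendPath = "advanced-blend";
  } else if (caps.framebufferFetch) {
    blendPath = "framebuffer-fetch";
  } else {
    // Last resort: end the pass and read the render target as a texture.
    blendPath = "copy-render-target";
  }

  const flushesTiles = caps.tileBased && blendPath === "copy-render-target";
  return { useHardwareClip, blendPath, flushesTiles };
}
```

An application that sees flushesTiles set might, for instance, batch its
draws differently to limit how often the render target has to be read back.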

At any rate, I will file each of these as separate issues, but I would like
to have a discussion on these on the mailing list (or in the issues) out in
the open. Ideally, we would hear input not just from the implementors' point
of view, but also from the developers' point of view.

Best Regards,
 -Kevin Rogovin




On Tue, Aug 6, 2019 at 6:39 PM Dzmitry Malyshau <dmalyshau@mozilla.com>
wrote:

> Hi Kevin,
>
> Thank you for writing down your (employer's) use cases!
>
> Ideally, these would need to be filed as issues on
> https://github.com/gpuweb/gpuweb/issues .
>
> 1. Needs an investigation to be done (see others  -
> https://github.com/gpuweb/gpuweb/labels/investigation). Roughly
> speaking, this is very useful and IIRC widely supported, we should have
> it in the API.
>
> 2. User clip planes are the thing of the past, found in none of our
> target APIs (Vulkan, D3D12, Metal). Therefore, I don't think this
> feature should influence WebGPU spec.
>
> 3. This appears to only be supported in Vulkan (of the 3 APIs we target)
> and provides only a minor benefit (unless you have numbers to show
> otherwise?). Perhaps, this would work as a small extension, but it
> doesn't seem necessary for MVP or V1 of the API.
>
> 4 and 6. These are similar (in a sense that both are addressed by Vulkan
> sub-passes). Finding a good model of the API that would be portable is
> difficult. There needs to be a solid investigation followed by one or
> more proposals before we can have this.
>
> 5. I don't think any of our target APIs support this, so this feature
> can't be influencing the WebGPU spec.
>
> 7. Haven't looked into it. Needs an investigation done.
>
>
> You are welcome to file issues and help us with investigations/proposals!
>
> Note that in general we are trying to not have a lot of variation in the
> exposed device "geometry". These extra flags and capabilities make the
> application take different code paths on different platforms, which
> hurts the portability property of the API and makes fingerprinting easier.
>
> Thank you,
>
> Dzmitry
>
>
> On 8/6/19 3:55 AM, Kevin Rogovin wrote:
> > Hi,
> >
> >  I have a number of feature requests which are quite important for my
> > employer's use cases.
> >
> > First the easiest ones:
> >
> > 1. Dual source blending, i.e. add the blend modes: "src1-color",
> > "one-minus-src1-color", "src1-alpha", "one-minus-src1-alpha",
> > "src1-alpha-saturated". Each of these has a direct analogue in Vulkan,
> > Metal and Direct3D12.
> >
> > 2. Add Hw-clip-planes where a query states how many hardware
> > clip-planes are supported. It is OK if the return value is 0. In
> > particular, if the GPU does not support HW-clip planes from its API,
> > it should return 0. I have quite a few cases where knowing if HW-clip
> > planes are available can change my rendering strategy and improve GPU
> > efficiency significantly. Lastly, using discard to emulate HW-clip
> > planes can have large, negative performance impact and is something I
> > (and others) should avoid.
> >
> > 3. Derived pipeline state objects. Not all of the targeted APIs have
> > this feature, but for those that do, like Vulkan, it can help. The main
> > use case is again that if two PSOs are quite similar, then a driver
> > can upload only the parts that are different and compute in advance
> > what those differing parts are.
> >
> > Now the tricky ones which require significant thought to properly do:
> >
> > 4. Render passes with local storage. This was something that was
> > non-trivial in Vulkan I admit but the potential usefulness is
> > significant. The basic idea is the ability to declare a value in the
> > frag-shader as intermediate to be read from the exact same pixel
> > location in a later rendering pass. The big use case is for tile based
> > renderers so that temporary data is never sent out to memory. This
> > gives a large performance and power-saving boost for deferred
> > rendering strategies.
> >
> > And lastly, features that not all GPU's can do, but are game changers:
> >
> > 5. To *optionally* support the blend modes of khr-blend-equations
> > advanced. I just want the API to have a query to ask if it is there
> > and as extensions rollout for Vulkan or ability to emulate with Metal
> > as found in iOS, to use this feature if the GPU supports it. On the
> > desktop two of the three major GPU providers have hardware support for
> > this feature. Of the mobile GPU's I think most have this in their GLES
> > implementations.
> >
> > 6. For tile based renderers, the ability to read the "last" value of
> > the framebuffer at the fragment, something akin to
> > GL_EXT_shader_framebuffer_fetch. Again, not to require this feature,
> > but the ability to query it. Most tiled based renderers can support
> > this on some level and on the desktop, two of the three can either do
> > or emulate this feature. For a variety of situations, this can be a
> > game changer to improve performance as well. On mobile, I know that
> > at least 3 of the GPU lines out there support or can support this feature.
> >
> > 7. Another useful feature is an analogue of
> > GL_ARB_fragment_shader_interlock; again two of the three desktop GPU's
> > have HW support for this feature. For a variety of situations, this
> > can be a game changer to improve performance as well.
> >
> > I would like to participate in the discussions, not just drop the
> > above wish list. I.e. I want to help make any, or all, of the above
> > land in WebGPU.
> >
> > Best Regards,
> >  -Kevin Rogovin
>
>
Received on Tuesday, 6 August 2019 16:49:24 UTC