Re: Some Feature requests. from Dzmitry Malyshau on 2019-08-06 (public-gpu@w3.org from August 2019)

From: Dzmitry Malyshau <dmalyshau@mozilla.com>
Date: Tue, 6 Aug 2019 13:13:06 -0400
To: Kevin Rogovin <kevinrogovin@invisionapp.com>
Cc: public-gpu@w3.org
Message-ID: <b751ecf2-b6c6-5451-a162-b9b9bffa533b@mozilla.com>
Hi Kevin,

Thanks for correcting me on the clip distance! Apparently, it's not so
much that clip-distance is outdated as that my knowledge of it is :)

The supporting links you have provided are good material for a future 
investigation issue.

Regards,

Dzmitry


On 8/6/19 12:48 PM, Kevin Rogovin wrote:
> Hi,
>
>  Thank you for the fast response. I will file issues separately, but I 
> will share my thoughts on the reply.
>
> Firstly, Vulkan does support HW-clip planes: VkPhysicalDeviceFeatures
> has fields for both clipping and culling (shaderClipDistance and
> shaderCullDistance), and VkPhysicalDeviceLimits reports how many are
> available through the fields maxClipDistances and maxCullDistances.
> In addition, Metal also supports clip-distance in its shading
> language; see 5.2.3.3 Vertex Function Output Attributes of the
> Metal 2.2 shading spec,
> https://developer.apple.com/metal/Metal-Shading-Language-Specification.pdf.
> D3D12 also supports clip-distance; see the enumeration
> D3D12_CLIP_OR_CULL_DISTANCE_COUNT at
> https://docs.microsoft.com/en-us/windows/win32/direct3d12/constants .
> Also, saying that clip-planes are a thing of the past is quite
> unrealistic, as there are a significant number of rendering algorithms
> that I wish to employ that use them. I do, however, advocate for it to
> be OK to report that no clip-distance values are supported,
> since it is acceptable in Vulkan to not support clip-distance. Many
> GPUs dedicate a non-trivial amount of silicon to implementing these
> user-defined clip-planes, and not making them available when present
> seems far from ideal. Emulating HW-clip planes through compute is
> quite icky though (it typically involves atomic-ops in the compute
> shader), and emulating them through discard is the worst possible option.
>
> Secondly, on the subject of advanced blend equations, I would rather
> that the feature be part of the spec from day 0, with the ability to
> query if it is supported. For UI rendering these blend modes avoid
> a large number of terrible, poorly performing alternatives. These blend
> equations are already available as an extension in Vulkan, see
> https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#VK_EXT_blend_operation_advanced.
> Leaving it for a later extension essentially pushes it further
> away, and with that comes a higher chance that it never sees the
> light of day. The only change really is adding those blend modes to
> the current list, along with a query to report whether they are
> supported.
>
> Next, issues (6) and (4) are VERY different. Issue (6) gives an
> application the ability to read the current value from the framebuffer
> at the fragment location. Not all hardware supports this, but most
> tile-based architectures can or do (indeed, a number of them do not
> have dedicated blending units and perform blending by adding an
> epilogue to do the blending). This feature can be done, with effort,
> for Metal on iOS (but not macOS), but I have not seen an extension for
> Vulkan yet. In contrast, (4) is about declaring that a value never
> needs to be sent back to memory (mostly an optimization for tile-based
> renderers), but one cannot read back any values in the shader; instead
> it is for after a "sort-of-render-target-change". Feature (4) is
> just an optimization for tile-based renderers.
>
> Item (7), fragment shader interlock, is available on some hardware in
> Vulkan with VK_EXT_fragment_shader_interlock.
>
>
> Lastly, a good GPU application needs to have some understanding of the 
> GPU:
>  - is it a tile-based renderer or not?
>  - what optional features are possible?
>
> For example, on a non-tile-based renderer, reading from the current
> render target is nowhere near as heavy an operation as it is for a
> tile-based renderer. I advocate that exposing these elements will allow
> applications to get more performance from the GPU, which is much of the
> reason for WebGPU. I am all for making code portable, but GPU
> performance-intensive applications (the purpose of WebGPU) need this
> to get that; otherwise the gap between native and web will be quite large.
>
> At any rate, I will file each of these as separate issues, but I would 
> like to have a discussion on these on the mailing list (or in the 
> issues) out in the open. Ideally, we would hear input not just from
> the implementors' point of view, but also the developers' point of view.
>
> Best Regards,
>  -Kevin Rogovin
>
>
>
>
> On Tue, Aug 6, 2019 at 6:39 PM Dzmitry Malyshau <dmalyshau@mozilla.com> wrote:
>
>     Hi Kevin,
>
>     Thank you for writing down your (employer's) use cases!
>
>     Ideally, these would need to be filed as issues on
>     https://github.com/gpuweb/gpuweb/issues .
>
>     1. Needs an investigation to be done (see others -
>     https://github.com/gpuweb/gpuweb/labels/investigation). Roughly
>     speaking, this is very useful and IIRC widely supported, we should
>     have it in the API.
>
>     2. User clip planes are a thing of the past, found in none of our
>     target APIs (Vulkan, D3D12, Metal). Therefore, I don't think this
>     feature should influence WebGPU spec.
>
>     3. This appears to only be supported in Vulkan (of the 3 APIs we
>     target) and provides only a minor benefit (unless you have numbers to
>     show otherwise?). Perhaps, this would work as a small extension, but it
>     doesn't seem necessary for MVP or V1 of the API.
>
>     4 and 6. These are similar (in the sense that both are addressed by
>     Vulkan sub-passes). Finding a good model of the API that would be
>     portable is difficult. There needs to be a solid investigation
>     followed by one or more proposals before we can have this.
>
>     5. I don't think any of our target APIs support this, so this feature
>     can't be influencing the WebGPU spec.
>
>     7. Haven't looked into it. Needs an investigation done.
>
>
>     You are welcome to file issues and help us with
>     investigations/proposals!
>
>     Note that in general we are trying to not have a lot of variation in
>     the exposed device "geometry". These extra flags and capabilities make
>     the application take different code paths on different platforms, which
>     hurts the portability property of the API and makes fingerprinting
>     easier.
>
>     Thank you,
>
>     Dzmitry
>
>
>     On 8/6/19 3:55 AM, Kevin Rogovin wrote:
>     > Hi,
>     >
>     >  I have a number of feature requests which are quite important for my
>     > employer's use cases.
>     >
>     > First the easiest ones:
>     >
>     > 1. Dual source blending, i.e. add the blend modes: "src1-color",
>     > "one-minus-src1-color", "src1-alpha", "one-minus-src1-alpha",
>     > "src1-alpha-saturated". Each of these has a direct analogue in Vulkan,
>     > Metal and Direct3D12.
>     >
>     > 2. Add HW-clip planes where a query states how many hardware
>     > clip-planes are supported. It is OK if the return value is 0. In
>     > particular, if the GPU does not support HW-clip planes from its API,
>     > it should return 0. I have quite a few cases where knowing if HW-clip
>     > planes are available can change my rendering strategy and improve GPU
>     > efficiency significantly. Lastly, using discard to emulate HW-clip
>     > planes can have a large, negative performance impact and is something
>     > I (and others) should avoid.
>     >
>     > 3. Derived pipeline state objects. Not all of the targeted APIs have
>     > this feature, but for those that do, like Vulkan, it can help. The main
>     > use case is again that if two PSOs are quite similar, then a driver
>     > can upload only the parts that are different and compute in advance
>     > which parts those are.
>     >
>     > Now the tricky ones which require significant thought to properly do:
>     >
>     > 4. Render passes with local storage. This was something that was
>     > non-trivial in Vulkan, I admit, but the potential usefulness is
>     > significant. The basic idea is the ability to declare a value in the
>     > frag-shader as intermediate to be read from the exact same pixel
>     > location in a later rendering pass. The big use case is for tile-based
>     > renderers so that temporary data is never sent out to memory. This
>     > gives a large performance and power-saving boost for deferred
>     > rendering strategies.
>     >
>     > And lastly, features that not all GPUs can do, but are game changers:
>     >
>     > 5. To *optionally* support the blend modes of
>     > KHR_blend_equation_advanced. I just want the API to have a query to
>     > ask if it is there and, as extensions roll out for Vulkan or the
>     > ability to emulate with Metal as found in iOS, to use this feature if
>     > the GPU supports it. On the desktop two of the three major GPU
>     > providers have hardware support for this feature. Of the mobile GPUs
>     > I think most have this in their GLES implementations.
>     >
>     > 6. For tile-based renderers, the ability to read the "last" value of
>     > the framebuffer at the fragment, something akin to
>     > GL_EXT_shader_framebuffer_fetch. Again, not to require this feature,
>     > but the ability to query it. Most tile-based renderers can support
>     > this on some level and on the desktop, two of the three can either do
>     > or emulate this feature. For a variety of situations, this can be a
>     > game changer to improve performance as well. On mobile, I know that
>     > at least 3 of the GPU lines out there support or can support this
>     > feature.
>     >
>     > 7. Another useful feature is an analogue of
>     > GL_ARB_fragment_shader_interlock; again two of the three desktop GPUs
>     > have HW support for this feature. For a variety of situations, this
>     > can be a game changer to improve performance as well.
>     >
>     > I would like to participate in the discussions, not just drop the
>     > above wish list. I.e. I want to help make any, or all, of the above
>     > land in WebGPU.
>     >
>     > Best Regards,
>     >  -Kevin Rogovin
>
Received on Tuesday, 6 August 2019 17:13:33 UTC