Re: Some Feature requests. from Kai Ninomiya on 2019-08-06 (public-gpu@w3.org from August 2019)

From: Kai Ninomiya <kainino@google.com>
Date: Tue, 6 Aug 2019 11:16:10 -0700
To: Dzmitry Malyshau <dmalyshau@mozilla.com>
Cc: Kevin Rogovin <kevinrogovin@invisionapp.com>, public-gpu <public-gpu@w3.org>
Message-ID: <CANxMeyCavC_jmj-muFwpoLsZeBOkcVgujREHb=s4BDk7z32L1w@mail.gmail.com>
> Leaving it for a later-extension is essentially pushing it down further
> away with which comes a higher chance it never sees the real light of day.
> The only part that is affected really is just adding those blend modes to
> the current list along with a query to report if it is supported.

There is a trade-off here, though. The more we pack into the original spec,
the longer it is going to take to finish and the more behind native APIs we
will be by the time we ship. I am also confident that we will be able to
get this enabled as an extension after we ship MVP/1.0, as long as we have
an issue open about it and there are still customers who need it.
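
For illustration only, here is a rough sketch of how an application might
consume the kind of optional-capability query discussed in this thread.
Every name in it is an assumption made for the example: the extension
string "advanced-blend-equations", the maxClipDistances field, and the
planRendering helper are hypothetical and do not come from any WebGPU draft.

```typescript
// Hypothetical capability report; none of these names are part of WebGPU.
interface DeviceCapabilities {
  // Optional extensions the implementation reports as available.
  extensions: ReadonlySet<string>;
  // Number of hardware clip distances; 0 means "not supported".
  maxClipDistances: number;
}

type BlendPath = "hardware-advanced-blend" | "shader-fallback";
type ClipPath = "hardware-clip-distance" | "software-fallback";

interface RenderPlan {
  blend: BlendPath;
  clip: ClipPath;
}

// Assumed extension name, used only for this sketch.
const ADVANCED_BLEND = "advanced-blend-equations";

// Picks a code path per feature. clipPlanesNeeded is assumed to be >= 1.
function planRendering(
  caps: DeviceCapabilities,
  clipPlanesNeeded: number
): RenderPlan {
  const blend: BlendPath = caps.extensions.has(ADVANCED_BLEND)
    ? "hardware-advanced-blend"
    : "shader-fallback";
  const clip: ClipPath =
    caps.maxClipDistances >= clipPlanesNeeded
      ? "hardware-clip-distance"
      : "software-fallback";
  return { blend, clip };
}
```

An implementation that reports no extensions and zero clip distances still
yields a valid plan, so the same application code would run before and
after such an extension ships.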

On Tue, Aug 6, 2019 at 10:14 AM Dzmitry Malyshau <dmalyshau@mozilla.com>
wrote:

> Hi Kevin,
>
> Thanks for correcting me on the clip distance! Apparently, it's not as
> much that clip-distance is outdated, as it's my knowledge about it:)
>
> The supporting links you have provided are good material for a future
> investigation issue.
>
> Regards,
>
> Dzmitry
>
>
> On 8/6/19 12:48 PM, Kevin Rogovin wrote:
>
> Hi,
>
>  Thank you for the fast response. I will file issues separately, but I
> will share my thoughts on the reply.
>
> Firstly, Vulkan does support HW-clip planes,
> indeed VkPhysicalDeviceFeatures has fields for both clipping and culling
> (shaderClipDistance and shaderCullDistance), and VkPhysicalDeviceLimits
> reports how many are available in the fields maxClipDistances and
> maxCullDistances.
> In addition, Metal also supports clip-distance in its shading language, see
> 5.2.3.3 Vertex Function Output Attributes of the Metal 2.2 shading spec,
> https://developer.apple.com/metal/Metal-Shading-Language-Specification.pdf.
> D3D12 also supports clip-distance, see the constant D3D12_CLIP_OR_CULL_DISTANCE_COUNT
> at https://docs.microsoft.com/en-us/windows/win32/direct3d12/constants .
> Also, saying that clip-planes are a thing of the past is quite unrealistic, as
> there are a significant number of rendering algorithms that I wish to
> employ that use them. I do however advocate for it to be OK to report that
> no clip-distance values are supported, since it is acceptable in
> Vulkan to not support clip-distance. Many GPUs dedicate a non-trivial
> amount of silicon to implementing these user-defined clip-planes, and not
> making them available when present seems far from ideal. Emulating HW-clip
> planes through compute is quite icky though (and typically involves
> atomic-ops in the compute shader) and emulating them through discard is the
> worst possible option.
>
> Secondly, on the subject of advanced blend equations, I would rather that
> the feature was part of the spec from day 0 with the ability to query if it
> was supported. For UI rendering these blend modes avoid the need for a large
> number of terrible, poorly performing options. These blend equations are already
> available as extensions in Vulkan, see
> https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#VK_EXT_blend_operation_advanced.
> Leaving it for a later-extension is essentially pushing it down further
> away with which comes a higher chance it never sees the real light of day.
> The only part that is affected really is just adding those blend modes to
> the current list along with a query to report if it is supported.
>
> Next, issues (6) and (4) are VERY different. Issue (6) gives an application
> the ability to read the current value from the framebuffer at the fragment
> location. Not all hardware supports this, but most tile-based architectures
> can or do (indeed a number of them do not have dedicated blending units and
> perform blending by adding an epilogue to do the blending). This feature
> can be done, with effort, for Metal on iOS (but not macOS) but I have not
> seen an extension for Vulkan yet. In contrast, (4) is about declaring that a
> value never needs to be sent back to memory (mostly an optimization for
> tile-based renderers) but one cannot read back any values in the shader;
> instead it is for after a "sort-of-render-target-change". Feature (4) is
> just an optimization for tile-based renderers.
>
> Item (7), fragment shader interlock, is available on some hardware in
> Vulkan with: VK_EXT_fragment_shader_interlock
>
>
> Lastly, a good GPU application needs to have some understanding of the GPU:
>  - is it a tile-based renderer or not?
>  - what optional features are available?
>
> For example, on a non-tile-based renderer, reading from the current
> render target is nowhere near as heavy an operation as it is for a tile-based
> renderer. I argue that exposing these elements will allow applications
> to get more performance from the GPU, which is much of the reason for WebGPU. I
> am all for making code portable, but GPU performance-intensive applications
> (the purpose of WebGPU) need this to get that performance; otherwise the gap
> between native and web will be quite large.
>
> At any rate, I will file each of these as separate issues, but I would
> like to have a discussion on these on the mailing list (or in the issues)
> out in the open. Ideally, we would hear input not just from the
> implementors' point of view, but also the developers' point of view.
>
> Best Regards,
>  -Kevin Rogovin
>
>
>
>
> On Tue, Aug 6, 2019 at 6:39 PM Dzmitry Malyshau <dmalyshau@mozilla.com>
> wrote:
>
>> Hi Kevin,
>>
>> Thank you for writing down your (employer's) use cases!
>>
>> Ideally, these would need to be filed as issues on
>> https://github.com/gpuweb/gpuweb/issues .
>>
>> 1. Needs an investigation to be done (see others  -
>> https://github.com/gpuweb/gpuweb/labels/investigation). Roughly
>> speaking, this is very useful and IIRC widely supported, we should have
>> it in the API.
>>
>> 2. User clip planes are a thing of the past, found in none of our
>> target APIs (Vulkan, D3D12, Metal). Therefore, I don't think this
>> feature should influence WebGPU spec.
>>
>> 3. This appears to only be supported in Vulkan (of the 3 APIs we target)
>> and provides only a minor benefit (unless you have numbers to show
>> otherwise?). Perhaps, this would work as a small extension, but it
>> doesn't seem necessary for MVP or V1 of the API.
>>
>> 4 and 6. These are similar (in the sense that both are addressed by Vulkan
>> sub-passes). Finding a good model of the API that would be portable is
>> difficult. There needs to be a solid investigation followed by one or
>> more proposals before we can have this.
>>
>> 5. I don't think any of our target APIs support this, so this feature
>> can't be influencing the WebGPU spec.
>>
>> 7. Haven't looked into it. Needs an investigation done.
>>
>>
>> You are welcome to file issues and help us with investigations/proposals!
>>
>> Note that in general we are trying to not have a lot of variation in the
>> exposed device "geometry". These extra flags and capabilities make the
>> application take different code paths on different platforms, which
>> hurts the portability property of the API and makes fingerprinting easier.
>>
>> Thank you,
>>
>> Dzmitry
>>
>>
>> On 8/6/19 3:55 AM, Kevin Rogovin wrote:
>> > Hi,
>> >
>> >  I have a number of feature requests which are quite important for my
>> > employer's use cases.
>> >
>> > First the easiest ones:
>> >
>> > 1. Dual source blending, i.e. add the blend modes: "src1-color",
>> > "one-minus-src1-color", "src1-alpha", "one-minus-src1-alpha",
>> > "src1-alpha-saturated". Each of these has a direct analogue in Vulkan,
>> > Metal and Direct3D12.
>> >
>> > 2. Add HW-clip planes where a query states how many hardware
>> > clip-planes are supported. It is OK if the return value is 0. In
>> > particular, if the GPU does not support HW-clip planes from its API,
>> > it should return 0. I have quite a few cases where knowing if HW-clip
>> > planes are available can change my rendering strategy and improve GPU
>> > efficiency significantly. Lastly, using discard to emulate HW-clip
>> > planes can have a large, negative performance impact and is something I
>> > (and others) should avoid.
>> >
>> > 3. Derived pipeline state objects. Not all of the targeted APIs have
>> > this feature, but for those that do, like Vulkan, it can help. The main
>> > use case is again that if two PSOs are quite similar, then a driver
>> > can upload only the parts that are different and compute in advance
>> > which parts those are.
>> >
>> > Now the tricky ones which require significant thought to properly do:
>> >
>> > 4. Render passes with local storage. This was something that was
>> > non-trivial in Vulkan I admit but the potential usefulness is
>> > significant. The basic idea is the ability to declare a value in the
>> > frag-shader as intermediate to be read from the exact same pixel
>> > location in a later rendering pass. The big use case is for tile based
>> > renderers so that temporary data is never sent out to memory. This
>> > gives a large performance and power-saving boost for deferred
>> > rendering strategies.
>> >
>> > And lastly, features that not all GPUs can do, but are game changers:
>> >
>> > 5. To *optionally* support the blend modes of
>> > GL_KHR_blend_equation_advanced. I just want the API to have a query to
>> > ask if it is there and, as extensions roll out for Vulkan or where it
>> > can be emulated with Metal as found on iOS, to use this feature if the
>> > GPU supports it. On the desktop, two of the three major GPU providers
>> > have hardware support for this feature. Of the mobile GPUs, I think
>> > most have this in their GLES implementations.
>> >
>> > 6. For tile-based renderers, the ability to read the "last" value of
>> > the framebuffer at the fragment, something akin to
>> > GL_EXT_shader_framebuffer_fetch. Again, not to require this feature,
>> > but the ability to query it. Most tile-based renderers can support
>> > this on some level and on the desktop, two of the three can either do
>> > or emulate this feature. For a variety of situations, this can be a
>> > game changer to improve performance as well. On mobile, I know that
>> > at least 3 of the GPU lines out there support or can support this
>> > feature.
>> >
>> > 7. Another useful feature is an analogue of
>> > GL_ARB_fragment_shader_interlock; again two of the three desktop GPUs
>> > have HW support for this feature. For a variety of situations, this
>> > can be a game changer to improve performance as well.
>> >
>> > I would like to participate in the discussions, not just drop the
>> > above wish list. I.e. I want to help make any, or all, of the above
>> > land in WebGPU.
>> >
>> > Best Regards,
>> >  -Kevin Rogovin
>>
>>
Received on Tuesday, 6 August 2019 18:16:46 UTC