Re: Advanced blend equations [was: Some Feature requests.]

On Tue, Aug 6, 2019 at 1:22 PM Kevin Rogovin <kevinrogovin@invisionapp.com>
wrote:

> Hi,
>
> For the case of the advanced blend equations, all we are talking about is
> adding the enumerations to the current list and a query to ask whether they
> are available. I would expect the initial implementations to just report NOT
> supporting those blend modes, but by having it *there* it is more likely to
> get implemented and used.
>

Given the way we plan to do WebGPU extensions, I think the two are actually
identical:
- Extension functionality is added inline on the same objects, not on
separate extension objects like in WebGL.
- Extensions must be requested at device creation time (same as how
capability flags would work).

So extensions will really just be a list of capability flags where
additional flags can be added to the spec post 1.0 (and added to browsers
as they have time to implement them).
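
For illustration only, here is a rough TypeScript-style sketch of what that
could look like from the application side, assuming a hypothetical
WebGPU-style API in which extensions are requested as capability flags at
device creation (all names below are made up for the example, not taken from
any spec draft):

// Hypothetical interfaces; names are illustrative only.
interface HypotheticalAdapter {
  readonly supportedExtensions: ReadonlySet<string>;
  requestDevice(desc: { extensions?: string[] }): Promise<HypotheticalDevice>;
}
interface HypotheticalDevice {
  readonly extensions: ReadonlySet<string>;
}

// Extensions behave like capability flags: they are requested up front at
// device creation, and the extra functionality lives inline on the same
// objects rather than on separate extension objects (as in WebGL).
async function createDevice(adapter: HypotheticalAdapter): Promise<HypotheticalDevice> {
  const wanted: string[] = [];
  if (adapter.supportedExtensions.has("advanced-blend-equations")) {
    wanted.push("advanced-blend-equations"); // made-up flag name
  }
  const device = await adapter.requestDevice({ extensions: wanted });
  // A browser that has not implemented the flag simply never advertises it,
  // so the application falls back without needing a per-browser code path.
  return device;
}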

The only difference is that I think you want the extensions in the spec
before 1.0. I see no benefit in doing this. Since it's optional, browsers
may not implement it right away anyway. It seems to me that all optional
functionality can be equally well added earlier or later and should be done
through the same mechanism, instead of having fragmentation where some
things are behind capability flags now and some other things are behind
extension flags later.

Also, I (personally) think that we might be able to write extensions inline
into the main spec so they won't be disjoint from it, but we have not had
experience with this yet.


> On the subject of extensions, one awful pain point for me has been that
> the w3c extensions for WebGL1 and WebGL2 lagged terribly, terribly behind
> the extensions available for GLES2 and GLES3. The worst one for GLES2 (and
> there is no WebGL1 extension) is support for mipmapping of non-power-of-2
> textures, in spite of a GLES2 extension being available and widely
> supported across hardware. A fair number of good GLES3 extensions also
> never made their way to WebGL2, in spite of wide hardware support. Given
> that extension history, I would much rather have a query for
> "easy-things-to-specify" than an extension.
>
Please make feature requests for WebGL extensions on GitHub
<https://github.com/KhronosGroup/WebGL> or webgl-dev-list
<https://groups.google.com/forum/#!forum/webgl-dev-list>. There are many GL
and ES extensions which are not widely used, so if we don't hear these
requests, we have no way of knowing to consider them.

> By placing features that are not too difficult to specify into the spec,
> together with a query mechanism, those features are far more likely to be
> used. The issue with extensions is often chicken-and-egg: if an extension
> is not supported, a workaround is made (typically at a performance cost)
> and the developer moves on, leaving performance on the floor while waiting
> for the extension to be ratified and, in turn, adopted.
>
Because of the way we will be doing extensions, as long as we hear the
request, I don't think it will take any more time to add it as an extension
than it would to add it to the core API pre-1.0, while adding it pre-1.0
will still delay 1.0.

> From my point of view, the advanced blend equations and clip-distance
> features fall into the category of being easy to specify and easy to add a
> query for. The idea of framebuffer fetch (which I freely admit is mostly
> targeted at mobile) is also quite easy to specify and query, in my opinion,
> but I can see some areas where people may be wary of such a feature.
> Fragment shader interlock is also not very complicated to state, but I can
> see that, if not spec'd carefully, it could be quite challenging to
> translate to native shaders that support the feature.
>
> I also advocate a query mechanism that would describe the GPU. That is far
> easier than having an application go through the work of testing the GPU at
> app startup. FWIW, I need only a few seconds (burning battery though) to
> figure out whether a GPU is tile-based or not, along with whether it is
> PowerVR, Mali or Qualcomm (I am not familiar enough with Apple's GPUs to
> tell them apart because I have not worked with them much). With longer
> testing, I can also likely figure out whether a GPU is NVIDIA, Intel or
> AMD, along with estimates of its FLOPS and bandwidth. Better to just add a
> query to let the WebGPU app ask for some of this info instead of
> benchmarking the GPU by making it do pointless things. The query would be
> essentially akin to GL's GL_RENDERER and GL_VENDOR strings (there are
> analogues in Vulkan as well), along with whether it is going through D3D12,
> Vulkan, Metal or something else. Knowing the GPU can be important for
> getting the most out of compute (ideal compute work group sizes and the
> like are GPU-specific; indeed, a number of benchmarks even do a little
> tuning before launching to get a better compute shader dispatch pattern).
>
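
For illustration, a rough TypeScript-style sketch of the kind of query being
asked for above, with hypothetical field names and values (vendor,
architecture, backend and isTileBased are assumptions for the example, not
part of any current proposal), and how an app might use it to pick a compute
workgroup size:

// Hypothetical adapter-info query, loosely analogous to GL_VENDOR /
// GL_RENDERER; all names and values below are assumptions.
interface HypotheticalAdapterInfo {
  vendor: string;                                   // e.g. "nvidia", "arm"
  architecture: string;                             // e.g. "mali-g76"
  backend: "d3d12" | "vulkan" | "metal" | "other";  // underlying native API
  isTileBased: boolean;                             // tiler vs. immediate-mode
}

// GPU-specific tuning: a reasonable compute workgroup size differs by
// vendor, so an app could pick one directly instead of micro-benchmarking
// the GPU at startup.
function pickWorkgroupSize(info: HypotheticalAdapterInfo): number {
  switch (info.vendor) {
    case "nvidia": return 32;  // warp size
    case "amd":    return 64;  // wavefront size
    default:       return 64;  // conservative default elsewhere
  }
}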

> The beauty of WebGPU is that getting native performance out of the GPU only
> requires the ability to send the GPU commands (since the GPU is doing the
> work) and the ability to identify the GPU for platform-specific
> optimizations (which matter often enough).
>
> -Kevin
>
> On Tue, Aug 6, 2019 at 9:16 PM Kai Ninomiya <kainino@google.com> wrote:
>
>> > Leaving it for a later extension essentially pushes it further away,
>> > with which comes a higher chance it never sees the light of day. The
>> > only part really affected is adding those blend modes to the current
>> > list along with a query to report whether it is supported.
>>
>> There is a trade-off here, though. The more we pack into the original
>> spec, the longer it will take to finish and the further behind native APIs
>> we will be by the time we ship. I am also confident that we will be able
>> to get this enabled as an extension after we ship MVP/1.0, as long as we
>> have an issue open about it and there are still customers who need it.
>>
>> On Tue, Aug 6, 2019 at 10:14 AM Dzmitry Malyshau <dmalyshau@mozilla.com>
>> wrote:
>>
>>> Hi Kevin,
>>>
>>> Thanks for correcting me on the clip distance! Apparently, it's not so
>>> much that clip-distance is outdated as it is my knowledge of it :)
>>>
>>> The supporting links you have provided are good material for a future
>>> investigation issue.
>>>
>>> Regards,
>>>
>>> Dzmitry
>>>
>>>
>>> On 8/6/19 12:48 PM, Kevin Rogovin wrote:
>>>
>>> Hi,
>>>
>>>  Thank you for the fast response. I will file issues separately, but I
>>> will share my thoughts on the reply.
>>>
>>> Firstly, Vulkan does support HW clip planes: VkPhysicalDeviceFeatures
>>> has fields for both clipping and culling (shaderClipDistance and
>>> shaderCullDistance), and VkPhysicalDeviceLimits reports how many are
>>> available via maxClipDistances and maxCullDistances. In addition, Metal
>>> supports clip-distance in its shading language; see section 5.2.3.3
>>> Vertex Function Output Attributes of the Metal 2.2 shading spec,
>>> https://developer.apple.com/metal/Metal-Shading-Language-Specification.pdf.
>>> D3D12 also supports clip-distance; see the constant
>>> D3D12_CLIP_OR_CULL_DISTANCE_COUNT at
>>> https://docs.microsoft.com/en-us/windows/win32/direct3d12/constants .
>>> Also, saying that clip planes are a thing of the past is quite
>>> unrealistic, as there are a significant number of rendering algorithms I
>>> wish to employ that use them. I do, however, advocate that it be OK to
>>> report that no clip-distance values are supported, since it is acceptable
>>> in Vulkan not to support clip-distance. Many GPUs dedicate a non-trivial
>>> amount of silicon to implementing these user-defined clip planes, and not
>>> making them available when present seems far from ideal. Emulating HW
>>> clip planes through compute is quite icky (and typically involves atomic
>>> ops in the compute shader), and emulating them through discard is the
>>> worst option of all.
>>>
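
As a rough TypeScript-style sketch of the query described above (the field
name maxClipDistances is borrowed from Vulkan's limit and is an assumption
here, not part of any spec), an application could simply branch on a
reported count of zero:

// Hypothetical limits query; a reported value of 0 means the implementation
// exposes no hardware clip planes, which is acceptable.
interface HypotheticalLimits {
  maxClipDistances: number; // assumed field name, mirroring Vulkan
}

function chooseClippingStrategy(limits: HypotheticalLimits): "hw-clip" | "fallback" {
  // Prefer hardware clip distances when any are reported; otherwise fall
  // back to restructuring the geometry (or accept discard-based emulation).
  return limits.maxClipDistances > 0 ? "hw-clip" : "fallback";
}
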
>>> Secondly, on the subject of advanced blend equations, I would rather the
>>> feature were part of the spec from day 0, with the ability to query
>>> whether it is supported. For UI rendering, these blend modes avoid a
>>> large number of terrible, poorly performing workarounds. These blend
>>> equations are already available as an extension in Vulkan; see
>>> https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#VK_EXT_blend_operation_advanced.
>>> Leaving it for a later extension essentially pushes it further away, with
>>> which comes a higher chance it never sees the light of day. The only part
>>> really affected is adding those blend modes to the current list along
>>> with a query to report whether it is supported.
>>>
>>> Next, issues (6) and (4) are VERY different. Issue (6) gives an
>>> application the ability to read the current framebuffer value at the
>>> fragment location. Not all hardware supports this, but most tile-based
>>> architectures can or do (indeed, a number of them have no dedicated
>>> blending units and perform blending by appending an epilogue to the
>>> fragment shader). This feature can be done, with effort, in Metal on iOS
>>> (but not macOS), but I have not seen a Vulkan extension for it yet. In
>>> contrast, (4) is about declaring that a value never needs to be sent back
>>> to memory (mostly an optimization for tile-based renderers); one cannot
>>> read values back in the shader, and instead it applies after a
>>> "sort-of-render-target-change". Feature (4) is just an optimization for
>>> tile-based renderers.
>>>
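
To make the distinction above concrete, here is a rough TypeScript-style
sketch of item (4) only: a render-pass attachment whose contents are
declared as never needing to reach memory, so a tile-based renderer can keep
them on-tile. All field names are assumptions for the sketch; item (6),
framebuffer fetch, would instead need a way for the fragment shader to read
the current value back.

// Hypothetical attachment descriptor for a tile-local intermediate value.
interface HypotheticalColorAttachment {
  view: unknown;                 // texture view, elided in this sketch
  loadOp: "clear" | "load";
  storeOp: "store" | "discard";  // "discard": contents never written back
  transient?: boolean;           // assumed flag: tile-local storage only
}

const gBufferIntermediate: HypotheticalColorAttachment = {
  view: null,                    // placeholder; a real app supplies a view
  loadOp: "clear",
  storeOp: "discard",
  transient: true,
};
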
>>> Item (7), fragment shader interlock, is available on some hardware in
>>> Vulkan via VK_EXT_fragment_shader_interlock.
>>>
>>>
>>> Lastly, a good GPU application needs to have some understanding of the
>>> GPU:
>>>  - is it a tile-based renderer or not?
>>>  - what optional features are available?
>>>
>>> For example, on a non-tile-based renderer, reading from the current
>>> render target is nowhere near as heavy an operation as it is on a
>>> tile-based renderer. I argue that exposing these elements will allow
>>> applications to get more performance from the GPU, which is much of the
>>> reason for WebGPU. I am all for making code portable, but
>>> GPU-performance-intensive applications (the purpose of WebGPU) need this
>>> information to achieve that; otherwise the gap between native and web
>>> will be quite large.
>>>
>>> At any rate, I will file each of these as separate issues, but I would
>>> like to have a discussion about them on the mailing list (or in the
>>> issues) out in the open. Ideally, we would hear input not just from the
>>> implementors' point of view, but also from the developers' point of view.
>>>
>>> Best Regards,
>>>  -Kevin Rogovin
>>>
>>>
>>>
>>>
>>> On Tue, Aug 6, 2019 at 6:39 PM Dzmitry Malyshau <dmalyshau@mozilla.com>
>>> wrote:
>>>
>>>> Hi Kevin,
>>>>
>>>> Thank you for writing down your (employer's) use cases!
>>>>
>>>> Ideally, these would need to be filed as issues on
>>>> https://github.com/gpuweb/gpuweb/issues .
>>>>
>>>> 1. Needs an investigation to be done (see others:
>>>> https://github.com/gpuweb/gpuweb/labels/investigation). Roughly
>>>> speaking, this is very useful and IIRC widely supported; we should have
>>>> it in the API.
>>>>
>>>> 2. User clip planes are a thing of the past, found in none of our
>>>> target APIs (Vulkan, D3D12, Metal). Therefore, I don't think this
>>>> feature should influence the WebGPU spec.
>>>>
>>>> 3. This appears to only be supported in Vulkan (of the 3 APIs we
>>>> target) and provides only a minor benefit (unless you have numbers to
>>>> show otherwise?). Perhaps this would work as a small extension, but it
>>>> doesn't seem necessary for the MVP or V1 of the API.
>>>>
>>>> 4 and 6. These are similar (in the sense that both are addressed by
>>>> Vulkan subpasses). Finding a good model for the API that would be
>>>> portable is difficult. There needs to be a solid investigation followed
>>>> by one or more proposals before we can have this.
>>>>
>>>> 5. I don't think any of our target APIs support this, so this feature
>>>> can't influence the WebGPU spec.
>>>>
>>>> 7. Haven't looked into it. Needs an investigation done.
>>>>
>>>>
>>>> You are welcome to file issues and help us with
>>>> investigations/proposals!
>>>>
>>>> Note that in general we are trying not to have a lot of variation in
>>>> the exposed device "geometry". These extra flags and capabilities make
>>>> the application take different code paths on different platforms, which
>>>> hurts the portability of the API and makes fingerprinting easier.
>>>>
>>>> Thank you,
>>>>
>>>> Dzmitry
>>>>
>>>>
>>>> On 8/6/19 3:55 AM, Kevin Rogovin wrote:
>>>> > Hi,
>>>> >
>>>> >  I have a number of feature requests which are quite important for my
>>>> > employer's use cases.
>>>> >
>>>> > First the easiest ones:
>>>> >
>>>> > 1. Dual source blending, i.e. add the blend modes: "src1-color",
>>>> > "one-minus-src1-color", "src1-alpha", "one-minus-src1-alpha",
>>>> > "src1-alpha-saturated". Each of these has a direct analogue in
>>>> > Vulkan, Metal and Direct3D12.
>>>> >
>>>> > 2. Add HW clip planes, where a query states how many hardware clip
>>>> > planes are supported. It is OK if the return value is 0. In
>>>> > particular, if the GPU does not support HW clip planes in its API, it
>>>> > should return 0. I have quite a few cases where knowing whether HW
>>>> > clip planes are available can change my rendering strategy and improve
>>>> > GPU efficiency significantly. Lastly, using discard to emulate HW clip
>>>> > planes can have a large, negative performance impact and is something
>>>> > I (and others) should avoid.
>>>> >
>>>> > 3. Derived pipeline state objects. Not all of the targeted APIs have
>>>> > this feature, but for those that do, like Vulkan, it can help. The
>>>> > main use case is again that if two PSOs are quite similar, then a
>>>> > driver can upload only the parts that are different and compute in
>>>> > advance which parts those are.
>>>> >
>>>> > Now the tricky ones, which require significant thought to do properly:
>>>> >
>>>> > 4. Render passes with local storage. This was something that was
>>>> > non-trivial in Vulkan, I admit, but the potential usefulness is
>>>> > significant. The basic idea is the ability to declare a value in the
>>>> > fragment shader as an intermediate, to be read from the exact same
>>>> > pixel location in a later rendering pass. The big use case is
>>>> > tile-based renderers, where the temporary data is never sent out to
>>>> > memory. This gives a large performance and power-saving boost to
>>>> > deferred rendering strategies.
>>>> >
>>>> > And lastly, features that not all GPUs can do, but that are game
>>>> > changers:
>>>> >
>>>> > 5. To *optionally* support the blend modes of
>>>> > KHR_blend_equation_advanced. I just want the API to have a query to
>>>> > ask whether it is available and, as extensions roll out for Vulkan (or
>>>> > via emulation with Metal as found on iOS), to use this feature if the
>>>> > GPU supports it. On the desktop, two of the three major GPU providers
>>>> > have hardware support for this feature. Of the mobile GPUs, I think
>>>> > most have this in their GLES implementations.
>>>> >
>>>> > 6. For tile-based renderers, the ability to read the "last" value of
>>>> > the framebuffer at the fragment, something akin to
>>>> > GL_EXT_shader_framebuffer_fetch. Again, not to require this feature,
>>>> > but the ability to query for it. Most tile-based renderers can support
>>>> > this at some level, and on the desktop, two of the three can either do
>>>> > or emulate this feature. For a variety of situations, this can be a
>>>> > game changer for performance as well. On mobile, I know that at least
>>>> > 3 of the GPU lines out there support or can support this feature.
>>>> >
>>>> > 7. Another useful feature is an analogue of
>>>> > GL_ARB_fragment_shader_interlock; again, two of the three desktop
>>>> > GPUs have HW support for this feature. For a variety of situations,
>>>> > this can be a game changer for performance as well.
>>>> >
>>>> > I would like to participate in the discussions, not just drop the
>>>> > above wish list. That is, I want to help make any, or all, of the
>>>> > above land in WebGPU.
>>>> >
>>>> > Best Regards,
>>>> >  -Kevin Rogovin
>>>>
>>>>

Received on Wednesday, 7 August 2019 20:45:02 UTC