Re: Some Feature requests. from Kevin Rogovin on 2019-08-07 (public-gpu@w3.org from August 2019)

From: Kevin Rogovin <kevinrogovin@invisionapp.com>
Date: Wed, 7 Aug 2019 08:58:34 +0300
To: "Myles C. Maxfield" <mmaxfield@apple.com>
Cc: Doug Moen <doug@moens.org>, public-gpu <public-gpu@w3.org>
Message-ID: <CALKNkvHJUzgaFw19n_Kg4wQLbKZ3K6yJwA_TBOMk7uB7LRJYbw@mail.gmail.com>
I think that the approach of not informing a WebGPU application which GPU
it is running on, in order to avoid fingerprinting, is terribly naive and
destructive. To keep the discussion concrete I will give explicit ways to
tell whether a GPU is tile-based or immediate-mode.
  1. Create two shaders:
        - the first fragment shader has a little ALU work in it to give a
color that varies per fragment without doing a texture lookup.
        - the second fragment shader is a simple blit.
        - make blending active
  2. Run each of these shaders against an offscreen buffer with massive
non-localized overdraw (just make a single triangle cover the screen and
draw it a lot). Tile-based renderers will show a MASSIVE performance
difference between the two shaders, well beyond the naive expectation of a
factor of 2.
  3. There is a GPU HW technique called opportunistic triangle reordering
where a GPU reorders triangles somewhat to avoid thrashing the render cache.
To detect that technique, tessellate the fullscreen quad a fair amount and
scramble it a little in screen space. This will defeat the reordering, but
actual tile-based renderers will not be affected.
  4. By controlling whether the blend operation and ALU output are noisy or
not noisy, we can detect whether an immediate-mode GPU has renderbuffer
compression. By specifying a noisy or not-so-noisy texture, one can also
tell if the GPU has lossless color compression for textures.
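
As a rough sketch of how the comparison in steps 1 and 2 might be turned
into a decision: the names `OverdrawTimings`, `overdrawRatio` and
`looksTileBased` are made up for illustration, the default threshold of 2 is
only an assumption taken from the "factor of 2" remark, and the timings are
assumed to come from running the two blended overdraw passes.

```typescript
// Hypothetical helper types and names; nothing here is a WebGPU API.
interface OverdrawTimings {
  aluMs: number;  // time for N blended fullscreen layers, ALU-only shader
  blitMs: number; // time for the same N layers, texture-blit shader
}

// Ratio of the slower pass to the faster pass (always >= 1).
function overdrawRatio(t: OverdrawTimings): number {
  if (t.aluMs <= 0 || t.blitMs <= 0) {
    throw new Error("timings must be positive");
  }
  return Math.max(t.aluMs, t.blitMs) / Math.min(t.aluMs, t.blitMs);
}

// Guess "tile-based" when the two passes differ by more than `threshold`.
// The default of 2 is an assumed starting point, not a calibrated value.
function looksTileBased(t: OverdrawTimings, threshold: number = 2): boolean {
  return overdrawRatio(t) > threshold;
}
```

A real test would want several runs per shader and a threshold tuned against
known devices.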

The above test is fast to execute and exceedingly easy to code. With the
above testing apparatus, one can also get a reliable picture of the read
bandwidth of the device. Restricting the texture to be tiny will allow one
to get a measure of the sampler performance. Varying that tiny texture size
will allow one to guess the sampler cache size as well and potentially its
cache hierarchy.
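
To illustrate the cache-size guess, here is a minimal sketch that looks for
the first jump in per-fetch cost as the texture grows. The `Sample` shape,
the function name `estimateCacheBytes` and the 1.5x jump factor are all
assumptions for illustration; the measurements themselves would come from
the sampling test described above.

```typescript
// Hypothetical measurement record: one entry per texture size tested.
interface Sample {
  textureBytes: number; // size of the texture being sampled
  nsPerFetch: number;   // measured average cost of one texture fetch
}

// Returns the largest texture size measured before the per-fetch cost
// jumps by more than `jump` times, or null if no such jump is seen.
// The default jump factor of 1.5 is an assumed value, not a calibrated one.
function estimateCacheBytes(samples: Sample[], jump: number = 1.5): number | null {
  // Sort a copy by texture size so the input array is left untouched.
  const sorted = [...samples].sort((a, b) => a.textureBytes - b.textureBytes);
  for (let i = 1; i < sorted.length; i++) {
    if (sorted[i].nsPerFetch > sorted[i - 1].nsPerFetch * jump) {
      return sorted[i - 1].textureBytes;
    }
  }
  return null;
}
```

Running the same search again on the samples past the first jump would be
one way to look for further levels of the hierarchy.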

Further tests using image-load-store will allow one to further understand
the cache architecture as well.

The code to differentiate the big 4 tile-based renderers (ARM Mali,
Imagination Technologies, Qualcomm and Apple GPU) is also quite possible
(again, I know the first three really well and I am pretty sure I can write
tests to identify those) because each of these varies enough in how it does
tile-based rendering that they have different performance profiles for
certain specific loads. On the desktop front, there are also tests to
distinguish between NVIDIA, AMD and Intel GPUs. Because WebGPU has no
tessellation or geometry shaders (which I think is the RIGHT thing), the
testing is a touch more work, but very possible (the easiest places to poke
are compute shading, the behaviour of denormalized numbers, floating point
funkiness, group size and how the GPU handles shader branching).
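
As one example of a denormal check, the sketch below classifies a value
read back from the GPU. It assumes a compute shader (not shown) has
multiplied the smallest normal 32-bit float by 0.5 and written the result
to a buffer; the names `MIN_NORMAL_F32`, `referenceHalfMinNormal` and
`classifyDenormalResult` are hypothetical.

```typescript
// Smallest positive normal float32 value, 2^-126.
const MIN_NORMAL_F32: number = Math.pow(2, -126);

// What an IEEE-754 float32 unit that keeps denormals would produce for
// MIN_NORMAL_F32 * 0.5, computed on the CPU as a reference (2^-127).
function referenceHalfMinNormal(): number {
  return Math.fround(MIN_NORMAL_F32 * 0.5);
}

type DenormalBehaviour = "preserved" | "flushed-to-zero" | "unexpected";

// `gpuResult` is assumed to be the value the (not shown) shader wrote back.
function classifyDenormalResult(gpuResult: number): DenormalBehaviour {
  if (gpuResult === 0) return "flushed-to-zero";
  if (gpuResult === referenceHalfMinNormal()) return "preserved";
  return "unexpected";
}
```

The result alone says little, but combined with similar checks it narrows
down the hardware family.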

These tests will then also generate FLOPS and bandwidth numbers.
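
For instance, a bandwidth figure could be backed out of the overdraw test
with arithmetic along these lines. This is only a sketch: the function name
is made up, and it assumes every blended fragment costs one destination
read plus one destination write to memory, which would hold at best for an
immediate-mode GPU without framebuffer compression.

```typescript
// Hypothetical estimate of framebuffer traffic during blended overdraw.
function blendBandwidthGBps(
  width: number,         // render target width in pixels
  height: number,        // render target height in pixels
  bytesPerPixel: number, // e.g. 4 for an RGBA8 target
  layers: number,        // number of fullscreen layers drawn
  seconds: number        // measured time for all layers
): number {
  if (seconds <= 0) {
    throw new Error("seconds must be positive");
  }
  // Assumption: one read and one write of the destination per fragment.
  const bytesMoved = width * height * bytesPerPixel * layers * 2;
  return bytesMoved / seconds / 1e9;
}
```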

In short, simple focused benchmarking together with some floating point
accuracy checks would be enough to identify a GPU.

So not telling the application about the GPU will not prevent it from
getting a pretty good idea of the GPU or, for that matter, fingerprinting
it to some degree. I am not advocating reporting driver strings, just the
GPU architecture name, the model number (to classify its performance) and
whether or not it is a tile-based GPU. By doing this, WebGPU applications
can know at startup what to do with the GPU instead of eating oodles of
battery and increasing load times to figure it out.

Best Regards,
 -Kevin



On Wed, Aug 7, 2019 at 5:35 AM Myles C. Maxfield <mmaxfield@apple.com>
wrote:

>
>
> > On Aug 6, 2019, at 9:42 AM, Doug Moen <doug@moens.org> wrote:
> >
> > As the sole proprietor of a small open source project, my goals are
> different from Kevin's. I am most interested in portability, followed by
> ease of use.
> >
> > I would like to use portable, easy to use abstractions that encapsulate
> GPU design patterns where you must write different code for different
> platforms. In some cases, experts will implement a library on top of WebGPU
> that provides these abstractions, and in other cases, portable abstractions
> are built in to WebGPU.
> >
> > I do not think that "making fingerprinting harder" should be a goal for
> the WebGPU group. I think it is impossible to prevent fingerprinting by
> limiting the GPU features that are exposed by the WebGPU API.
>
> We’ve heard this defeatist argument before and entirely disagree with it.
> All sources of fingerprinting are things we desire to eliminate from our
> browser. Just because one attack vector is possible today doesn’t mean it
> will be possible tomorrow.
>
> It comes down to our users' expectations and desires. Users don’t want to
> be fingerprinted. Our job is to make that come true.
>
> > I wouldn't be surprised if you could fingerprint a GPU simply by testing
> edge conditions in the output of transcendental functions in a shader, or
> by using some other technique that it is impossible for this group to
> protect against.
>
> Luckily, the shader compiler is inside the browser, so if transcendental
> functions are the problem, we can fix them.
>
> > If GPU fingerprinting is possible at all, somebody skilled at the art
> will find it, write a library, and sell it to people who want to
> fingerprint browsers. There's big money in this. How easy it is to write
> the code is irrelevant, once libraries exist on the open market.
>
> We have a long history of shipping fixes and mitigations when this occurs.
>
> >
> > I think the only way to stop GPU fingerprinting is to ensure that there
> is no way for information obtained from WebGPU to leak into code that has
> the ability to transmit data to the internet. One approach might be some
> form of sandboxing. Another approach might be to add "taint checking" to
> the Javascript language (Perl has this feature).
>
> Yep, this is one way to attack the problem.
>
> >
> > I think this group should consult somebody with deep browser security
> expertise to provide advice on what (if anything) the WebGPU group should
> do about fingerprinting. For example, the people who write the Tor browser
> would have the necessary expertise. I perceive the Tor browser as having
> the best anti-fingerprinting technology, so if there are changes to WebGPU
> that Tor needs, then they are worth considering. Ideally there would be a
> w3c browser anti-fingerprinting team that this group could interact with.
> >
> > If you don't have to worry about fingerprinting, then you can provide
> APIs for querying GPU capabilities. Experts can use these GPU query APIs to
> build libraries of portable, easy to use abstractions on top of WebGPU that
> people like me can use to write portable, high level GPU code. My main
> concern is that the existence of GPU query APIs might increase the
> likelihood that code I write doesn't have portable behaviour, but I will
> defer to GPU experts to assess this risk on a case by case basis.
> >
> > Doug Moen.
> >
> > On Tue, Aug 6, 2019, at 11:39 AM, Dzmitry Malyshau wrote:
> >> Hi Kevin,
> >>
> >> Thank you for writing down your (employer's) use cases!
> >>
> >> Ideally, these would need to be filed as issues on
> >> https://github.com/gpuweb/gpuweb/issues .
> >>
> >> 1. Needs an investigation to be done (see others  -
> >> https://github.com/gpuweb/gpuweb/labels/investigation). Roughly
> >> speaking, this is very useful and IIRC widely supported, we should have
> >> it in the API.
> >>
> >> 2. User clip planes are the thing of the past, found in none of our
> >> target APIs (Vulkan, D3D12, Metal). Therefore, I don't think this
> >> feature should influence WebGPU spec.
> >>
> >> 3. This appears to only be supported in Vulkan (of the 3 APIs we
> >> target) and provides only a minor benefit (unless you have numbers to
> >> show otherwise?). Perhaps, this would work as a small extension, but it
> >> doesn't seem necessary for MVP or V1 of the API.
> >>
> >> 4 and 6. These are similar (in the sense that both are addressed by
> >> Vulkan sub-passes). Finding a good model of the API that would be
> >> portable is difficult. There needs to be a solid investigation followed
> >> by one or more proposals before we can have this.
> >>
> >> 5. I don't think any of our target APIs support this, so this feature
> >> can't be influencing the WebGPU spec.
> >>
> >> 7. Haven't looked into it. Needs an investigation done.
> >>
> >>
> >> You are welcome to file issues and help us with
> >> investigations/proposals!
> >>
> >> Note that in general we are trying to not have a lot of variation in
> >> the exposed device "geometry". These extra flags and capabilities make
> >> the application take different code paths on different platforms, which
> >> hurts the portability property of the API and makes fingerprinting
> >> easier.
> >>
> >> Thank you,
> >>
> >> Dzmitry
> >>
> >>
> >> On 8/6/19 3:55 AM, Kevin Rogovin wrote:
> >>> Hi,
> >>>
> >>>  I have a number of feature requests which are quite important for my
> >>> employer's use cases.
> >>>
> >>> First the easiest ones:
> >>>
> >>> 1. Dual source blending, i.e. add the blend modes: "src1-color",
> >>> "one-minus-src1-color", "src1-alpha", "one-minus-src1-alpha",
> >>> "src1-alpha-saturated". Each of these has a direct analogue in Vulkan,
> >>> Metal and Direct3D12.
> >>>
> >>> 2. Add Hw-clip-planes where a query states how many hardware
> >>> clip-planes are supported. It is OK if the return value is 0. In
> >>> particular, if the GPU does not support HW-clip planes from its API,
> >>> it should return 0. I have quite a few cases where knowing if HW-clip
> >>> planes are available can change my rendering strategy and improve GPU
> >>> efficiency significantly. Lastly, using discard to emulate HW-clip
> >>> planes can have large, negative performance impact and is something I
> >>> (and others) should avoid.
> >>>
> >>> 3. Derived pipeline state objects. Not all of the targeted APIs have
> >>> this feature, but for those that do, like Vulkan, it can help. The
> >>> main use case is again that if two PSOs are quite similar, then a
> >>> driver can upload only the parts that are different and compute in
> >>> advance which parts those are.
> >>>
> >>> Now the tricky ones which require significant thought to properly do:
> >>>
> >>> 4. Render passes with local storage. This was something that was
> >>> non-trivial in Vulkan I admit but the potential usefulness is
> >>> significant. The basic idea is the ability to declare a value in the
> >>> frag-shader as intermediate to be read from the exact same pixel
> >>> location in a later rendering pass. The big use case is for tile based
> >>> renderers so that temporary data is never sent out to memory. This
> >>> gives a large performance and power-saving boost for deferred
> >>> rendering strategies.
> >>>
> >>> And lastly, features that not all GPUs can do, but are game changers:
> >>>
> >>> 5. To *optionally* support the blend modes of khr-blend-equations
> >>> advanced. I just want the API to have a query to ask if it is there
> >>> and as extensions rollout for Vulkan or ability to emulate with Metal
> >>> as found in iOS, to use this feature if the GPU supports it. On the
> >>> desktop two of the three major GPU providers have hardware support for
> >>> this feature. Of the mobile GPU's I think most have this in their GLES
> >>> implementations.
> >>>
> >>> 6. For tile based renderers, the ability to read the "last" value of
> >>> the framebuffer at the fragment, something akin to
> >>> GL_EXT_shader_framebuffer_fetch. Again, not to require this feature,
> >>> but the ability to query it. Most tiled based renderers can support
> >>> this on some level and on the desktop, two of the three can either do
> >>> or emulate this feature. For a variety of situations, this can be a
> >>> game changer to improve performance as well. On mobile, I know that
> >>> at least 3 of the GPU lines out there support or can support this
> >>> feature.
> >>>
> >>> 7. Another useful feature is an analogue of
> >>> GL_ARB_fragment_shader_interlock; again two of the three desktop GPU's
> >>> have HW support for this feature. For a variety of situations, this
> >>> can be a game changer to improve performance as well.
> >>>
> >>> I would like to participate in the discussions, not just drop the
> >>> above wish list. I.e. I want to help make any, or all, of the above
> >>> land in WebGPU.
> >>>
> >>> Best Regards,
> >>>  -Kevin Rogovin
> >>
> >>
> >
>
>
>
Received on Wednesday, 7 August 2019 05:59:11 UTC