Re: RFC: Re: shader IR vs. application IR from Maciej Stachowiak on 2017-08-16 (public-gpu@w3.org from August 2017)

From: Maciej Stachowiak <mjs@apple.com>
Date: Wed, 16 Aug 2017 12:08:48 -0700
To: Kenneth Russell <kbr@google.com>
Cc: Corentin Wallez <cwallez@google.com>, David Neto <dneto@google.com>, "Myles C. Maxfield" <mmaxfield@apple.com>, public-gpu@w3.org, Dean Jackson <dino@apple.com>
Message-id: <C0A4310D-50BB-45CC-BA1E-D258FA26270E@apple.com>
Responding to both Ken and Corentin:

> On Aug 16, 2017, at 11:19 AM, Kenneth Russell <kbr@google.com> wrote:
> 
> On Wed, Aug 16, 2017 at 10:06 AM, Corentin Wallez <cwallez@google.com> wrote:
> On Wed, Aug 16, 2017 at 11:48 AM, Maciej Stachowiak <mjs@apple.com> wrote:
> 
> I was prompted by off-list comments to read the SPIR-V spec and its explanation of Logical Addressing Mode in the spec here: <https://www.khronos.org/registry/spir-v/specs/1.2/SPIRV.html>.
> 
> Thanks for taking an in-depth look at the logical addressing mode.
>  
> It seems like SPIR-V's Logical Addressing Mode removes many of the obvious ways to do out-of-bounds reads or writes. It stops you from storing pointers and doing arithmetic on them; and it disables some opcodes that let you read or write with a program-controlled size, or get pointers to arbitrary locations. Many of these require the "Addresses" capability which is disabled in Logical Addressing Mode.
> 
> However, the OpAccessChain operation, which lets you get a pointer to an arbitrary non-bounds-checled offset into a composite object, does not require the Addresses capability. It does not require any capability at all, which means there is no standard SPIR-V mechanism to disable or limit it. This is despite the existence of OpInBoundsAccessChain, which seems to do the same thing, but bounds-checked.[1]
> 
> My understanding is that OpInBoundsAccessChain is a hint from the application that no bounds check are necessary, but could be completely ignored.
>  
> There may be other operations that allow reading or writing memory without bounds checking. For example, OpImageRead, OpImageWrite, and related OpImage ops are not specified to do bounds checks, at least not obviously. Another example: OpVectorExtractDynamic, OpVectorInsertDynamic and related ops are not defined to bounds check, and explicitly specify that what is read or written is undefined if the index is out of bounds.
> 
> 
> Maybe I'm missing a simple reason why these ops are actually safe, or an easy way to add bounds checking. But I tentatively conclude that SPIR-V does not provide the required level of memory safety out of the box, not even in Logical Addressing Mode. 
> 
> Good catch for the OpImage* and OpVector*Dynamic. All of the ops mentioned above are not safe because they access memory at a dynamic offset. The capabilities they provide are critical for a GPU language: only being able to access images and buffers at compile-time constant offsets would be too restrictive. In that respect SPIR-V is no worse than any other language we could choose, and has the nice property that in the logical addressing mode, only a dozen ops might be unsafe.

It is possible to design a language that has bounds checking semantics on all operations that read from a dynamic offset, with the compiler responsible for eliding bounds checks that are provably not needed. Many such languages exist. The penalty is that it's hard (maybe impossible?) to do bounds checking without a runtime performance penalty. So with respect to memory safety, SPIR-V *is* worse than many other languages we could choose. Of course, in other respects it is good: it's already thought through the operations that shaders actually need. But memory safety would have to be added on.

> 
> The arguments to the image texel fetch operations would need to be clamped inside the shader, in the same way that buffer accesses might need to be, depending on the guarantees that the underlying APIs provide, and the decisions that are made about testability of WebGPU's memory safety. If for example WebGPU's semantics are defined as yielding either zero, or the value fetched at the clamped indices, for any out-of-range accesses against a buffer or image, and an underlying API like Direct3D 12 already provides those semantics, no additional bounds checks would need to be added to the compiled shader. If on the other hand the underlying API provides no guarantees, then the shader would need to be transformed during loading to fetch the size of the image using OpImageQuerySize and clamp the indices to that range.

Good point. In many cases, a rewrite step could add bounds checking.

Do all non-bounds-checked operations in SPIR-V have a way to get a corresponding size? I'm not sure of the right size operation(s) to pair with OpAccessChain or OpInBoundsAccessChain.

There's probably other SPIR-V instructions with the potential for out-of-bounds access. I only did a pretty quick scan of the spec and I'm not much of an expert on shaders or security. Yet I was able to identify 5 unsafe instructions in addition to the one Corentin mentioned to me.

> 
> Note also that the typical way of sampling a texture is via OpImageSample*, which is guaranteed to be safe.

Noted. But for security purposes, you need all ways to be safe, not just the most common way.

> 
>  
> Therefore, as far as I can tell, my argument still stands. I believe we would need to significantly modify the SPIR-V format and SPIR-V implementations to achieve full memory safety. That's not to say it can't be done, but I don't think we should expect the resulting format to work with stock SPIR-V tools.
> 
> I disagree with this conclusion. It might be necessary to transform incoming shaders to insert clamping of some indices, or perhaps another approach would be needed. I disagree firmly with your conclusion that this would require significant modification of the SPIR-V format and toolchain to achieve. Apple contributed similar code to ANGLE's shader translator! See https://chromium.googlesource.com/angle/angle/+/master/src/third_party/compiler/ArrayBoundsClamper.cpp . This was a solid piece of work but didn't involve *any* modification of the GLES 1.00 spec which WebGL 1.0 shaders use essentially verbatim. It seems likely to me that the same properties would be achievable if SPIR-V were the shader format ingested by WebGPU implementations.

It might be, if all access operations have corresponding length operations.

> 
> Why don't we work on enumerating and documenting the security issues for shading languages/IRs that have been uncovered to date, like needing secure reads and writes for kernels' input and output buffers and texel fetches?

I think it's hard to enumerate everything that could be bad about an IR. Ultimately, you have to review every instruction and think about its semantics, and also how it might act in combination with other instructions.

Off the top of my head, here's a non-exhaustive list of possible language operations that would make it unsafe for the web:
- Any exposure of pointer values
- Pointer arithmetic
- Read or write to any location without bounds checks
- Ability to read or write another process's main or video memory
- Ability to read or write another shader's main or video memory
- Ability to read or write video memory that holds the contents of a webpage, even one that is same-origin with the shader
- Ability to hold a pointer or reference to an object after it may have been freed, and still perform access through it
- Potential for race conditions which may lead to access to out-of-bounds, freed, reallocated or not-yet-freed memory
- Function call or jump to an arbitrary addresses (jumping to known functions or labels is ok)
- Mixing dynamically indexed data structures, and function pointers or executable code in the same memory space
- Ability to execute an infinite loop without ever being interrupted by a watchdog or the like
- Ability to invoke privileged operations provided by the OS or hardware

>From a quick read, it seems like some of these are impossible in SPIR-V in logical addressing mode, some are definitely possible, and for some I don't know.

> Focusing on dismissing a potentially viable representation like SPIR-V at this point is like throwing the baby out with the bath water. That spec has had a ton of thought and experience put into it (as have DXIL and MSL) and aiming to dismiss it out of hand is counterproductive to the entire discussion.

I don't want to dismiss SPIR-V. I wanted to make the point that, as specified, it does not provide the required memory safety. Usually, I tend to think that systems designed for security from the ground up tend to be more robust than systems which lack the desired security property and must have it bolted on. But bolting on security is not impossible; as you point out, it's been done for GLSL. I may have overstated the case on how much would need to change to add memory safety.

I hope my comments so far have at least shown that this problem is more complicated than one might assume without careful study.

FWIW I do not believe DXIL or MSL provide the required level of memory safety out of the box either. I have no particular reason to advocate for them over SPIR-V, at least not on the basis of security considerations.

Regards,
Maciej
Received on Wednesday, 16 August 2017 19:09:21 UTC