Re: RFC: Re: shader IR vs. application IR from Kenneth Russell on 2017-08-16 (public-gpu@w3.org from August 2017)

From: Kenneth Russell <kbr@google.com>
Date: Wed, 16 Aug 2017 11:19:02 -0700
To: Corentin Wallez <cwallez@google.com>
Cc: Maciej Stachowiak <mjs@apple.com>, David Neto <dneto@google.com>, "Myles C. Maxfield" <mmaxfield@apple.com>, public-gpu@w3.org, Dean Jackson <dino@apple.com>
Message-ID: <CAMYvS2fFJw8F=vbtHoYu4WMdrnPFftx8aVmprxrOTM3HiqLHsQ@mail.gmail.com>
On Wed, Aug 16, 2017 at 10:06 AM, Corentin Wallez <cwallez@google.com>
wrote:

> On Wed, Aug 16, 2017 at 11:48 AM, Maciej Stachowiak <mjs@apple.com> wrote:
>
>>
>> I was prompted by off-list comments to read the SPIR-V spec and its
>> explanation of Logical Addressing Mode in the spec here: <
>> https://www.khronos.org/registry/spir-v/specs/1.2/SPIRV.html>.
>>
>> Thanks for taking an in-depth look at the logical addressing mode.
>
>
>> It seems like SPIR-V's Logical Addressing Mode removes many of the
>> obvious ways to do out-of-bounds reads or writes. It stops you from storing
>> pointers and doing arithmetic on them; and it disables some opcodes that
>> let you read or write with a program-controlled size, or get pointers to
>> arbitrary locations. Many of these require the "Addresses" capability which
>> is disabled in Logical Addressing Mode.
>>
>> However, the OpAccessChain operation, which lets you get a pointer to an
>> arbitrary non-bounds-checled offset into a composite object, does *not*
>> require the Addresses capability. It does not require any capability at
>> all, which means there is no standard SPIR-V mechanism to disable or limit
>> it. This is despite the existence of OpInBoundsAccessChain, which seems to
>> do the same thing, but bounds-checked.[1]
>>
>> My understanding is that OpInBoundsAccessChain is a hint from the
> application that no bounds check are necessary, but could be completely
> ignored.
>
>
>> There may be other operations that allow reading or writing memory
>> without bounds checking. For example, OpImageRead, OpImageWrite, and
>> related OpImage ops are not specified to do bounds checks, at least not
>> obviously. Another example: OpVectorExtractDynamic, OpVectorInsertDynamic
>> and related ops are not defined to bounds check, and explicitly specify
>> that what is read or written is undefined if the index is out of bounds.
>>
>>
>> Maybe I'm missing a simple reason why these ops are actually safe, or an
>> easy way to add bounds checking. But I tentatively conclude that SPIR-V
>> does *not* provide the required level of memory safety out of the box,
>> not even in Logical Addressing Mode.
>>
>> Good catch for the OpImage* and OpVector*Dynamic. All of the ops
> mentioned above are not safe because they access memory at a dynamic
> offset. The capabilities they provide are critical for a GPU language: only
> being able to access images and buffers at compile-time constant offsets
> would be too restrictive. In that respect SPIR-V is no worse than any other
> language we could choose, and has the nice property that in the logical
> addressing mode, only a dozen ops might be unsafe.
>

The arguments to the image texel fetch operations would need to be clamped
inside the shader, in the same way that buffer accesses might need to be,
depending on the guarantees that the underlying APIs provide, and the
decisions that are made about testability of WebGPU's memory safety. If for
example WebGPU's semantics are defined as yielding either zero, or the
value fetched at the clamped indices, for any out-of-range accesses against
a buffer or image, and an underlying API like Direct3D 12 already provides
those semantics, no additional bounds checks would need to be added to the
compiled shader. If on the other hand the underlying API provides no
guarantees, then the shader would need to be transformed during loading to
fetch the size of the image using OpImageQuerySize and clamp the indices to
that range.

Note also that the typical way of sampling a texture is via OpImageSample*,
which is guaranteed to be safe.



> Therefore, as far as I can tell, my argument still stands. I believe we
>> would need to significantly modify the SPIR-V format and SPIR-V
>> implementations to achieve full memory safety. That's not to say it can't
>> be done, but I don't think we should expect the resulting format to work
>> with stock SPIR-V tools.
>>
>
I disagree with this conclusion. It might be necessary to transform
incoming shaders to insert clamping of some indices, or perhaps another
approach would be needed. I disagree firmly with your conclusion that this
would require significant modification of the SPIR-V format and toolchain
to achieve. Apple contributed similar code to ANGLE's shader translator!
See
https://chromium.googlesource.com/angle/angle/+/master/src/third_party/compiler/ArrayBoundsClamper.cpp
. This was a solid piece of work but didn't involve *any* modification of
the GLES 1.00 spec which WebGL 1.0 shaders use essentially verbatim. It
seems likely to me that the same properties would be achievable if SPIR-V
were the shader format ingested by WebGPU implementations.

Why don't we work on enumerating and documenting the security issues for
shading languages/IRs that have been uncovered to date, like needing secure
reads and writes for kernels' input and output buffers and texel fetches?
Focusing on dismissing a potentially viable representation like SPIR-V at
this point is like throwing the baby out with the bath water. That spec has
had a ton of thought and experience put into it (as have DXIL and MSL) and
aiming to dismiss it out of hand is counterproductive to the entire
discussion.

-Ken



>> Regards,
>> Maciej
>>
>>
>> [1] I'm not absolutely sure that OpInBoundsAccessChain is bounds-checked
>> because the definition is vague. It says: "Has the same semantics
>> as OpAccessChain, with the addition that the resulting pointer is known to
>> point within the base object." Does "known to" mean that the implementation
>> bounds-checks and clamps the pointer to be within bounds (presumably with
>> enough space left for the pointed-to type)? Or does it mean that the caller
>> of this opcode promises that the pointer is within bounds? I'm assuming the
>> former but it's not entirely clear.
>>
>>
>> On Aug 15, 2017, at 2:41 PM, Maciej Stachowiak <mjs@apple.com> wrote:
>>
>>
>>
>> On Aug 15, 2017, at 2:29 PM, Kenneth Russell <kbr@google.com> wrote:
>>
>> On Tue, Aug 15, 2017 at 1:57 PM, Maciej Stachowiak <mjs@apple.com> wrote:
>>
>>>
>>> Let's say we started with SPIR-V to make WebSPIR-V with at least the
>>> following changes:
>>>
>>> - Some different APIs and interfaces are exposed than you would expect
>>> in a WebGL or Vulkan shader.
>>> - At compile-time we generate code with runtime bounds checks and other
>>> safety features.
>>> - We possibly subset SPIR-V to remove dangerous capabilities that can't
>>> be effectively guarded with bounds checks or other runtime checks.
>>>
>>> If we did all these things, then it seems to me we wouldn't benefit from
>>> the existing SPIR-V ecosystem:
>>> - Existing SPIR-V shaders could not be reused with WebGPU
>>> - Front ends that compile to SPIR-V could not be used unmodified
>>> - Existing SPIR-V back ends could not correctly process the modified
>>> language, since they wouldn't have the ability to do runtime checks.[1]
>>>
>>> Is my analysis correct? If so, then what is the benefit of basing on
>>> SPIR-V?
>>>
>>
>> This analysis is incorrect and oversimplifies the issues in order to cast
>> SPIR-V in a negative light.
>>
>> The problematic areas are mainly in bounds-checking accesses in the input
>> and output buffers to compute shaders. Pipeline stages like vertex,
>> geometry, and fragment shaders are well and securely handled by the
>> capabilities SPIR-V exposes, since it was designed as a shading language
>> for graphics APIs.
>>
>>
>> My point isn't to make SPIR-V look bad. I don't believe there's any
>> shader language that meets all our requirements as-is.
>>
>> My intent is only to argue that it shouldn't have a leg up based on
>> ecosystem considerations, because we'd likely have to modify it enough that
>> the existing SPIR-V ecosystem won't be compatible with the end result.
>> Despite your disagreement, I still think that's probably true.
>>
>> I'm not clear on how SPIR-V avoids having a bounds checking problem for
>> shader types other than compute shaders. Are they unable to use the
>> non-bounds-checked operations that compute shaders can use? Do they only
>> get access to fixed-size arrays? Do they only get range-checked buffer
>> operations? To be clear, I would consider anything that can be used to read
>> or write arbitrary locations in video memory to be a security risk. Even if
>> it's limited to the video memory of a specific process.
>>
>> If I'm missing something that makes this a non-issue, please explain.
>>
>> Instead of attempting to dismiss one particular solution a priori, what
>> should be done is to enumerate the needed capabilities of the various kinds
>> of shaders in WebGPU, and see what would need to be done to the starting
>> point (Metal Shading Language, WebAssembly, DXIL, SPIR-V, a higher-level
>> language, something brand new...) in order to have it be secure and
>> performant. There are multiple interesting analyses, designs and
>> comparisons to be done.
>>
>>
>> I agree that this should be the approach. I did not mean to dismiss
>> SPIR-V, merely to argue that it should not be considered to have any
>> particular advantage over the other options. Except perhaps on intrinsic
>> technical merit, which I think is the point of the overall analysis. It
>> might be that a modified version of SPIR-V is sound technical choice
>> whether or not it interoperates with regular SPIR-V.
>>
>> Regards,
>> Maciej
>>
>>
>>
>
Received on Wednesday, 16 August 2017 18:19:31 UTC