Re: RFC: Re: shader IR vs. application IR

I see some good progress on this thread.   I'll try to amplify and clarify
certain points.

You can think of SPIR-V as a *family* of shading languages with common
fundamentals. Or even a *medium* for expressing shading languages.   Much
like how most web pages are in UTF-8, you have to say more.
There are at least 3 varieties in the wild: SPIR-V for Vulkan, for OpenGL,
and OpenCL.  We can get weird and count separately every single variation
where a particular capability or extension is enabled or not, but that's a
bit obtuse.

More seriously:
There are lots of ways to generate undefined behaviour.  This is true of
any minimally general language.  I appeal to the halting problem, again.
Even division by zero is suspect, but I imagine we want to keep the
division operator.
It's up to the environment to say more about what happens in those cases,
if it wants to.  We all agree that for the web we must provide robust
safety guarantees.

It it's kind of a paradox of specs, but a specification says *nothing*
about what happens in the unspecified behaviour case.  That's what makes it
*unspecified.*

Substitute any other shading language XYZ for SPIR-V and the GPUWeb
environment still has to say more about what happens in the face of
ill-behaved programs.   In a later step we can make a technical comparison
about the ease, robustness, and efficiency of applying those behavioural
constraints.

I agree with Ken that we should make our requirements explicit.  So far
I've seen at least:
- Access with program-determined indices into resources, in various stages
of the graphics pipeline.  (E.g. general access to UBO, SSBO, images)
- Robust behaviour in the face of ill-behaved programs:  E.g. in some cases
behave as if out of bounds accesses are clamped;  in other cases maybe not,
including predictable termination of the application(?) without further
side effects or information leakage. (?)
- General arithmetic.  I want my division instruction. :-)


On a more detailed level:

- OpVectorInsertDynamic and OpVectorExtractDynamic operate on values, not
memory.  So they're an even simpler case than memory and image accesses.

- OpInBoundsAccessChain vs. OpAccessChain.   Yes, the InBounds part is a
hint.  More specifically the implementation does different things depending
on whether it trusts the shader:
    - In all cases program behaviour is undefined if the resulting pointer
results in an out-of-object-bounds access
    - If the implementation does not trust the shader, then the
implementation treats both instructions the same way:  It always inserts
bounds checks (runtime or by modifying the shader)
    - If the implementation trusts the shader, *and* safety is desired,
then the bounds checks can be elided in the case of OpInBoundAccessChain
    - If the implementation trusts the shader and safety is not required at
this level of the implementation, then you elide the checks in both cases.
E.g. do this if the implemention delegates safety guarantees to a more
general level of protection, e.g. MMU hardware and its OS support

- Effectively, there are no function pointers in SPIR-V, due to validation
rules



On Wed, Aug 16, 2017 at 2:19 PM Kenneth Russell <kbr@google.com> wrote:

> On Wed, Aug 16, 2017 at 10:06 AM, Corentin Wallez <cwallez@google.com>
> wrote:
>
>> On Wed, Aug 16, 2017 at 11:48 AM, Maciej Stachowiak <mjs@apple.com>
>> wrote:
>>
>>>
>>> I was prompted by off-list comments to read the SPIR-V spec and its
>>> explanation of Logical Addressing Mode in the spec here: <
>>> https://www.khronos.org/registry/spir-v/specs/1.2/SPIRV.html>.
>>>
>>> Thanks for taking an in-depth look at the logical addressing mode.
>>
>>
>>> It seems like SPIR-V's Logical Addressing Mode removes many of the
>>> obvious ways to do out-of-bounds reads or writes. It stops you from storing
>>> pointers and doing arithmetic on them; and it disables some opcodes that
>>> let you read or write with a program-controlled size, or get pointers to
>>> arbitrary locations. Many of these require the "Addresses" capability which
>>> is disabled in Logical Addressing Mode.
>>>
>>> However, the OpAccessChain operation, which lets you get a pointer to an
>>> arbitrary non-bounds-checled offset into a composite object, does *not*
>>> require the Addresses capability. It does not require any capability at
>>> all, which means there is no standard SPIR-V mechanism to disable or limit
>>> it. This is despite the existence of OpInBoundsAccessChain, which seems to
>>> do the same thing, but bounds-checked.[1]
>>>
>>> My understanding is that OpInBoundsAccessChain is a hint from the
>> application that no bounds check are necessary, but could be completely
>> ignored.
>>
>>
>>> There may be other operations that allow reading or writing memory
>>> without bounds checking. For example, OpImageRead, OpImageWrite, and
>>> related OpImage ops are not specified to do bounds checks, at least not
>>> obviously. Another example: OpVectorExtractDynamic, OpVectorInsertDynamic
>>> and related ops are not defined to bounds check, and explicitly specify
>>> that what is read or written is undefined if the index is out of bounds.
>>>
>>>
>>> Maybe I'm missing a simple reason why these ops are actually safe, or an
>>> easy way to add bounds checking. But I tentatively conclude that SPIR-V
>>> does *not* provide the required level of memory safety out of the box,
>>> not even in Logical Addressing Mode.
>>>
>>> Good catch for the OpImage* and OpVector*Dynamic. All of the ops
>> mentioned above are not safe because they access memory at a dynamic
>> offset. The capabilities they provide are critical for a GPU language: only
>> being able to access images and buffers at compile-time constant offsets
>> would be too restrictive. In that respect SPIR-V is no worse than any other
>> language we could choose, and has the nice property that in the logical
>> addressing mode, only a dozen ops might be unsafe.
>>
>
> The arguments to the image texel fetch operations would need to be clamped
> inside the shader, in the same way that buffer accesses might need to be,
> depending on the guarantees that the underlying APIs provide, and the
> decisions that are made about testability of WebGPU's memory safety. If for
> example WebGPU's semantics are defined as yielding either zero, or the
> value fetched at the clamped indices, for any out-of-range accesses against
> a buffer or image, and an underlying API like Direct3D 12 already provides
> those semantics, no additional bounds checks would need to be added to the
> compiled shader. If on the other hand the underlying API provides no
> guarantees, then the shader would need to be transformed during loading to
> fetch the size of the image using OpImageQuerySize and clamp the indices to
> that range.
>
> Note also that the typical way of sampling a texture is via
> OpImageSample*, which is guaranteed to be safe.
>
>
>
>> Therefore, as far as I can tell, my argument still stands. I believe we
>>> would need to significantly modify the SPIR-V format and SPIR-V
>>> implementations to achieve full memory safety. That's not to say it can't
>>> be done, but I don't think we should expect the resulting format to work
>>> with stock SPIR-V tools.
>>>
>>
> I disagree with this conclusion. It might be necessary to transform
> incoming shaders to insert clamping of some indices, or perhaps another
> approach would be needed. I disagree firmly with your conclusion that this
> would require significant modification of the SPIR-V format and toolchain
> to achieve. Apple contributed similar code to ANGLE's shader translator!
> See
> https://chromium.googlesource.com/angle/angle/+/master/src/third_party/compiler/ArrayBoundsClamper.cpp
> . This was a solid piece of work but didn't involve *any* modification of
> the GLES 1.00 spec which WebGL 1.0 shaders use essentially verbatim. It
> seems likely to me that the same properties would be achievable if SPIR-V
> were the shader format ingested by WebGPU implementations.
>
> Why don't we work on enumerating and documenting the security issues for
> shading languages/IRs that have been uncovered to date, like needing secure
> reads and writes for kernels' input and output buffers and texel fetches?
> Focusing on dismissing a potentially viable representation like SPIR-V at
> this point is like throwing the baby out with the bath water. That spec has
> had a ton of thought and experience put into it (as have DXIL and MSL) and
> aiming to dismiss it out of hand is counterproductive to the entire
> discussion.
>
> -Ken
>
>
>
>>> Regards,
>>> Maciej
>>>
>>>
>>> [1] I'm not absolutely sure that OpInBoundsAccessChain is bounds-checked
>>> because the definition is vague. It says: "Has the same semantics
>>> as OpAccessChain, with the addition that the resulting pointer is known to
>>> point within the base object." Does "known to" mean that the implementation
>>> bounds-checks and clamps the pointer to be within bounds (presumably with
>>> enough space left for the pointed-to type)? Or does it mean that the caller
>>> of this opcode promises that the pointer is within bounds? I'm assuming the
>>> former but it's not entirely clear.
>>>
>>>
>>> On Aug 15, 2017, at 2:41 PM, Maciej Stachowiak <mjs@apple.com> wrote:
>>>
>>>
>>>
>>> On Aug 15, 2017, at 2:29 PM, Kenneth Russell <kbr@google.com> wrote:
>>>
>>> On Tue, Aug 15, 2017 at 1:57 PM, Maciej Stachowiak <mjs@apple.com>
>>> wrote:
>>>
>>>>
>>>> Let's say we started with SPIR-V to make WebSPIR-V with at least the
>>>> following changes:
>>>>
>>>> - Some different APIs and interfaces are exposed than you would expect
>>>> in a WebGL or Vulkan shader.
>>>> - At compile-time we generate code with runtime bounds checks and other
>>>> safety features.
>>>> - We possibly subset SPIR-V to remove dangerous capabilities that can't
>>>> be effectively guarded with bounds checks or other runtime checks.
>>>>
>>>> If we did all these things, then it seems to me we wouldn't benefit
>>>> from the existing SPIR-V ecosystem:
>>>> - Existing SPIR-V shaders could not be reused with WebGPU
>>>> - Front ends that compile to SPIR-V could not be used unmodified
>>>> - Existing SPIR-V back ends could not correctly process the modified
>>>> language, since they wouldn't have the ability to do runtime checks.[1]
>>>>
>>>> Is my analysis correct? If so, then what is the benefit of basing on
>>>> SPIR-V?
>>>>
>>>
>>> This analysis is incorrect and oversimplifies the issues in order to
>>> cast SPIR-V in a negative light.
>>>
>>> The problematic areas are mainly in bounds-checking accesses in the
>>> input and output buffers to compute shaders. Pipeline stages like vertex,
>>> geometry, and fragment shaders are well and securely handled by the
>>> capabilities SPIR-V exposes, since it was designed as a shading language
>>> for graphics APIs.
>>>
>>>
>>> My point isn't to make SPIR-V look bad. I don't believe there's any
>>> shader language that meets all our requirements as-is.
>>>
>>> My intent is only to argue that it shouldn't have a leg up based on
>>> ecosystem considerations, because we'd likely have to modify it enough that
>>> the existing SPIR-V ecosystem won't be compatible with the end result.
>>> Despite your disagreement, I still think that's probably true.
>>>
>>> I'm not clear on how SPIR-V avoids having a bounds checking problem for
>>> shader types other than compute shaders. Are they unable to use the
>>> non-bounds-checked operations that compute shaders can use? Do they only
>>> get access to fixed-size arrays? Do they only get range-checked buffer
>>> operations? To be clear, I would consider anything that can be used to read
>>> or write arbitrary locations in video memory to be a security risk. Even if
>>> it's limited to the video memory of a specific process.
>>>
>>> If I'm missing something that makes this a non-issue, please explain.
>>>
>>> Instead of attempting to dismiss one particular solution a priori, what
>>> should be done is to enumerate the needed capabilities of the various kinds
>>> of shaders in WebGPU, and see what would need to be done to the starting
>>> point (Metal Shading Language, WebAssembly, DXIL, SPIR-V, a higher-level
>>> language, something brand new...) in order to have it be secure and
>>> performant. There are multiple interesting analyses, designs and
>>> comparisons to be done.
>>>
>>>
>>> I agree that this should be the approach. I did not mean to dismiss
>>> SPIR-V, merely to argue that it should not be considered to have any
>>> particular advantage over the other options. Except perhaps on intrinsic
>>> technical merit, which I think is the point of the overall analysis. It
>>> might be that a modified version of SPIR-V is sound technical choice
>>> whether or not it interoperates with regular SPIR-V.
>>>
>>> Regards,
>>> Maciej
>>>
>>>
>>>
>>

Received on Wednesday, 16 August 2017 19:18:24 UTC