Re: RFC: Re: shader IR vs. application IR from David Neto on 2017-08-16 (public-gpu@w3.org from August 2017)

From: David Neto <dneto@google.com>
Date: Wed, 16 Aug 2017 19:24:46 +0000
To: Kenneth Russell <kbr@google.com>, Corentin Wallez <cwallez@google.com>
Cc: Maciej Stachowiak <mjs@apple.com>, "Myles C. Maxfield" <mmaxfield@apple.com>, public-gpu@w3.org, Dean Jackson <dino@apple.com>
Message-ID: <CAPmVsJWz5L-YAgNkteqjgeDwQT=M_J7gbLFO5T79MXMLmYngpA@mail.gmail.com>
One more thing about Logical Addressing mode:

- By design, you can always statically trace a memory access back to the
original object allocation (an OpVariable) or to a function parameter.  For
example, that's why you can't get a "pointer" as the result of an OpPhi or
an OpLoad, etc.
- You may have to do interprocedural analysis to trace an access in a
callee.   To make static analysis easier you may have to do the moral
equivalent of inlining.
- Recursion is not allowed, and function pointers can't be expressed, so
inlining always bottoms out.

If it didn't come through before:  I *agree* with Maciej in that
*something* has to be specified at a higher level to ensure safety.  This
is true for any useful language.

david



On Wed, Aug 16, 2017 at 3:17 PM David Neto <dneto@google.com> wrote:

> I see some good progress on this thread.   I'll try to amplify and clarify
> certain points.
>
> You can think of SPIR-V as a *family* of shading languages with common
> fundamentals. Or even a *medium* for expressing shading languages.   Much
> like how most web pages are in UTF-8, you have to say more.
> There are at least 3 varieties in the wild: SPIR-V for Vulkan, for OpenGL,
> and OpenCL.  We can get weird and count separately every single variation
> where a particular capability or extension is enabled or not, but that's a
> bit obtuse.
>
> More seriously:
> There are lots of ways to generate undefined behaviour.  This is true of
> any minimally general language.  I appeal to the halting problem, again.
> Even division by zero is suspect, but I imagine we want to keep the
> division operator.
> It's up to the environment to say more about what happens in those cases,
> if it wants to.  We all agree that for the web we must provide robust
> safety guarantees.
>
> It it's kind of a paradox of specs, but a specification says *nothing*
> about what happens in the unspecified behaviour case.  That's what makes it
> *unspecified.*
>
> Substitute any other shading language XYZ for SPIR-V and the GPUWeb
> environment still has to say more about what happens in the face of
> ill-behaved programs.   In a later step we can make a technical comparison
> about the ease, robustness, and efficiency of applying those behavioural
> constraints.
>
> I agree with Ken that we should make our requirements explicit.  So far
> I've seen at least:
> - Access with program-determined indices into resources, in various stages
> of the graphics pipeline.  (E.g. general access to UBO, SSBO, images)
> - Robust behaviour in the face of ill-behaved programs:  E.g. in some
> cases behave as if out of bounds accesses are clamped;  in other cases
> maybe not, including predictable termination of the application(?) without
> further side effects or information leakage. (?)
> - General arithmetic.  I want my division instruction. :-)
>
>
> On a more detailed level:
>
> - OpVectorInsertDynamic and OpVectorExtractDynamic operate on values, not
> memory.  So they're an even simpler case than memory and image accesses.
>
> - OpInBoundsAccessChain vs. OpAccessChain.   Yes, the InBounds part is a
> hint.  More specifically the implementation does different things depending
> on whether it trusts the shader:
>     - In all cases program behaviour is undefined if the resulting pointer
> results in an out-of-object-bounds access
>     - If the implementation does not trust the shader, then the
> implementation treats both instructions the same way:  It always inserts
> bounds checks (runtime or by modifying the shader)
>     - If the implementation trusts the shader, *and* safety is desired,
> then the bounds checks can be elided in the case of OpInBoundAccessChain
>     - If the implementation trusts the shader and safety is not required
> at this level of the implementation, then you elide the checks in both
> cases.  E.g. do this if the implemention delegates safety guarantees to a
> more general level of protection, e.g. MMU hardware and its OS support
>
> - Effectively, there are no function pointers in SPIR-V, due to validation
> rules
>
>
>
> On Wed, Aug 16, 2017 at 2:19 PM Kenneth Russell <kbr@google.com> wrote:
>
>> On Wed, Aug 16, 2017 at 10:06 AM, Corentin Wallez <cwallez@google.com>
>> wrote:
>>
>>> On Wed, Aug 16, 2017 at 11:48 AM, Maciej Stachowiak <mjs@apple.com>
>>> wrote:
>>>
>>>>
>>>> I was prompted by off-list comments to read the SPIR-V spec and its
>>>> explanation of Logical Addressing Mode in the spec here: <
>>>> https://www.khronos.org/registry/spir-v/specs/1.2/SPIRV.html>.
>>>>
>>>> Thanks for taking an in-depth look at the logical addressing mode.
>>>
>>>
>>>> It seems like SPIR-V's Logical Addressing Mode removes many of the
>>>> obvious ways to do out-of-bounds reads or writes. It stops you from storing
>>>> pointers and doing arithmetic on them; and it disables some opcodes that
>>>> let you read or write with a program-controlled size, or get pointers to
>>>> arbitrary locations. Many of these require the "Addresses" capability which
>>>> is disabled in Logical Addressing Mode.
>>>>
>>>> However, the OpAccessChain operation, which lets you get a pointer to
>>>> an arbitrary non-bounds-checled offset into a composite object, does
>>>> *not* require the Addresses capability. It does not require any
>>>> capability at all, which means there is no standard SPIR-V mechanism to
>>>> disable or limit it. This is despite the existence of
>>>> OpInBoundsAccessChain, which seems to do the same thing, but
>>>> bounds-checked.[1]
>>>>
>>>> My understanding is that OpInBoundsAccessChain is a hint from the
>>> application that no bounds check are necessary, but could be completely
>>> ignored.
>>>
>>>
>>>> There may be other operations that allow reading or writing memory
>>>> without bounds checking. For example, OpImageRead, OpImageWrite, and
>>>> related OpImage ops are not specified to do bounds checks, at least not
>>>> obviously. Another example: OpVectorExtractDynamic, OpVectorInsertDynamic
>>>> and related ops are not defined to bounds check, and explicitly specify
>>>> that what is read or written is undefined if the index is out of bounds.
>>>>
>>>>
>>>> Maybe I'm missing a simple reason why these ops are actually safe, or
>>>> an easy way to add bounds checking. But I tentatively conclude that SPIR-V
>>>> does *not* provide the required level of memory safety out of the box,
>>>> not even in Logical Addressing Mode.
>>>>
>>>> Good catch for the OpImage* and OpVector*Dynamic. All of the ops
>>> mentioned above are not safe because they access memory at a dynamic
>>> offset. The capabilities they provide are critical for a GPU language: only
>>> being able to access images and buffers at compile-time constant offsets
>>> would be too restrictive. In that respect SPIR-V is no worse than any other
>>> language we could choose, and has the nice property that in the logical
>>> addressing mode, only a dozen ops might be unsafe.
>>>
>>
>> The arguments to the image texel fetch operations would need to be
>> clamped inside the shader, in the same way that buffer accesses might need
>> to be, depending on the guarantees that the underlying APIs provide, and
>> the decisions that are made about testability of WebGPU's memory safety. If
>> for example WebGPU's semantics are defined as yielding either zero, or the
>> value fetched at the clamped indices, for any out-of-range accesses against
>> a buffer or image, and an underlying API like Direct3D 12 already provides
>> those semantics, no additional bounds checks would need to be added to the
>> compiled shader. If on the other hand the underlying API provides no
>> guarantees, then the shader would need to be transformed during loading to
>> fetch the size of the image using OpImageQuerySize and clamp the indices to
>> that range.
>>
>> Note also that the typical way of sampling a texture is via
>> OpImageSample*, which is guaranteed to be safe.
>>
>>
>>
>>> Therefore, as far as I can tell, my argument still stands. I believe we
>>>> would need to significantly modify the SPIR-V format and SPIR-V
>>>> implementations to achieve full memory safety. That's not to say it can't
>>>> be done, but I don't think we should expect the resulting format to work
>>>> with stock SPIR-V tools.
>>>>
>>>
>> I disagree with this conclusion. It might be necessary to transform
>> incoming shaders to insert clamping of some indices, or perhaps another
>> approach would be needed. I disagree firmly with your conclusion that this
>> would require significant modification of the SPIR-V format and toolchain
>> to achieve. Apple contributed similar code to ANGLE's shader translator!
>> See
>> https://chromium.googlesource.com/angle/angle/+/master/src/third_party/compiler/ArrayBoundsClamper.cpp
>> . This was a solid piece of work but didn't involve *any* modification of
>> the GLES 1.00 spec which WebGL 1.0 shaders use essentially verbatim. It
>> seems likely to me that the same properties would be achievable if SPIR-V
>> were the shader format ingested by WebGPU implementations.
>>
>> Why don't we work on enumerating and documenting the security issues for
>> shading languages/IRs that have been uncovered to date, like needing secure
>> reads and writes for kernels' input and output buffers and texel fetches?
>> Focusing on dismissing a potentially viable representation like SPIR-V at
>> this point is like throwing the baby out with the bath water. That spec has
>> had a ton of thought and experience put into it (as have DXIL and MSL) and
>> aiming to dismiss it out of hand is counterproductive to the entire
>> discussion.
>>
>> -Ken
>>
>>
>>
>>>> Regards,
>>>> Maciej
>>>>
>>>>
>>>> [1] I'm not absolutely sure that OpInBoundsAccessChain is
>>>> bounds-checked because the definition is vague. It says: "Has the same
>>>> semantics as OpAccessChain, with the addition that the resulting pointer is
>>>> known to point within the base object." Does "known to" mean that the
>>>> implementation bounds-checks and clamps the pointer to be within bounds
>>>> (presumably with enough space left for the pointed-to type)? Or does it
>>>> mean that the caller of this opcode promises that the pointer is within
>>>> bounds? I'm assuming the former but it's not entirely clear.
>>>>
>>>>
>>>> On Aug 15, 2017, at 2:41 PM, Maciej Stachowiak <mjs@apple.com> wrote:
>>>>
>>>>
>>>>
>>>> On Aug 15, 2017, at 2:29 PM, Kenneth Russell <kbr@google.com> wrote:
>>>>
>>>> On Tue, Aug 15, 2017 at 1:57 PM, Maciej Stachowiak <mjs@apple.com>
>>>> wrote:
>>>>
>>>>>
>>>>> Let's say we started with SPIR-V to make WebSPIR-V with at least the
>>>>> following changes:
>>>>>
>>>>> - Some different APIs and interfaces are exposed than you would expect
>>>>> in a WebGL or Vulkan shader.
>>>>> - At compile-time we generate code with runtime bounds checks and
>>>>> other safety features.
>>>>> - We possibly subset SPIR-V to remove dangerous capabilities that
>>>>> can't be effectively guarded with bounds checks or other runtime checks.
>>>>>
>>>>> If we did all these things, then it seems to me we wouldn't benefit
>>>>> from the existing SPIR-V ecosystem:
>>>>> - Existing SPIR-V shaders could not be reused with WebGPU
>>>>> - Front ends that compile to SPIR-V could not be used unmodified
>>>>> - Existing SPIR-V back ends could not correctly process the modified
>>>>> language, since they wouldn't have the ability to do runtime checks.[1]
>>>>>
>>>>> Is my analysis correct? If so, then what is the benefit of basing on
>>>>> SPIR-V?
>>>>>
>>>>
>>>> This analysis is incorrect and oversimplifies the issues in order to
>>>> cast SPIR-V in a negative light.
>>>>
>>>> The problematic areas are mainly in bounds-checking accesses in the
>>>> input and output buffers to compute shaders. Pipeline stages like vertex,
>>>> geometry, and fragment shaders are well and securely handled by the
>>>> capabilities SPIR-V exposes, since it was designed as a shading language
>>>> for graphics APIs.
>>>>
>>>>
>>>> My point isn't to make SPIR-V look bad. I don't believe there's any
>>>> shader language that meets all our requirements as-is.
>>>>
>>>> My intent is only to argue that it shouldn't have a leg up based on
>>>> ecosystem considerations, because we'd likely have to modify it enough that
>>>> the existing SPIR-V ecosystem won't be compatible with the end result.
>>>> Despite your disagreement, I still think that's probably true.
>>>>
>>>> I'm not clear on how SPIR-V avoids having a bounds checking problem for
>>>> shader types other than compute shaders. Are they unable to use the
>>>> non-bounds-checked operations that compute shaders can use? Do they only
>>>> get access to fixed-size arrays? Do they only get range-checked buffer
>>>> operations? To be clear, I would consider anything that can be used to read
>>>> or write arbitrary locations in video memory to be a security risk. Even if
>>>> it's limited to the video memory of a specific process.
>>>>
>>>> If I'm missing something that makes this a non-issue, please explain.
>>>>
>>>> Instead of attempting to dismiss one particular solution a priori, what
>>>> should be done is to enumerate the needed capabilities of the various kinds
>>>> of shaders in WebGPU, and see what would need to be done to the starting
>>>> point (Metal Shading Language, WebAssembly, DXIL, SPIR-V, a higher-level
>>>> language, something brand new...) in order to have it be secure and
>>>> performant. There are multiple interesting analyses, designs and
>>>> comparisons to be done.
>>>>
>>>>
>>>> I agree that this should be the approach. I did not mean to dismiss
>>>> SPIR-V, merely to argue that it should not be considered to have any
>>>> particular advantage over the other options. Except perhaps on intrinsic
>>>> technical merit, which I think is the point of the overall analysis. It
>>>> might be that a modified version of SPIR-V is sound technical choice
>>>> whether or not it interoperates with regular SPIR-V.
>>>>
>>>> Regards,
>>>> Maciej
>>>>
>>>>
>>>>
>>>
Received on Wednesday, 16 August 2017 19:25:24 UTC