Re: RFC: Re: shader IR vs. application IR

> On Aug 16, 2017, at 12:17 PM, David Neto <dneto@google.com> wrote:
> 
> I see some good progress on this thread.   I'll try to amplify and clarify certain points.
> 
> You can think of SPIR-V as a family of shading languages with common fundamentals. Or even a medium for expressing shading languages.   Much like how most web pages are in UTF-8, you have to say more.
> There are at least 3 varieties in the wild: SPIR-V for Vulkan, for OpenGL, and OpenCL.  We can get weird and count separately every single variation where a particular capability or extension is enabled or not, but that's a bit obtuse.
> 
> More seriously:
> There are lots of ways to generate undefined behaviour.  This is true of any minimally general language.  I appeal to the halting problem, again.  Even division by zero is suspect, but I imagine we want to keep the division operator.

There exist languages (both source and IR level) where there is no undefined behavior. I posit that the shader language for WebGPU should meet this standard.

Division by zero in particular has very well defined behavior in most popular programming languages (popular answers are terminate, exception or +/- Inf). But apparently not in SPIR-V from my read of the spec, which maybe is itself a problem?

Now that I think about it, even OpUndef is a risk, since it needs to be guaranteed that the value it produces is not ever the result of reading uninitialized memory.

> It's up to the environment to say more about what happens in those cases, if it wants to.  We all agree that for the web we must provide robust safety guarantees.
> 
> It it's kind of a paradox of specs, but a specification says *nothing* about what happens in the unspecified behaviour case.  That's what makes it unspecified. 

I think there's a difference between undefined and unspecified. In some cases, the spec defines an operation fully but says some cases produce an undefined value. For example, OpDiv says the return value is undefined in one case, but presumably it's not allowed to do random other things, like terminate the program or scribble memory. In other cases, the spec doesn't say what happens at all, as in the case of out-of-bounds OpAccessChain. In still other cases, the implementation degrees of freedom are unclear. OpVectorInsertDynamic 

> 
> Substitute any other shading language XYZ for SPIR-V and the GPUWeb environment still has to say more about what happens in the face of ill-behaved programs.   In a later step we can make a technical comparison about the ease, robustness, and efficiency of applying those behavioural constraints.
> 
> I agree with Ken that we should make our requirements explicit.  So far I've seen at least:
> - Access with program-determined indices into resources, in various stages of the graphics pipeline.  (E.g. general access to UBO, SSBO, images)
> - Robust behaviour in the face of ill-behaved programs:  E.g. in some cases behave as if out of bounds accesses are clamped;  in other cases maybe not, including predictable termination of the application(?) without further side effects or information leakage. (?)
> - General arithmetic.  I want my division instruction. :-)
> 
> 
> On a more detailed level:
> 
> - OpVectorInsertDynamic and OpVectorExtractDynamic operate on values, not memory.  So they're an even simpler case than memory and image accesses.

I'm not sure why this makes things simpler. They still need bounds checks.

> 
> - OpInBoundsAccessChain vs. OpAccessChain.   Yes, the InBounds part is a hint.  More specifically the implementation does different things depending on whether it trusts the shader:
>     - In all cases program behaviour is undefined if the resulting pointer results in an out-of-object-bounds access
>     - If the implementation does not trust the shader, then the implementation treats both instructions the same way:  It always inserts bounds checks (runtime or by modifying the shader)
>     - If the implementation trusts the shader, *and* safety is desired, then the bounds checks can be elided in the case of OpInBoundAccessChain
>     - If the implementation trusts the shader and safety is not required at this level of the implementation, then you elide the checks in both cases.  E.g. do this if the implemention delegates safety guarantees to a more general level of protection, e.g. MMU hardware and its OS support
> 
> - Effectively, there are no function pointers in SPIR-V, due to validation rules

I agree with you on the high level it's likely a solvable problem.

Regards,
Maciej

> 
> 
> 
> On Wed, Aug 16, 2017 at 2:19 PM Kenneth Russell <kbr@google.com <mailto:kbr@google.com>> wrote:
> On Wed, Aug 16, 2017 at 10:06 AM, Corentin Wallez <cwallez@google.com <mailto:cwallez@google.com>> wrote:
> On Wed, Aug 16, 2017 at 11:48 AM, Maciej Stachowiak <mjs@apple.com <mailto:mjs@apple.com>> wrote:
> 
> I was prompted by off-list comments to read the SPIR-V spec and its explanation of Logical Addressing Mode in the spec here: <https://www.khronos.org/registry/spir-v/specs/1.2/SPIRV.html <https://www.khronos.org/registry/spir-v/specs/1.2/SPIRV.html>>.
> 
> Thanks for taking an in-depth look at the logical addressing mode.
>  
> It seems like SPIR-V's Logical Addressing Mode removes many of the obvious ways to do out-of-bounds reads or writes. It stops you from storing pointers and doing arithmetic on them; and it disables some opcodes that let you read or write with a program-controlled size, or get pointers to arbitrary locations. Many of these require the "Addresses" capability which is disabled in Logical Addressing Mode.
> 
> However, the OpAccessChain operation, which lets you get a pointer to an arbitrary non-bounds-checled offset into a composite object, does not require the Addresses capability. It does not require any capability at all, which means there is no standard SPIR-V mechanism to disable or limit it. This is despite the existence of OpInBoundsAccessChain, which seems to do the same thing, but bounds-checked.[1]
> 
> My understanding is that OpInBoundsAccessChain is a hint from the application that no bounds check are necessary, but could be completely ignored.
>  
> There may be other operations that allow reading or writing memory without bounds checking. For example, OpImageRead, OpImageWrite, and related OpImage ops are not specified to do bounds checks, at least not obviously. Another example: OpVectorExtractDynamic, OpVectorInsertDynamic and related ops are not defined to bounds check, and explicitly specify that what is read or written is undefined if the index is out of bounds.
> 
> 
> Maybe I'm missing a simple reason why these ops are actually safe, or an easy way to add bounds checking. But I tentatively conclude that SPIR-V does not provide the required level of memory safety out of the box, not even in Logical Addressing Mode. 
> 
> Good catch for the OpImage* and OpVector*Dynamic. All of the ops mentioned above are not safe because they access memory at a dynamic offset. The capabilities they provide are critical for a GPU language: only being able to access images and buffers at compile-time constant offsets would be too restrictive. In that respect SPIR-V is no worse than any other language we could choose, and has the nice property that in the logical addressing mode, only a dozen ops might be unsafe.
> 
> The arguments to the image texel fetch operations would need to be clamped inside the shader, in the same way that buffer accesses might need to be, depending on the guarantees that the underlying APIs provide, and the decisions that are made about testability of WebGPU's memory safety. If for example WebGPU's semantics are defined as yielding either zero, or the value fetched at the clamped indices, for any out-of-range accesses against a buffer or image, and an underlying API like Direct3D 12 already provides those semantics, no additional bounds checks would need to be added to the compiled shader. If on the other hand the underlying API provides no guarantees, then the shader would need to be transformed during loading to fetch the size of the image using OpImageQuerySize and clamp the indices to that range.
> 
> Note also that the typical way of sampling a texture is via OpImageSample*, which is guaranteed to be safe.
> 
>  
> Therefore, as far as I can tell, my argument still stands. I believe we would need to significantly modify the SPIR-V format and SPIR-V implementations to achieve full memory safety. That's not to say it can't be done, but I don't think we should expect the resulting format to work with stock SPIR-V tools.
> 
> I disagree with this conclusion. It might be necessary to transform incoming shaders to insert clamping of some indices, or perhaps another approach would be needed. I disagree firmly with your conclusion that this would require significant modification of the SPIR-V format and toolchain to achieve. Apple contributed similar code to ANGLE's shader translator! See https://chromium.googlesource.com/angle/angle/+/master/src/third_party/compiler/ArrayBoundsClamper.cpp <https://chromium.googlesource.com/angle/angle/+/master/src/third_party/compiler/ArrayBoundsClamper.cpp> . This was a solid piece of work but didn't involve *any* modification of the GLES 1.00 spec which WebGL 1.0 shaders use essentially verbatim. It seems likely to me that the same properties would be achievable if SPIR-V were the shader format ingested by WebGPU implementations.
> 
> Why don't we work on enumerating and documenting the security issues for shading languages/IRs that have been uncovered to date, like needing secure reads and writes for kernels' input and output buffers and texel fetches? Focusing on dismissing a potentially viable representation like SPIR-V at this point is like throwing the baby out with the bath water. That spec has had a ton of thought and experience put into it (as have DXIL and MSL) and aiming to dismiss it out of hand is counterproductive to the entire discussion.
> 
> -Ken
> 
> 
> 
> Regards,
> Maciej
> 
> 
> [1] I'm not absolutely sure that OpInBoundsAccessChain is bounds-checked because the definition is vague. It says: "Has the same semantics as OpAccessChain, with the addition that the resulting pointer is known to point within the base object." Does "known to" mean that the implementation bounds-checks and clamps the pointer to be within bounds (presumably with enough space left for the pointed-to type)? Or does it mean that the caller of this opcode promises that the pointer is within bounds? I'm assuming the former but it's not entirely clear.
> 
> 
>> On Aug 15, 2017, at 2:41 PM, Maciej Stachowiak <mjs@apple.com <mailto:mjs@apple.com>> wrote:
>> 
>> 
>> 
>>> On Aug 15, 2017, at 2:29 PM, Kenneth Russell <kbr@google.com <mailto:kbr@google.com>> wrote:
>>> 
>>> On Tue, Aug 15, 2017 at 1:57 PM, Maciej Stachowiak <mjs@apple.com <mailto:mjs@apple.com>> wrote:
>>> 
>>> Let's say we started with SPIR-V to make WebSPIR-V with at least the following changes:
>>> 
>>> - Some different APIs and interfaces are exposed than you would expect in a WebGL or Vulkan shader.
>>> - At compile-time we generate code with runtime bounds checks and other safety features.
>>> - We possibly subset SPIR-V to remove dangerous capabilities that can't be effectively guarded with bounds checks or other runtime checks.
>>> 
>>> If we did all these things, then it seems to me we wouldn't benefit from the existing SPIR-V ecosystem:
>>> - Existing SPIR-V shaders could not be reused with WebGPU
>>> - Front ends that compile to SPIR-V could not be used unmodified
>>> - Existing SPIR-V back ends could not correctly process the modified language, since they wouldn't have the ability to do runtime checks.[1]
>>> 
>>> Is my analysis correct? If so, then what is the benefit of basing on SPIR-V?
>>> 
>>> This analysis is incorrect and oversimplifies the issues in order to cast SPIR-V in a negative light.
>>> 
>>> The problematic areas are mainly in bounds-checking accesses in the input and output buffers to compute shaders. Pipeline stages like vertex, geometry, and fragment shaders are well and securely handled by the capabilities SPIR-V exposes, since it was designed as a shading language for graphics APIs.
>> 
>> My point isn't to make SPIR-V look bad. I don't believe there's any shader language that meets all our requirements as-is.
>> 
>> My intent is only to argue that it shouldn't have a leg up based on ecosystem considerations, because we'd likely have to modify it enough that the existing SPIR-V ecosystem won't be compatible with the end result. Despite your disagreement, I still think that's probably true.
>> 
>> I'm not clear on how SPIR-V avoids having a bounds checking problem for shader types other than compute shaders. Are they unable to use the non-bounds-checked operations that compute shaders can use? Do they only get access to fixed-size arrays? Do they only get range-checked buffer operations? To be clear, I would consider anything that can be used to read or write arbitrary locations in video memory to be a security risk. Even if it's limited to the video memory of a specific process.
>> 
>> If I'm missing something that makes this a non-issue, please explain.
>> 
>>> Instead of attempting to dismiss one particular solution a priori, what should be done is to enumerate the needed capabilities of the various kinds of shaders in WebGPU, and see what would need to be done to the starting point (Metal Shading Language, WebAssembly, DXIL, SPIR-V, a higher-level language, something brand new...) in order to have it be secure and performant. There are multiple interesting analyses, designs and comparisons to be done.
>> 
>> I agree that this should be the approach. I did not mean to dismiss SPIR-V, merely to argue that it should not be considered to have any particular advantage over the other options. Except perhaps on intrinsic technical merit, which I think is the point of the overall analysis. It might be that a modified version of SPIR-V is sound technical choice whether or not it interoperates with regular SPIR-V. 
>> 
>> Regards,
>> Maciej
> 
> 

Received on Wednesday, 16 August 2017 20:07:57 UTC