Re: Binary vs Text from Maciej Stachowiak on 2018-11-07 (public-gpu@w3.org from November 2018)

From: Maciej Stachowiak <mjs@apple.com>
Date: Tue, 06 Nov 2018 21:57:41 -0800
To: Ken Russell <kbr@google.com>
Cc: "Myles C. Maxfield" <mmaxfield@apple.com>, public-gpu <public-gpu@w3.org>
Message-id: <DD19307A-BDB9-476B-BB6A-699CB5200904@apple.com>
> On Nov 6, 2018, at 3:15 PM, Ken Russell <kbr@google.com> wrote:
> 
> Hi Myles,
> 
> Our viewpoint is based on the experience of using GLSL as WebGL's input language, and dealing with hundreds of bugs associated with parsing, validating, and passing a textual shading language through to underlying drivers.
> 
> Kai wrote this up at the beginning of the year in this GitHub issue: <https://github.com/gpuweb/gpuweb/issues/44>, and there is a detailed bug list (which is still only a sampling of the associated bugs we fixed over the years) in this spreadsheet:
> <https://docs.google.com/spreadsheets/d/1bjfZJcvGPI4M6Df5HC8BPQXbl847RpfsFKw6SI6_T30/edit#gid=0>
> 
> Unlike what I said on the call, the main issues aren't really around the parsing of the input language or string handling. Both the preprocessor's and compiler's parsers in ANGLE's shader translator are autogenerated from grammars. Of more concern were situations where we had to semi-arbitrarily restrict the source language so that we wouldn't pass shaders through to the graphics driver which would crash its own shader compiler. Examples included having to restrict the "complexity" or "depth" of expression trees <https://chromium.googlesource.com/angle/angle/+/eb1a010f0f996b3742fd34b92ffaf9014c943528> to avoid stack overflows in some drivers (this was added as an implementation-specific security workaround rather than to the spec), working around bugs in variable scoping and shadowing <https://chromium.googlesource.com/angle/angle/+/855d964bd0d05f6b2cb303f625506cf53d37e94f>, defeating incorrect compiler optimizations, and more. Please take the time to read Kai's writeup and go through the spreadsheet.

It seems to me these issues aren’t about the language that is input to the WebGPU (or WebGL) implementation at all, but rather what it outputs and what drivers ingest.

However, what languages we output (and what drivers then ingest) will be the same regardless of the source language. What the CG has been discussing is these two system designs:

(A) [ WHLSL source ] —> [ WebGPU implementation ] —> [  One of: Vulkan SPIR-V, MSL, HLSL, or DXIL ] —> [ Driver ]
(B) [ WebGPU SPIR-V source ] —> [ WebGPU implementation ] —> [ One of: Vulkan SPIR-V, MSL, HLSL, or DXIL ] —> [ Driver ]


Regardless of the input language to the WebGPU implementation, the driver will be presented with the same possible range of languages, which include both text-based and binary languages. If the problem is not with parsing or processing at the time of ingesting the source language, but rather with the results of sending a text-based language to a driver, then this information has no impact on what we choose as a source language. Conflating the input to WebGPU with the input to the driver makes for an unsound argument.
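
To make that concrete, here is a minimal sketch of designs (A) and (B). Everything in it is illustrative: `BACKEND_OUTPUTS`, `SOURCE_LANGUAGES` and `driver_facing_languages` are hypothetical names, not part of any WebGPU implementation, and the table pairing each platform API with its output languages is a simplification.

```python
from __future__ import annotations

# Hypothetical model of designs (A) and (B): which languages can reach a driver?
# The table below is an assumption for illustration, not a real implementation.
BACKEND_OUTPUTS = {
    "vulkan": {"Vulkan SPIR-V"},
    "metal": {"MSL"},
    "d3d12": {"HLSL", "DXIL"},
}

SOURCE_LANGUAGES = ("WHLSL", "WebGPU SPIR-V")


def driver_facing_languages(source_language: str, backend: str) -> set[str]:
    """Return the languages the driver may be handed for this backend.

    The source language is checked but otherwise unused: in both designs
    the output set is determined by the backend alone.
    """
    if source_language not in SOURCE_LANGUAGES:
        raise ValueError(f"unknown source language: {source_language}")
    return set(BACKEND_OUTPUTS[backend])


if __name__ == "__main__":
    for backend in BACKEND_OUTPUTS:
        a = driver_facing_languages("WHLSL", backend)
        b = driver_facing_languages("WebGPU SPIR-V", backend)
        print(backend, sorted(a), "same for both designs:", a == b)
```

The function ignores its first argument by construction, which is exactly the claim being made: any driver bug triggered by text-based MSL or HLSL is reachable from either source language.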

I feel like others on this thread have made the same mix-up, so let’s all try to be clear about the difference between the input language, and what is fed to the driver.


Note also that I distinguish “WebGPU SPIR-V” from “Vulkan SPIR-V”. We’ve been speaking of SPIR-V as if it’s a single language, but really it’s a language family with a bunch of parameters and variations that can be defined by an execution environment. The SPIR-V accepted by Vulkan is not the same as the set accepted by OpenCL, for example. Similarly, the initial informal document describing how SPIR-V could be handled safely on the web includes both validation steps (which impose restrictions by rejecting some shaders) and transforms (which translate some constructs into others, e.g. adding bounds checks in certain places). Hopefully this is already clear to everyone, but I wanted to emphasize the point. It’s not clear to me at this time how this would affect existing Vulkan SPIR-V drivers, since the WebGPU dialect of SPIR-V is not yet fully defined.
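
The difference between a validation step and a transform can be sketched with a toy example. The `Load` instruction, the `validate` and `add_bounds_checks` functions, and the rules they apply are all invented for illustration; none of this is taken from SPIR-V or from the informal document mentioned above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Load:
    """Toy instruction: read buffer[index]. Not a real SPIR-V instruction."""
    buffer: str  # name of the buffer being read
    index: str   # index expression, kept as opaque text in this toy


def validate(program, buffer_lengths):
    """Validation: reject the whole program rather than change it."""
    for inst in program:
        if inst.buffer not in buffer_lengths:
            raise ValueError(f"load from undeclared buffer {inst.buffer!r}")


def add_bounds_checks(program, buffer_lengths):
    """Transform: rewrite every load so its index is clamped into range."""
    checked = []
    for inst in program:
        last = buffer_lengths[inst.buffer] - 1
        checked.append(replace(inst, index=f"min({inst.index}, {last}u)"))
    return checked


if __name__ == "__main__":
    lengths = {"colors": 4}
    program = [Load("colors", "i")]
    validate(program, lengths)                  # accepted unchanged
    print(add_bounds_checks(program, lengths))  # index becomes min(i, 3u)
```

Both steps narrow or alter what a shader author wrote, which is why the accepted dialect differs from what a Vulkan driver sees on its own.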



> 
> The question will come up: would using a lower-level representation like SPIR-V for WebGPU's shaders really address these problems? I think it would. SPIR-V uses SSA form and simple numbers for variables, which will eliminate entire classes of bugs in mishandling of language-level identifiers, variables, and scopes.

This seems like a very speculative hypothesis, and it does not match my intuition, when thinking about the system as a whole.

These issues won’t affect implementations using Vulkan under the covers in any case, since they will be fed Vulkan SPIR-V and will not know about scopes in the source language, whatever it is. But they will affect a Metal-based implementation regardless, since WebGPU SPIR-V would need to be transformed to text-based MSL, in the process introducing identifiers, variables and scopes. It might be that the MSL looks like SSA form, which simplifies some issues. But a text-based input format like WHLSL can also be transformed into SSA-looking MSL, if that were for some reason more robust or more convenient.
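
As a rough sketch of what "SSA-looking" output from a text-based source could mean, the following flattens a nested expression tree into single-assignment temporaries. The tuple-based AST, the `flatten` and `to_ssa_like` helpers, and the emitted C-like lines are assumptions for illustration; this is neither real WHLSL parsing nor real MSL generation, and it covers only straight-line expressions (no control flow, so no phi nodes).

```python
from itertools import count


def flatten(expr, lines, fresh):
    """Return the name holding expr's value, appending assignments to lines."""
    if isinstance(expr, str):   # leaf: a variable name or literal
        return expr
    op, lhs, rhs = expr         # interior node: (operator, left, right)
    left = flatten(lhs, lines, fresh)
    right = flatten(rhs, lines, fresh)
    name = f"t{next(fresh)}"    # fresh temporary, assigned exactly once
    lines.append(f"float {name} = {left} {op} {right};")
    return name


def to_ssa_like(expr):
    """Flatten one expression; return (emitted lines, name of the result)."""
    lines = []
    result = flatten(expr, lines, count())
    return lines, result


if __name__ == "__main__":
    # Source-level expression: a * b + c * d
    lines, result = to_ssa_like(("+", ("*", "a", "b"), ("*", "c", "d")))
    print("\n".join(lines))
    print("result is in", result)
```

Each emitted statement contains a single operator and each temporary has a generated name, so the text handed to the downstream compiler has no deep expression trees and no author-chosen identifiers for the temporaries, whatever the source language was.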

Now, in practice, Metal is unlikely to have the same kinds of issues as OpenGL drivers, since it decouples parsing from the individual driver. But if it did, then compiling WHLSL directly to MSL could use all the same techniques as compiling WHLSL to SPIR-V and then compiling that to MSL.




> SPIR-V's primitives are lower level than those in a textual shader language,

SSA form can express basically all the same things that an AST can, so the lower levelness doesn’t really restrict what you can throw at, say, a Metal driver.

> and if it turns out restrictions on shaders are still needed in WebGPU's environment spec in order to work around driver bugs, they'll be easier to define more precisely against SPIR-V than source text.

I’m not sure this is true. This would be affected more by the precision of the spec than the choice of wire format. And WHLSL’s draft specification is already much more precise than the SPIR-V spec, closer to the level WebAssembly is at.


> Using SPIR-V as WebGPU's shader ingestion format would bring other advantages, including that it's based on years of experience developing a portable binary shader representation, and has been designed in conjunction with GPU vendors across the industry.

But it will also have significant disadvantages, like requiring the serving of large piles of JavaScript to do online compilation for those use cases that require it.


> On the conference call I didn't mean to over-generalize the topic to "binary formats vs. text formats in the browser", so apologies if I misspoke.

It’s good that you agree the problems of GLSL don’t generalize to all text formats. For the record though, I would like to cite something I wrote offline, in case anyone does feel this way:

-----------
I notice you present GLSL as the only example where there was a series of problems like this. And you did not address Myles’s point that browsers ingest many text and binary formats, and at least in our experience, we do not find the binary formats to be more robust or more secure. There are three possible conclusions to draw from the experience with GLSL:

(1) GLSL specifically had unique problems with robustness and security (due to driver design, insufficiently precise specification, etc).
(2) All text-based shader languages for the web would have the same problem as GLSL, but binary shader languages for the web would not.
(3) All text-based languages of any kind for the web would have the same problem as GLSL, but binary formats for the web would not.

We have evidence that statement (3) is false: looking at the whole range of languages and formats for the web, binary formats are not historically more robust or more secure on the whole. Some binary formats that used to be pervasive on the web, such as Java and SWF, have been abandoned precisely because their implementations are not robust. PDF requires sandboxing just as much as HTML. JavaScript and WebAssembly have comparable interoperability and security. Comparing head-to-head examples from the same categories, there is no evidence that binary formats are more robust across the board.

We have some evidence that at least statement (1) is true, based on your bug list above.

So the remaining potential point of dispute is statement (2). Based on GLSL, you extrapolate to all text-based shader languages. Based on other text and binary languages for the web, we extrapolate to text and binary shader languages. Why do you think (1) has more bearing on a new shader language than (3)? It seems to me that the difference between GLSL and more successful text-based formats for the web is not due to being a shader language. Rather, it’s due to factors specific to GLSL, such as insufficiently precise specification, and use of a preprocessor. I don’t see why you would extrapolate to languages where these factors don’t apply.
-----------

It seems like, in the latest discussion, the cited problems aren’t with processing of a text language as input, but rather in feeding a text-based language to drivers that all have their own (potentially inconsistent and quirky) implementations. I’m glad! This means we don’t need to debate text vs binary in the abstract.

But again, our choice of source language has absolutely no effect on whether we face these problems; we still have to work with the same drivers.



> 
> -Ken
> 
> 
> 
> On Mon, Nov 5, 2018 at 10:58 PM Myles C. Maxfield <mmaxfield@apple.com> wrote:
> Hi!
> 
> When we were discussing WebGPU today, the issue of binary vs text was raised. We are confused at the viewpoint that binary languages on the Web are inherently safer and more portable than text ones. All of our browsers accept HTML, CSS, JavaScript, binary image formats, binary font files, GLSL, and WebAssembly, and so we don’t understand how our teams came to opposite conclusions given similar circumstances.
> 
> Can you describe the reasons for this viewpoint (as specifically as possible, preferably)? We’d like to better understand the reasoning.
> 
> Thanks,
> Myles
Received on Wednesday, 7 November 2018 05:58:12 UTC