Re: Binary vs Text

From: David Neto <dneto@google.com>
Date: Mon, 12 Nov 2018 19:18:26 -0500
Message-ID: <CAPmVsJVNNS8hu=oVuM=JmHA982M_=E41wi07v4agPxHxCPxeDA@mail.gmail.com>
To: Maciej Stachowiak <mjs@apple.com>
Cc: James Darpinian <jdarpinian@google.com>, Filip Pizlo <fpizlo@apple.com>, Kai Ninomiya <kainino@google.com>, "Myles C. Maxfield" <mmaxfield@apple.com>, Jeff Gilbert <jgilbert@mozilla.com>, Kenneth Russell <kbr@google.com>, public-gpu <public-gpu@w3.org>
Hi.  I'm very late to this thread (sorry).  I have not caught up on it.

At the moment WHLSL is unimplementable on the GPUs I know, due to
oversimplification in WHLSL's memory model.  I have filed 7 issues against
the WHLSL spec (#247 through #253).  Some are more fundamental than others.


On Mon, Nov 12, 2018 at 6:54 PM Maciej Stachowiak <mjs@apple.com> wrote:

> On Nov 12, 2018, at 2:41 PM, James Darpinian <jdarpinian@google.com>
> wrote:
> > Too much complexity is bad for both humans writing the language and
> software consuming it.
> You are conflating two different kinds of complexity. Features that make
> reading or writing the language less complex for humans may make the
> implementation more complex and vice versa.
> Sure, this may sometimes be true. Other times these two notions of
> complexity are aligned.
> Unfortunately the evidence that you request can only really be gathered by
> implementation experience. If we continue down the path of implementing
> both SPIR-V and WHLSL ingestion, I doubt that will help us come to
> agreement. The evidence we gather is still subject to interpretation which
> we will likely still disagree on, and the more we build the more we will
> have to throw away, which we will naturally be reluctant to do.
> I share your concern. I’m not sure how we get out of this impasse. One
> thing we could try is to lay out up front what sort of evidence, if
> presented, would lead each side to change their position.
> However, we have a lot of evidence already available to us from previous
> implementers of modern graphics APIs, who have universally chosen to
> provide ingestion formats that are different from their shading languages.
> To the extent that this is true, I don’t place a lot of weight on it. The
> web is a different environment, and often the right choice for the web is
> different. If you choose web formats as the reference class rather than
> shader formats for modern graphics APIs, the evidence is strongly in favor
> of text-based formats. I think “web-based languages” is a better choice of
> reference, because almost every modern technology has required significant
> rethinking and adaptation for the web, and many of the lessons learned are
> universal.
> It’s also worth noting that DX12 and Metal both have not chosen to make an
> exclusive ingestion format that’s different from the shading language. It’s
> possible to use the actual shading language at runtime. In the case of
> Metal, the binary format is not even a compile target for third parties;
> the only official input point is Metal. It’s just that apps are allowed
> to bundle a precompiled binary shader. Failing to directly handle a
> human-authorable format at all would be the more unusual choice.
> Regards,
> Maciej
> On Thu, Nov 8, 2018 at 11:59 PM Maciej Stachowiak <mjs@apple.com> wrote:
>> On Nov 8, 2018, at 2:51 PM, James Darpinian <jdarpinian@google.com>
>> wrote:
>> > Specifically, I don’t agree that the ingestion format can or should be
>> “non-evolving”
>> Let's put that question aside for now. I'd like to find some things we
>> can all agree on.
>> It’s good to find things we can agree on. It’s also important to be clear
>> about what we don’t yet agree on. I’ll try to do both.
>> Can we agree that the ingestion format and the shading language have
>> different requirements that sometimes conflict,
>> Depends on what you mean by “sometimes”. I think I was pretty explicit
>> about my position, but to state it again:
>> - I agree that it’s possible in theory that we could find such a
>> conflict.
>> - I don’t agree that we have already found one.
>> - I agree that if we find a conflict, this may push us to use different
>> languages for these things, if the best available compromise between the
>> requirements is more harmful on net than the harm of having two separate
>> languages.
>> - I note that even for a single purpose of a language, there may be
>> conflicting requirements that call for tradeoffs to be made.
>> and in particular HLSL compatibility vs. simplicity is one of those
>> conflicts?
>> I don’t fully agree with this. To elaborate:
>> * A good level of simplicity is a goal for both a human-writable shader
>> language and an ingestion format. A minimum level of complexity is
>> set by the requirements of the domain (i.e. a shader language/format has to
>> have the expressiveness and capabilities needed for shaders). Too much
>> complexity is bad for both humans writing the language and software
>> consuming it.
>> * Perfect HLSL compatibility is likely not achievable for a
>> human-writable shader language for the web, because regular HLSL doesn’t
>> have the right safety properties. The question is how far to go in that
>> direction. Being at least superficially similar is helpful for shader
>> authors. Being real-world compatible with at least some HLSL shaders is
>> even nicer, if it’s practical.
>> * There is indeed some tradeoff between more HLSL compatibility and more
>> complexity. But more complexity is a downside for humans too. So this
>> tradeoff exists before you even consider the needs of software consuming
>> the language. I suspect the best range in this tradeoff space is also a
>> good spot for software ingestion needs. But I could be convinced otherwise
>> by evidence.
>> I guess there are some factual questions that could shed light on the
>> matter:
>> - Does WHLSL have good enough HLSL compatibility to allow any useful
>> shaders at all to be brought over, or only enough for vague familiarity?
>> - Can more compatibility be added without:
>>   - Violating web safety requirements?
>>   - Adding a level of complexity that’s bad for authors?
>>   - Making the language too hard to process safely and robustly?
>> - If more compatibility is added, will that actually allow more real
>> existing shaders to run, or would it just add a bit more familiarity?
>> I don’t know enough about HLSL or the world of HLSL shaders out there to
>> answer these questions myself.
>> Regards,
>> Maciej
>> On Thu, Nov 8, 2018 at 1:52 PM Maciej Stachowiak <mjs@apple.com> wrote:
>>> On Nov 8, 2018, at 1:09 PM, James Darpinian <jdarpinian@google.com>
>>> wrote:
>>> > > Would you be interested in a non-evolving AST-level ingestion format?
>>> > Yes, if that format is text on the wire, since that is the most
>>> efficient and simple way to express an AST format.
>>> Perhaps there's something we can agree on here then. Can we agree that
>>> the ingestion format and the shading language have different requirements
>>> that sometimes conflict, e.g. compatibility with existing HLSL vs.
>>> simplicity,
>>> I agree that it *may* be true, but not that it has been shown to be
>>> true on this thread so far. Specifically, I don’t agree that the ingestion
>>> format can or should be “non-evolving”. It should probably evolve more
>>> slowly than other web languages, and likely will regardless, due to the
>>> nature of the domain. But that’s about it.
>>> and we should, as a group, investigate making the ingestion format
>>> different from the shading language to better satisfy both sets of
>>> requirements?
>>> I think we are already investigating it in that we’re considering a web
>>> dialect of SPIR-V as one of the ingestion formats, and no one thinks it’s a
>>> human-writable shader language.
>>> Whether we ultimately decide that the ingestion format is different from
>>> the human-writable format remains to be seen. In my mind, it depends on if
>>> we find that they actually have conflicting requirements, and that the
>>> compromises necessary to satisfy both are a higher cost than having two
>>> formats.
>>> I tend to think a single text-based language can both be an adequate
>>> compiler target for other languages, still nice to write directly, and
>>> secure and robust enough to use as a wire format on the web, so I’m not yet
>>> convinced we need two formats.
>>> Regards,
>>> Maciej
>>> On Thu, Nov 8, 2018 at 8:45 AM Filip Pizlo <fpizlo@apple.com> wrote:
>>>> On Nov 7, 2018, at 10:57 PM, Kai Ninomiya <kainino@google.com> wrote:
>>>> Maciej: You're right that comparing WHLSL with JavaScript is not a fair
>>>> analogy. I mistook your statement "The evidence from WebAssembly vs
>>>> JavaScript suggests this probably won’t be true" to be trying to make
>>>> that analogy, but I see now that it was about a more specific point. I
>>>> apologize for digging at this rathole.
>>>> Filip: WebAssembly is a little hard to compare with SPIR-V since it's
>>>> not SSA as you pointed out. WHLSL may be comparable to WebAssembly in that
>>>> it is, in essence, an AST-level language. However, WHLSL is most definitely
>>>> not at the level of WebAssembly when it comes to actual language
>>>> complexity, if we are going to support existing HLSL code,
>>>> I’m not sure that is true. Like WebAssembly, WHLSL just contains the
>>>> low level features you need to build other things out of.
>>>> The only manner in which WHLSL feels more complex to me is the addition
>>>> of:
>>>> - GPU style concurrency, which has more quirks than CPU style.
>>>> - API for doing graphics things. WebAssembly is only concerned with the
>>>> language and it has basically no api exposed to the wasm program. WHLSL has
>>>> lots of spec-mandated functions exposed to the WHLSL program.
>>>> So, I don’t think that WHLSL is more complex except where it absolutely
>>>> has to be to do graphics. SPIR-V also has these additional complexities.
>>>> and especially if we are going to add additional features (e.g.
>>>> templates/generics or operator overloading) to the language.
>>>> We aren’t proposing to add templates to WHLSL at this time. I think
>>>> that when debating about WHLSL versus other languages, we should focus on
>>>> what is being proposed rather than what might be proposed. I’m not a fan of
>>>> critiquing something that might be proposed but hasn’t been proposed, since
>>>> such a critique has no limiting principle - you could make up whatever you
>>>> think WHLSL might have and point out that you don’t like it.
>>>> WebAssembly does not need updates when C++ gains new language features,
>>>> That’s not really true!  WebAssembly has to evolve to support some new
>>>> features like threads and maybe simd.
>>>> and I think this is a strength of both WebAssembly and SPIR-V.
>>>> Both of them have been revved with new stuff in the past. Both of them
>>>> will probably be revved with new stuff in the future.
>>>> Would you be interested in a non-evolving AST-level ingestion format?
>>>> Yes, if that format is text on the wire, since that is the most
>>>> efficient and simple way to express an AST format. One of the lessons I
>>>> learned from wasm is that binary serialization of ASTs is really hard, and
>>>> considering the time it took to reach consensus on the technique wasm ended
>>>> up using, I think that it’s just simpler to use a text format.
>>>> Specifically:
>>>> - text formats basically mean using delimiters (like { and }) around
>>>> blocks of code. If you go binary you either have to invent some other
>>>> delimiter or use block headers that tell the length. From a parsing
>>>> standpoint, binary is just not any better than text.
>>>> - text formats are trivial to introspect. There is no need for a
>>>> separate text encoding used for View Source.
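A minimal sketch of the two block encodings compared above, assuming a toy grammar of whitespace-separated atoms and nested blocks. The binary layout (a tag byte, then a length header) is hypothetical and is not the wasm or SPIR-V encoding; the function names are illustrative only.

```python
import struct

def parse_text_blocks(src):
    """Parse whitespace-separated atoms and nested { ... } blocks."""
    stack = [[]]
    token = ""
    for ch in src:
        if ch in "{} \n\t":
            if token:                      # a delimiter ends the current atom
                stack[-1].append(token)
                token = ""
            if ch == "{":
                block = []
                stack[-1].append(block)
                stack.append(block)
            elif ch == "}":
                if len(stack) == 1:
                    raise ValueError("unbalanced '}'")
                stack.pop()
        else:
            token += ch
    if token:
        stack[-1].append(token)
    if len(stack) != 1:
        raise ValueError("unclosed '{'")
    return stack[0]

def encode_binary(tree):
    """Hypothetical layout: 0x00 + u8 length + atom, 0x01 + u32 length + block."""
    out = bytearray()
    for item in tree:
        if isinstance(item, list):
            payload = encode_binary(item)
            out += b"\x01" + struct.pack("<I", len(payload)) + payload
        else:
            data = item.encode("utf-8")    # atoms longer than 255 bytes are rejected by pack
            out += b"\x00" + struct.pack("<B", len(data)) + data
    return bytes(out)

def parse_binary_blocks(buf):
    """Decode the layout above; every length header must be bounds-checked."""
    items = []
    pos = 0
    while pos < len(buf):
        tag = buf[pos]
        pos += 1
        size = {0: 1, 1: 4}.get(tag)
        if size is None:
            raise ValueError("unknown tag")
        if pos + size > len(buf):
            raise ValueError("truncated length header")
        n = int.from_bytes(buf[pos:pos + size], "little")
        pos += size
        if pos + n > len(buf):
            raise ValueError("length header exceeds buffer")
        body = buf[pos:pos + n]
        items.append(body.decode("utf-8") if tag == 0 else parse_binary_blocks(body))
        pos += n
    return items

tree = parse_text_blocks("fn main { x = 1 if { y } }")
roundtrip = parse_binary_blocks(encode_binary(tree))
```

Both parsers have to reject malformed nesting: unbalanced braces in one case, length headers that overrun the buffer in the other.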
>>>> I think that any argument in favor of binary has to be strong enough to
>>>> counterbalance text’s benefits for view source.
>>>> Maybe we should discuss it. (Although, IMO, existing HLSL is already
>>>> too complex to use as a WASM-level AST-style format; inventing a new format
>>>> or repurposing WASM would be painful because it gets us neither an existing
>>>> tool ecosystem nor an existing application ecosystem.)
>>>> WHLSL (i.e. WSL at the time) started out as more of the thing you want,
>>>> since it didn’t initially have all the stuff necessary to support all of
>>>> HLSL. We removed generics to make the language even simpler.
>>>> In the last call, we talked about going for full HLSL compatibility.
>>>> That’s making WHLSL less like the thing that you want. For example, WHLSL
>>>> currently avoids some complexity by having less of the lvalue magic that C
>>>> has and by having a more restrictive parser. WHLSL also uses operator
>>>> overloading to make many primitive operations (like +) exist outside the
>>>> language itself - the language just views + as a function call.
>>>> Personally, I’d be happy with a text shader format that goes for
>>>> extreme simplicity. You could imagine making some additional
>>>> simplifications, like requiring that all variables are declared at the top
>>>> of the function. Maybe there is even more that can be done to reduce
>>>> complexity. My position is that these are the good things we want in a web
>>>> shader format:
>>>> 1) text
>>>> 2) security
>>>> 3) simplicity
>>>> 4) compiler target
>>>> 5) similar level of abstraction to SPIR-V
>>>> WHLSL currently satisfies 1, 2, 4, and 5 but may be diverging from 3
>>>> because of the desire for full HLSL compat.
>>>> You could even imagine this:
>>>> - WHLSL is like a kernel language (not in the sense of numerical kernel
>>>> but in the sense of just having the core functionality) and doesn’t evolve
>>>> much.
>>>> - some other HLSL flavor has All The Features.
>>>> - programmers can use WHLSL directly or they can use it as a compiler
>>>> target.
>>>> > Before SSA, folks used IRs with numbered temporaries like 3AC.
>>>> IMO, 3AC is more like SSA than like AST when it comes to most issues,
>>>> such as applying code transformations.
>>>> I agree.
>>>> Regardless, I agree that coming up with new variable names is not
>>>> particularly problematic.
>>>> On Wed, Nov 7, 2018 at 2:42 PM Filip Pizlo <fpizlo@apple.com> wrote:
>>>>> On Nov 7, 2018, at 5:15 PM, Kai Ninomiya <kainino@google.com> wrote:
>>>>> > OpLifetimeStart and OpLifetimeEnd are instructions in the SPIR-V
>>>>> language, which presumably means that lifetimes are not clearly expressed
>>>>> with those instructions. Even with the addition of those instructions, they
>>>>> can’t be trusted because they have to be validated, which means they could
>>>>> lie.
>>>>> According to your links, OpLifetimeStart/OpLifetimeEnd are only valid
>>>>> with Kernel capability (i.e. OpenCL). I would guess this is related to
>>>>> physical pointers.
>>>>> > Things like plumbing bounds around with other objects would require
>>>>> rewriting functions and variables and operations on those variables. It
>>>>> would require generating new SSA IDs or possibly regenerating / reassigning
>>>>> them
>>>>> Generating and reassigning SSA IDs is extremely simple compared with
>>>>> non-SSA IDs. This is why SSA is used in modern compilers to begin with.
>>>>> Before SSA, folks used IRs with numbered temporaries like 3AC. The
>>>>> thing that SSA brings to the table is that it makes it easy to find the
>>>>> definition of a variable given its use. That’s why compilers use it. If all
>>>>> they wanted was an easy way to generate IDs then it’s just as easy to do
>>>>> without SSA as with SSA.
>>>>> That said, I think both of you guys have a point:
>>>>> - It’s true that editing SPIR-V to insert checks will mean that you’re
>>>>> not simply passing a SPIR-V blob through. You’re going to have to decode it
>>>>> to an SSA object graph and then encode that graph back to a blob.
>>>>> - It’s true that SPIR-V’s use of 32-bit variable IDs makes generating
>>>>> new ones straightforward.
>>>>> But I should note that since WebHLSL is not a higher order language,
>>>>> generating new variable names is pretty easy. Any name not already used is
>>>>> appropriate, which isn’t significantly different from finding a spare
>>>>> 32-bit variable ID.
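A minimal sketch of the two ways of minting a fresh identifier discussed above. The helper names are hypothetical. The only SPIR-V fact relied on is that the module header's bound field is greater than every ID in the module, so the bound itself is unused.

```python
def fresh_name(used, base="tmp"):
    """Return a name not in `used` and record it (text-language case)."""
    n = 0
    while f"{base}{n}" in used:
        n += 1
    name = f"{base}{n}"
    used.add(name)
    return name

def fresh_id(bound):
    """Return (new_id, new_bound) for a SPIR-V-style module.

    All existing IDs satisfy 0 < id < bound, so `bound` is free to use.
    """
    if bound >= 2**32 - 1:
        raise OverflowError("ID space exhausted")
    return bound, bound + 1

used_names = {"tmp0", "x"}
name = fresh_name(used_names)
new_id, new_bound = fresh_id(10)
```

In both cases the work is a single lookup against what is already in use. As the quoted text notes, the costlier part is rewriting the program around the new variable.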
>>>>> > The evidence from WebAssembly vs JavaScript suggests this probably
>>>>> won’t be true (if by “easier” you mean either “faster” or “simpler to code
>>>>> correctly”).
>>>>> It sounds like you are claiming that the JavaScript parser/code
>>>>> generator is not more complex than the WASM parser/code generator. Is this
>>>>> correct? Can you provide evidence for this claim?
>>>>> Depends on what you mean by complexity. And it depends on a lot of
>>>>> things that are not really inherent to the languages. And it depends on
>>>>> whether you account for the handicap in JS due to JS being a more complex
>>>>> language in ways that have nothing to do with binary versus text.
>>>>> Without a doubt, parsing JavaScript is de facto more code than parsing
>>>>> WebAssembly. This happens mostly because those parsers have been hyper
>>>>> optimized over a long time (decade or more in some cases, like the one in
>>>>> JSC). Maybe it’s also more code to parse JS even if you didn’t do those
>>>>> optimizations, but I’m not sure we have an easy way of knowing just by
>>>>> looking at an existing JS parser or wasm parser.
>>>>> What is sure is that JavaScript has better startup time than
>>>>> WebAssembly. See:
>>>>> https://pspdfkit.com/blog/2018/a-real-world-webassembly-benchmark/
>>>>> So if “complexity” is about time then I don’t think that WebAssembly
>>>>> wins.
>>>>> Looks like this varies by browser and it also looks like cases where
>>>>> one language is faster to start than the other have more to do with the
>>>>> compiler backend than parsing.
>>>>> If by complexity you mean bugs, then WebAssembly parsing has bugs as
>>>>> does JS. JS parsing has fewer bugs for us, but that may have to do more with
>>>>> JS being very mature. It may also be because parsing text is easier to get
>>>>> right.
>>>>> If by complexity you mean amount of code or difficulty of code after
>>>>> the parser but before the backend, then it’s unclear. WebAssembly and
>>>>> JavaScript both have some quirks that implementations have to deal with
>>>>> before emitting code to the backend. JSC does weird stuff to JS before
>>>>> emitting bytecode and it has significant complexity in how it interprets
>>>>> wasm to produce B3 IR. Also, WebAssembly opted against SSA - it’s more of
>>>>> an AST serialization disguised as a stack-based bytecode than SSA. I think
>>>>> wasm opted for that because dealing with something AST-like as an input was
>>>>> thought to be easier than dealing with SSA as an input.
>>>>> -Filip
>>>>> On Wed, Nov 7, 2018 at 2:11 PM Myles C. Maxfield <mmaxfield@apple.com>
>>>>> wrote:
>>>>>> On Nov 6, 2018, at 3:55 PM, Jeff Gilbert <jgilbert@mozilla.com>
>>>>>> wrote:
>>>>>> I don't think it's necessarily helpful to think of this discussion as
>>>>>> predominately binary vs text.
>>>>>> I think there is a lot of value in a constrained, targeted ingestion
>>>>>> format, *and separately* I think SPIR-V is a natural choice for this
>>>>>> ingestion format.
>>>>>> SPIR-V's core format is very, very easy to parse,
>>>>>> SPIR-V is a sequence of 32-bit words, so you’re right that it’s easy
>>>>>> to read a sequence of 32-bit words.
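A minimal sketch of reading that word stream, assuming a little-endian module. It relies only on the documented SPIR-V layout: a five-word header beginning with the magic number 0x07230203, then instructions whose first word carries the word count in its high 16 bits and the opcode in its low 16 bits. The function name is hypothetical.

```python
import struct

SPIRV_MAGIC = 0x07230203

def split_instructions(blob):
    """Split a SPIR-V blob into (header, [(opcode, operands), ...]).

    This is only the structural pass. It does not check that the
    instructions form a valid module.
    """
    if len(blob) % 4 != 0 or len(blob) < 20:
        raise ValueError("not a whole number of words, or header missing")
    words = struct.unpack(f"<{len(blob) // 4}I", blob)
    if words[0] != SPIRV_MAGIC:
        raise ValueError("bad magic number (or big-endian module)")
    header = {"version": words[1], "generator": words[2],
              "bound": words[3], "schema": words[4]}
    instructions = []
    pos = 5
    while pos < len(words):
        word_count = words[pos] >> 16
        opcode = words[pos] & 0xFFFF
        if word_count == 0 or pos + word_count > len(words):
            raise ValueError("instruction length out of range")
        instructions.append((opcode, words[pos + 1:pos + word_count]))
        pos += word_count
    return header, instructions

# Two instructions: opcode 17 with one operand, opcode 14 with two operands.
module = struct.pack("<10I",
                     SPIRV_MAGIC, 0x00010300, 0, 8, 0,
                     (2 << 16) | 17, 1,
                     (3 << 16) | 14, 0, 1)
header, instructions = split_instructions(module)
```

The structural pass is short. It says nothing about whether the decoded instructions are mutually consistent.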
>>>>>> However, a Web browser’s job is to understand any possible sequence
>>>>>> of inputs. What should a browser do when it encounters two
>>>>>> OpEntryPoint
>>>>>> <https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#OpEntryPoint> instructions
>>>>>> that happen to have the same name but different execution models? What
>>>>>> happens when an ArrayStride
>>>>>> <https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#Decoration> decoration
>>>>>> is set to 17 bytes? What happens when both SpecId and BuiltIn decorations
>>>>>> are applied to the same value
>>>>>> <https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#_a_id_shadervalidation_a_validation_rules_for_shader_a_href_capability_capabilities_a>?
>>>>>> SPIR-V today is clearly not a dream for ingestion. It is more difficult for
>>>>>> a browser to understand a SPIR-V program than a WHLSL program.
>>>>>> and lends itself
>>>>>> well to simple but robust parsing. Lifetimes are clearly expressed,
>>>>>> instruction invocations are very explicit, and ecosystem support is
>>>>>> already good. It's a dream format for ingestion.
>>>>>> Binning it with other (particularly older) binary formats is just
>>>>>> inaccurate. Doing the initial parse gives you the structures
>>>>>> (functions, types, bindings) you want pretty immediately. By
>>>>>> construction, most unsafe constructs are impossible or trivially
>>>>>> validatable. (SSA, instruction requirements, unsafe types, pointers)
>>>>>> For what it's worth, text formats are technically binary formats
>>>>>> with a charset. I would rather consume a constrained,
>>>>>> rigidly-structured (SSA-like? s-expressions?) text-based assembly
>>>>>> than some binary formats I've worked with. (DER, ugh!)
>>>>>> Disentangling our ingestion format from the pressures of both
>>>>>> redundancies and elisions that are desirable in directly-authored
>>>>>> languages, simplifies things and actually prevents ambiguity. It
>>>>>> immediately frees the authoring language to change and evolve at a
>>>>>> faster rate, and tolerates more experimentation.
>>>>>> I would rather solve the compilation tool distribution use-case
>>>>>> without sacrificing simplicity and robustness in ingestion. An
>>>>>> authoring-to-ingestion language compiler in a JS library would let us
>>>>>> trivially share everything above the web-IR->host-IR translation,
>>>>>> including optimization passes.
>>>>>> On Tue, Nov 6, 2018 at 3:16 PM Ken Russell <kbr@google.com> wrote:
>>>>>> Hi Myles,
>>>>>> Our viewpoint is based on the experience of using GLSL as WebGL's
>>>>>> input language, and dealing with hundreds of bugs associated with parsing,
>>>>>> validating, and passing a textual shading language through to underlying
>>>>>> drivers.
>>>>>> Kai wrote this up at the beginning of the year in this Github issue:
>>>>>> https://github.com/gpuweb/gpuweb/issues/44 , and there is a detailed
>>>>>> bug list (which is still only a sampling of the associated bugs we fixed
>>>>>> over the years) in this spreadsheet:
>>>>>> https://docs.google.com/spreadsheets/d/1bjfZJcvGPI4M6Df5HC8BPQXbl847RpfsFKw6SI6_T30/edit#gid=0
>>>>>> Unlike what I said on the call, the main issues aren't really around
>>>>>> the parsing of the input language or string handling. Both the
>>>>>> preprocessor's and compiler's parsers in ANGLE's shader translator are
>>>>>> autogenerated from grammars. Of more concern were situations where we had
>>>>>> to semi-arbitrarily restrict the source language so that we wouldn't pass
>>>>>> shaders through to the graphics driver which would crash its own shader
>>>>>> compiler. Examples included having to restrict the "complexity" or "depth"
>>>>>> of expression trees to avoid stack overflows in some drivers (this was
>>>>>> added as an implementation-specific security workaround rather than to the
>>>>>> spec), working around bugs in variable scoping and shadowing, defeating
>>>>>> incorrect compiler optimizations, and more. Please take the time to read
>>>>>> Kai's writeup and go through the spreadsheet.
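A minimal sketch of the kind of expression-depth restriction described above, assuming expressions are nested tuples of the form (operator, child, ...) with strings as leaves. The limit of 256 and the function names are hypothetical and are not taken from ANGLE. The walk is iterative so that the check cannot itself overflow the stack on a hostile input.

```python
def expression_depth(expr):
    """Return the depth of a nested-tuple expression without recursion."""
    max_depth = 0
    stack = [(expr, 1)]
    while stack:
        node, depth = stack.pop()
        max_depth = max(max_depth, depth)
        if isinstance(node, tuple):
            for child in node[1:]:         # node[0] is the operator
                stack.append((child, depth + 1))
    return max_depth

def check_depth(expr, limit=256):
    """Reject expressions deeper than `limit` before handing them to a driver."""
    depth = expression_depth(expr)
    if depth > limit:
        raise ValueError(f"expression depth {depth} exceeds limit {limit}")
    return depth

small = ("+", "a", ("*", "b", "c"))
deep = "x"
for _ in range(10000):
    deep = ("+", deep, "1")                # left-leaning chain, 10000 operators deep
```

Here `check_depth(small)` passes and `check_depth(deep)` raises, which is the behavior such a workaround needs whatever limit is chosen.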
>>>>>> The question will come up: would using a lower-level representation
>>>>>> like SPIR-V for WebGPU's shaders really address these problems? I think it
>>>>>> would. SPIR-V uses SSA form and simple numbers for variables, which will
>>>>>> eliminate entire classes of bugs in mishandling of language-level
>>>>>> identifiers, variables, and scopes. SPIR-V's primitives are lower level
>>>>>> than those in a textual shader language, and if it turns out restrictions
>>>>>> on shaders are still needed in WebGPU's environment spec in order to work
>>>>>> around driver bugs, they'll be easier to define more precisely against
>>>>>> SPIR-V than source text. Using SPIR-V as WebGPU's shader ingestion format
>>>>>> would bring other advantages, including that it's based on years of
>>>>>> experience developing a portable binary shader representation, and has been
>>>>>> designed in conjunction with GPU vendors across the industry.
>>>>>> On the conference call I didn't mean to over-generalize the topic to
>>>>>> "binary formats vs. text formats in the browser", so apologies if I
>>>>>> misspoke.
>>>>>> -Ken
>>>>>> On Mon, Nov 5, 2018 at 10:58 PM Myles C. Maxfield <
>>>>>> mmaxfield@apple.com> wrote:
>>>>>> Hi!
>>>>>> When we were discussing WebGPU today, the issue of binary vs text was
>>>>>> raised. We are confused at the viewpoint that binary languages on the Web
>>>>>> are inherently safer and more portable than text ones. All of our browsers
>>>>>> accept HTML, CSS, JavaScript, binary image formats, binary font files,
>>>>>> GLSL, and WebAssembly, and so we don’t understand how our teams came to
>>>>>> opposite conclusions given similar circumstances.
>>>>>> Can you describe the reasons for this viewpoint (as specifically as
>>>>>> possible, preferably)? We’d like to better understand the reasoning.
>>>>>> Thanks,
>>>>>> Myles
Received on Tuesday, 13 November 2018 00:19:07 UTC
