Re: Ill-defined programs

I’d like to take a step back and make a general point that I’ve made before on the call, but that I’m not sure was ever written down.

I believe there is consensus about the following points:

1. It is possible to make a WebGPU API which is 100% fully defined, interoperable, and implementable. This is trivially provable: an API with no functions in it would fit this requirement. Beyond that, you could imagine an API which has no textures, whose buffers may only be used as vertex attributes (not even as index buffers), which has a single framebuffer, and whose shading language doesn’t have pointers and only includes functions which have well-defined inputs and outputs. This is clearly implementable on all popular 3D graphics APIs today (even including OpenGL).

2. Such an API would be too restrictive. One fundamental benefit of the new-style graphics APIs is UAVs (aka SSBOs). (For the uninitiated, these are the GPU equivalent of SharedArrayBuffer.) Just like how SharedArrayBuffer has its place in the Web Platform, there are significant use cases which require these objects in order for algorithms to have good performance. For example, almost all GPGPU algorithms operate exclusively on these objects.

3. Portability is still valuable, though. Maciej already made this point much better than I could have.
3.5 Performance is also still valuable, though. (Of course.)

4. Non-portable programs will sometimes run faster on some hardware than an equivalent portable program (for some definition of “equivalent”). Often, ensuring portability means adding code. Code runs slower than an absence of code.
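
To make point 4 concrete, here is a minimal sketch (in TypeScript, standing in for shader code) of what "adding code" can look like for a single buffer read. The function names are hypothetical and are not part of any WebGPU proposal; the clamp and zero-fill variants simply mirror the kinds of out-of-bounds behaviors mentioned elsewhere in this thread, and the sketch assumes integer indices.

```typescript
// Hypothetical helpers for illustration only; not part of any WebGPU proposal.

// Unchecked read: what a non-portable program effectively does.
// Out of bounds, a typed array yields undefined here; on a GPU the
// result would depend on the hardware and driver.
function loadUnchecked(buf: Float32Array, i: number): number | undefined {
  return buf[i];
}

// Clamped read: every index maps to a valid element, at the cost of
// extra comparisons on each access. An empty buffer reads as 0.
function loadClamped(buf: Float32Array, i: number): number {
  if (buf.length === 0) {
    return 0;
  }
  const last = buf.length - 1;
  const clamped = Math.min(Math.max(i, 0), last);
  return buf[clamped];
}

// Zero-fill read: an early return replaces the out-of-bounds access.
function loadOrZero(buf: Float32Array, i: number): number {
  if (i < 0 || i >= buf.length) {
    return 0;
  }
  return buf[i];
}
```

Both portable variants return the same value on every implementation, but each pays for that with instructions the unchecked version does not execute. How much that costs on real GPUs is exactly what remains to be measured.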

We’ve had discussions many times about the tradeoffs between performance and portability. The above points show that both extremes are unsuitable for a WebGPU API. Therefore, our discussions are about degrees on a spectrum. We are discussing how far we are willing to go to pursue portability.

Historically, on the Web Platform, the most successful APIs have been 100% fully defined and interoperable. We are making an API with the goal of it being inducted into the Web Platform, and we’re learning from our (collectively, the browser vendors') past experiences and mistakes. Therefore, the best WebGPU API is one which goes quite far in that direction, without going “too far.” I think this discussion is about what “too far” means.

(By the way, I use the phrase “Web Platform” to mean “the set of things that sites on the Web use and any browser should implement if it aspires to be compatible with the Web.”)

The performance cost of adding early returns or clamp instructions has yet to be measured (in a way which satisfies the entire group). Until we can characterize the performance implications, we cannot correctly decide what the appropriate trade-off is for these operations.
The performance cost of adding explicit barriers and the counterpart implicit barriers in the WebGPU API has yet to be measured (in a way which satisfies the entire group). Until we can characterize the performance implications, we cannot correctly decide what the appropriate trade-off is for these operations.
The performance of scheduled CPU/GPU data transfers and the counterpart mapped CPU/GPU data transfers in the WebGPU API has yet to be measured (in a way which satisfies the entire group). Until we characterize the performance implications, we cannot correctly decide what the appropriate trade-off is for these operations.

Therefore, we cannot divorce the performance requirements from the design of this API. We need to do investigation before we make these decisions.

Thanks,
Myles

> On Nov 13, 2017, at 8:33 PM, Gregg Tavares <w3c@greggman.com> wrote:
> 
> WebGL hasn't had some of the HTML issues because the issues couldn't be worked around. For example, compressed textures. There is no standard because it's hardware based. If a dev uses a desktop-only compression format, their page doesn't work, period, on mobile, and browser developers will not fix this. It's up to the dev to fix.
> 
> A few things WebGL does to enforce portability off the top of my head
> 
> * WebGL enforces non-power-of-2 restrictions even though the hardware below does not
> * WebGL enforces "texture complete" restrictions, even if the driver/GPU may or may not
> * WebGL enforces a strict GLSL version including re-writing shaders so reserved keywords on newer GLSL versions on the underlying platform can be used in user programs as though they are not reserved.
> * WebGL requires extensions to be explicitly enabled. They are not enabled by default like OpenGL
> * WebGL wraps resources in objects and requires calls to `createXXX` (OpenGL uses ints and does not require calling createXXX for all resources)
> * WebGL invalidates all resource objects on context-lost (OpenGL, because it uses int ids would have very strange behavior here)
> * WebGL wrapped uniform locations so devs didn't make the mistakes of doing uniform location math or reusing uniforms locations across programs
> * WebGL required 3 combinations of framebuffer attachments (OpenGL has no requirements)
> * WebGL2 requires queries to not report results in the same JS event
> 
> It seems like it would be nice to see similar types of considerations in WebGPU. Ideally the goal is, where possible, to prevent bad behavior from running accidentally.
> 
> Native devs have fewer of these issues because they have to manually port their apps to each platform. Web devs, though, don't; they generally expect to write once and just have it run everywhere. They might test mobile vs desktop or CSS/UI but they shouldn't have to test across GPUs and OSes.
> 
> 
> 
> 
> On Tue, Nov 14, 2017 at 12:59 PM, Corentin Wallez <cwallez@google.com> wrote:
> On Mon, Nov 13, 2017 at 7:05 PM, Maciej Stachowiak <mjs@apple.com> wrote:
> 
> 
>> On Nov 13, 2017, at 6:50 PM, Corentin Wallez <cwallez@google.com> wrote:
>> 
>> Hey Maciej,
>> 
>> WebGL already has parts that follow (2.5) when accesses to arbitrary offsets are possible. In WebGL 2 this is only limited to vertex buffers and arrays in uniform buffers. In WebGPU with compute functionality, accesses to arbitrary offsets become more common with the "shader storage buffers" and "texture read-write" features. When possible we would like to use hardware features to secure these accesses as this will get us the most performance. This is where (2.5) comes from because different HW implement the "robust access" feature differently.
>> 
>> This interop problem hasn't been a problem because shaders that hit any of the cases of (2.5) will usually produce very incorrect and noticeable results.
> 
> Will this still be true when more kinds of shaders can hit (2.5)?
> 
> The only data I have about this is that most of the portability issues native developers are talking about are centered on the API, not so much on the shaders. So not clear.
>  
>  - Maciej
> 
> 
>> 
>> Corentin
>> 
>> On Mon, Nov 13, 2017 at 6:03 PM, Maciej Stachowiak <mjs@apple.com> wrote:
>> 
>> Hi Jeff,
>> 
>> I think we might be making different assumptions about the amount of undefinedness that is being proposed:
>> 
>> (1) Approximately as interoperable as WebGL.
>> 
>> (2) if a shader program reads out of bounds or violates any other correctness constraint, then all bets are off.
>> 
>> (2.5) if a shader program reads out of bounds or violates any other correctness constraint, then it might do any one of a fixed list of behaviors (e.g. terminate, clamp to bounds, or return constant zero).
>> 
>> 
>> I'm reluctantly ok with (1). I'd hope we could do better, but it might be that this is as good as it gets. Real-world interop for WebGL is ok, even if it's not as good as some other older web technologies.
>> 
>> My impression is that what Dzmitry was advocating was more like (2): "We don't care about the computed results or performance of an ill-behaved application."
>> 
>> My impression is that WebGL does not follow approach (2). The behavior for ill-defined applications is specified and covered by tests, and when there are behavior differences, it's due to things like GPUs not doing floating point arithmetic consistently. It's not due to completely different behavior for blown bounds checks.
>> 
>> I don't think (2) or even (2.5) is ok. It seems likely to create a much bigger interop problem than we have with WebGL today. It's not ok to completely abandon interop for programs with a certain class of bugs. Most nontrivial programs will be buggy, and many will come to inadvertently depend on their bugs.
>> 
>> Regards,
>> Maciej
>> 
>> 
>> 
>> > On Nov 13, 2017, at 4:37 PM, Jeff Gilbert <jgilbert@mozilla.com> wrote:
>> >
>> > We (Mozilla at least) definitely feel that WebGL 1 and 2 are in a
>> > great place portability-wise, and this is also the feedback we get
>> > from devs and partners we've worked with. WebGL has real
>> > implementation differences, with real portability concerns, but these
>> > have not become malignant. Interop is a very real concern, but there
>> > is middle ground between wild-wild-west (which absolutely no one here
>> > is proposing) and absolute portability for all apps. We have been very
>> > successful in striking that middle ground with WebGL, and I don't see
>> > that changing with WebGPU. Centralized test suites ensure all points
>> > of portability we require, and prevent implementations from deviating
>> > from the spec, guiding them back into compliance.
>> >
>> > People write browser-specific code all the time, both on accident and
>> > on purpose. There's no panacea for it. We should make smart choices
>> > and properly weigh the costs, as well as the benefits, of various
>> > approaches. Flatly refusing to consider slight deviations is naive,
>> > and not supported by our existing experience in this area.
>> >
>> > I'm trying to avoid rehashing years of API decisions here, and largely
>> > decisions we're happy with. I understand your concerns, but they are
>> > largely (and fortunately) not reflected in our experience with WebGL,
>> > which is the closest thing to prior art that we have here.
>> >
>> > I do not think we disagree as much as you fear, but I simply cannot
>> > agree with how hard you are pushing on excruciating portability. There
>> > is a balance to strike, and we've been fairly successful at doing so.
>> >
>> > On Mon, Nov 13, 2017 at 4:02 PM, Maciej Stachowiak <mjs@apple.com> wrote:
>> >>
>> >>
>> >>> On Nov 13, 2017, at 3:44 PM, Jeff Gilbert <jgilbert@mozilla.com> wrote:
>> >>>
>> >>> First off, those using these APIs (WebGL and WebGPU) are not writing
>> >>> geocities websites. They are engineers, and we should optimize for
>> >>> these engineers, not for the layman.
>> >>
>> >> The worst interop problems come from extremely popular sites created by professional engineers, not geocities sites made by laymen.
>> >>
>> >>>
>> >>> In WebGL, we are aggressive about issuing descriptive errors and
>> >>> warnings on malformed or questionable API use. We've found this
>> >>> quickly surfaces portability issues, and makes them very quick to
>> >>> solve. I anticipate continuing this behavior during implementation of
>> >>> WebGPU. We have a ton of tools for giving feedback to devs who are
>> >>> doing the wrong thing, particularly if we have a less-perf-sensitive
>> >>> debug mode. (which can even be a JS library, not needing to be baked
>> >>> into the browser)
>> >>>
>> >>> I think good validation is important for development, but its
>> >>> performance impact can hurt outside development, once a product is
>> >>> released. In our usage of APIs inside the browsers, outside of DEBUG
>> >>> builds, we try to turn off as much validation code as we can, given
>> >>> its performance impact. If we find it useful to do this, why would web
>> >>> apps not find it useful to do similar, and be able to run with a
>> >>> minimum (but not zero!) level of validation?
>> >>
>> >> They'll definitely want to do this. It's just a bad idea to let them, because they will accidentally code sites that only work in one browser, or only on one OS, or only one CPU architecture, or only one GPU. Some of this may be inevitable but we should try our hardest to limit it.
>> >>
>> >>>
>> >>> In the six years I've been a part of WebGL, we have not seen the
>> >>> divergence in behaviors that you are afraid of, so I'm not sure why
>> >>> you think they will happen with this new API. Please appreciate our
>> >>> experience with this field, and the differences it has as compared to
>> >>> other areas.
>> >>
>> >> According to Ken, it seems like WebGL has tried a lot harder to guarantee interop than the standard suggested here.
>> >>
>> >> Even so, we have seen WebGL sites that only work in Chrome. So WebGL doesn't prove that interop isn't a concern for graphics.
>> >>
>> >>
>> >>>
>> >>> On Mon, Nov 13, 2017 at 3:29 PM, Maciej Stachowiak <mjs@apple.com> wrote:
>> >>>>
>> >>>>
>> >>>>> On Nov 13, 2017, at 3:21 PM, Jeff Gilbert <jgilbert@mozilla.com> wrote:
>> >>>>>
>> >>>>> On Mon, Nov 13, 2017 at 12:27 PM, Maciej Stachowiak <mjs@apple.com> wrote:
>> >>>>>> On Nov 13, 2017, at 11:44 AM, Dzmitry Malyshau <dmalyshau@mozilla.com>
>> >>>>>> wrote:
>> >>>>>>
>> >>>>>> Yes. Strong objection. If behavior of any programs is not fully specified,
>> >>>>>> then web developers will start to accidentally depend on the behavior of one
>> >>>>>> browser (usually whichever is most popular), and then browsers will have to
>> >>>>>> reverse-engineer each others' behavior. This has happened so many times in
>> >>>>>> the course of web standards development that it's almost a running joke.
>> >>>>>> Every once in a while someone says "hey, let's just not define error
>> >>>>>> handling, we only need to define the behavior for valid content" and then it happens.
>> >>>>>> The first time was HTML. Browsers ended up reverse-engineering each other's
>> >>>>>> error handling until finally they got sick of the W3C not defining this and
>> >>>>>> formed the WHATWG to create HTML5, which fully specified parsing behavior
>> >>>>>> for all invalid documents. CSS, JavaScript and WebAssembly also have fully
>> >>>>>> interoperable behavior by spec, even in "invalid" or "error" or
>> >>>>>> "ill-defined" cases.
>> >>>>>>
>> >>>>>> Let's not make this rookie mistake. We must fully define the behavior of all
>> >>>>>> programs.
>> >>>>>
>> >>>>> We can fully define it without requiring exact behavior.
>> >>>>
>> >>>> I'm not sure what that means. Are you suggesting a menu choice of one of N behaviors? Then the most popular behavior will become a de facto standard.
>> >>>>
>> >>>>> It's not a rookie mistake to make compromises here.
>> >>>>
>> >>>> It totally is. I can't think of a time we've done this on the web and it has been ok in the long run. I gave you a very notable failure example. Can you cite any past successes for the strategy of sacrificing interoperability on the web?
>> >>>>
>> >>>>> There are a variety of ways
>> >>>>> in which our API here, like WebGL, differs greatly from the parsing
>> >>>>> deviations in early HTML. It is not white and black. Short of proved
>> >>>>> code, there will always be accidental portability problems, even if
>> >>>>> from nothing other than cargo-culting. We should make smart
>> >>>>> compromises here, using all the tools and avenues available to us,
>> >>>>> rather than have a knee-jerk requirement for maximum portability.
>> >>>>> (absolute portability is not even possible with our base graphics
>> >>>>> APIs)
>> >>>>
>> >>>> It's my impression that, historically, graphics programmers have accepted a much lower bar for portability than the standard we strive for on the web. I believe we should shoot for something much closer to the web bar for portability. Accidental portability problems from cargo-culting are exactly the kind of problems I am worried about. The more we create the opportunity for such problems, the more likely that in 5-10 years we'll all have to reverse engineer Chrome for Windows and Safari for iOS behavior for desktop and mobile respectively.
>> >>>>
>> >>>> Regards,
>> >>>> Maciej
>> >>>>
>> >>>
>> >>
>> 
>> 
>> 
> 
> 
> 

Received on Wednesday, 15 November 2017 23:04:08 UTC