Re: [w3ctag/design-reviews] Review OffscreenCanvas, including ImageBitmapRenderingContext (#141)

Let me rehash some of the use cases/needs from the compiled GL code perspective (the Emscripten, WebAssembly, Unity, Unreal Engine 4, ... crowds). I believe these are very much the same needs that WebVR applications have, since the same crowds implement VR support, and both development cases seek the highest performance in rendering.

1. Control loops:

Needing to refactor C/C++ code to run event-based, rather than being able to maintain its own control loops, is currently the single biggest blocker to improving portability. The term `control loop` is preferred here over `main loop`, since the latter occasionally creates the illusion that applications are structured as a single "main" top-level `int main() { for(;;) tick(); }` loop. Those types of applications are trivial to asynchronify, and are not an issue.
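For illustration, a rough sketch of what that trivial shape maps to on the web (`updateAndRender()` is a hypothetical stand-in for the loop body):

```js
// Hypothetical stand-in for one iteration of the original loop body.
function updateAndRender() { /* ... */ }

// The top-level for(;;) loop becomes a self-rescheduling callback
// that yields back to the browser between iterations.
function tick() {
  updateAndRender();
  requestAnimationFrame(tick);
}
requestAnimationFrame(tick);
```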

The issue is that native codebases can have multiple different control loops, several nested control loops, or even just one that is deeply nested in a call stack, and refactoring the whole application to run asynchronously event-based is often too difficult to do. Experience shows that even when developers have succeeded in asynchronifying a codebase, the changes can touch so many locations in the code that the upstream project no longer wants to take the modifications in, and the effort ends up as a bitrotting experimental proof of concept. Off the top of my head, this has happened for example to the Qt, wxWidgets, ScummVM, DOSBox, and MAME & MESS projects, to name a few. That is why Emscripten is looking to enable a model where one can run code in a Worker and let that code retain its own control loops unmodified, never yielding back to the browser in that Worker. This will greatly improve how much code can be ported to the web.
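As a rough sketch of that model, assuming the explicit commit() discussed in point 2 below: the main thread hands the canvas to a Worker, which then keeps spinning its own loop (`runOneIteration()` is a hypothetical stand-in for one tick of the compiled application):

```js
// main.js: hand the canvas over to a Worker that owns the control loop.
const canvas = document.querySelector('canvas');
const offscreen = canvas.transferControlToOffscreen();
const worker = new Worker('render-worker.js');
worker.postMessage({ canvas: offscreen }, [offscreen]);

// render-worker.js: the compiled application keeps its own control
// loop and never yields back to the browser in this Worker.
onmessage = (e) => {
  const gl = e.data.canvas.getContext('webgl');
  for (;;) {               // the unmodified native-style control loop
    runOneIteration(gl);   // hypothetical: one tick of the compiled app
    gl.commit();           // proposed explicit present (see point 2)
  }
};
```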

One thing this prevents is receiving postMessage()s and other events in the Worker that is spinning its own control loops. For those scenarios, we have a SharedArrayBuffer-based event queue for each Worker, in which the application can synchronously post and receive its web events.
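Something like the following heavily simplified single-slot sketch (a real implementation is a ring buffer with proper synchronization; all names here are illustrative):

```js
// Shared between the main thread and the Worker. Hypothetical layout:
// index 0 = pending-event flag, indices 1..2 = event payload words.
const sab = new SharedArrayBuffer(1024);
const q = new Int32Array(sab);

// Main thread: enqueue an event and wake the Worker.
function postEvent(type, payload) {
  q[1] = type;
  q[2] = payload;
  Atomics.store(q, 0, 1);
  Atomics.notify(q, 0);
}

// Worker, called from inside its own control loop: synchronously
// drain any events the main thread has posted.
function pollEvents() {
  while (Atomics.load(q, 0) !== 0) {
    handleEvent(q[1], q[2]); // hypothetical dispatcher
    Atomics.store(q, 0, 0);
  }
}
```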

It should be stressed that the intent is not to force all Emscripten/WebAssembly-compiled applications to always run in such a model; Emscripten enables both types of computation models (async event-based in the main thread or a Worker, and sync control loops in a Worker), so that code can use whichever is better suited to the codebase in question.

On the surface, it might seem that the async-await keywords would enable one to run synchronous control loops if there were a Promise variant of rAF, but that approach does not quite work: the computation model that async-await delivers is subtly different from what is needed here. This has been discussed in https://github.com/junov/OffscreenCanvasAnimation/issues/1.
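To see why, consider the obvious Promise wrapper around rAF. The innermost loop does look synchronous, but the await still yields to the event loop, and the async-ness must propagate through every caller on the stack, which is exactly the whole-program refactoring problem again:

```js
// A Promise variant of rAF makes the innermost loop look synchronous...
const nextFrame = () => new Promise(r => requestAnimationFrame(r));

async function controlLoop() {
  for (;;) {
    renderFrame();     // hypothetical
    await nextFrame(); // still yields to the event loop here
  }
}

// ...but every function above the loop on the call stack must itself
// become async; a synchronous caller deep in a native call stack
// cannot await, so the refactoring burden remains.
```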

2. yield = swap:

The "yielding back from event handler is an implicit WebGL swap" model is not suitable for applications that do their own control loops in a Web Worker. That is why the explicit .commit() call would be useful for Workers that utilize OffscreenCanvas; that would enable those applications to present a frame from the Worker using a mechanism that does not require them to yield. Other applications use rendering models that are not based on interactive animations, and they might not be rendering as a response to an external event, but they might be doing some computation, after which they'll present the produced results, then they'll compute some more, and then swap again to present. Scientific applications and loading screens can be like that - they don't have a 1:1 correspondence of an 1 event=1 swap, or 1 turn of a control loop=1 swap, but they are structured to present after some piece of computation (that is run sequentially/imperatively) finishes.

Currently in Emscripten we handle the above types of applications by doing all rendering to a separate offscreen FBO, and then offering an explicit swap function that blits the offscreen FBO to the screen. This is inefficient, but works, with the caveat that presentation is still limited to the browser's composition interval: swapping more often than the rAF() composition rate leads to discarded frames that are never shown to the user, which is not ideal.
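Roughly, the emulation looks like the following sketch (WebGL 2 for blitFramebuffer(); identifiers are illustrative, and this is not the exact Emscripten code):

```js
// Assumes a WebGL 2 context (blitFramebuffer needs WebGL 2).
const gl = document.querySelector('canvas').getContext('webgl2');
const { width, height } = gl.canvas;

// All of the application's rendering goes to an offscreen FBO...
const fbo = gl.createFramebuffer();
const tex = gl.createTexture();
gl.bindTexture(gl.TEXTURE_2D, tex);
gl.texStorage2D(gl.TEXTURE_2D, 1, gl.RGBA8, width, height);
gl.bindFramebuffer(gl.FRAMEBUFFER, fbo);
gl.framebufferTexture2D(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0,
                        gl.TEXTURE_2D, tex, 0);

// ...and the explicit "swap" blits it to the default backbuffer,
// which the browser then composites at its own rAF() rate. (The FBO
// must be rebound as the draw target before rendering the next frame.)
function emulatedSwap() {
  gl.bindFramebuffer(gl.READ_FRAMEBUFFER, fbo);
  gl.bindFramebuffer(gl.DRAW_FRAMEBUFFER, null);
  gl.blitFramebuffer(0, 0, width, height, 0, 0, width, height,
                     gl.COLOR_BUFFER_BIT, gl.NEAREST);
}
```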

Having an imperative swap function would also be useful for portability, because that is exactly the model that most other platforms have - wglSwapBuffers, glXSwapBuffers, eglSwapBuffers, D3D Present etc. all allow one to explicitly say when to .commit(). Being able to provide the same functionality is great for retaining a unified codebase structure across platforms.

Otherwise applications will need to audit their GL rendering patterns, identify how draw calls relate to swapping, and refactor so that everything renders in exactly one web event callback (or use the FBO fallback, impacting performance). This might not sound too hard if you are the first-party developer of the codebase in question, but often the developers retargeting a project to the web are different from the people who originally wrote the software, which means developers can be porting codebases they know relatively little about. This fact is often underappreciated, and developers working in such a situation may get labeled amateurs, since "perfect knowledge and control" of code is regarded as a hallmark of an expert developer.

Decoupling control flow from the decision of when to present would bring flexibility via orthogonality, as these are fundamentally two unrelated programming concepts. As a result, developers would not need to pay attention to the technicalities of implicit swap behavior, and more code could be supported out of the box without burning productivity cycles on those details.

3. vsync rate:

There is a combination of a number of items in play:

a) in some browsers, rAF() is hardcoded to run 1:1 with the display's vsync rate,
b) in other browsers, rAF() is hardcoded to run at 60Hz,
c) the rAF() rate is not necessarily constant over the page's lifetime, but can vary at runtime, e.g. in a multimonitor setup when one moves the browser window to another display with a different Hz rate,
d) there is no API to ask what the current rAF() presentation rate is.

In order to reduce rendering microstutter, a behavior that gaming audiences hate with a passion and that sometimes creates strong ill emotions, applications commonly lock their animation update timings to vsync. That is, instead of updating animation via variable timesteps measured from performance.now() deltas, which jitter, applications round the measured deltas to the nearest elapsed multiple of 1000/refresh_rate_Hz msecs, since they know frames will be presented in such quanta when presentation is locked to vsync.

For example, if an application knows that its presentation is locked to a 60Hz rate, it generally wants to advance animation in fixed 16.667ms slices, rather than applying dynamic-length steps computed from the elapsed performance.now() time since the last update. In this model, performance.now() is used to estimate when full vsync intervals have been missed: e.g. a performance.now() delta of, say, 28ms since the previous update means the app will want to take 2x 16.667ms update slices to align with the arrival of the next vsync window. However, this kind of computation requires knowing the exact rAF() vsync rate of the current display.
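A minimal sketch of that quantization, assuming the 60Hz rate is somehow known (which is exactly the missing piece; the update/render helpers are hypothetical):

```js
function updateAnimation(dtMs) { /* hypothetical fixed-step update */ }
function render() { /* hypothetical draw */ }

const vsyncMs = 1000 / 60; // assumes a known presentation rate
let last = performance.now();

function onFrame(now) {
  // Round the jittery measured delta to whole vsync intervals:
  // e.g. a 28 ms delta at 60 Hz rounds to 2 slices of 16.667 ms.
  const slices = Math.max(1, Math.round((now - last) / vsyncMs));
  for (let i = 0; i < slices; ++i) updateAnimation(vsyncMs);
  render();
  last = now;
  requestAnimationFrame(onFrame);
}
requestAnimationFrame(onFrame);
```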

Other times, applications may want to update at a lower, specifically controlled refresh rate. For example, a heavy game (or one detecting low battery) might want to cap rendering to 30Hz, independent of whether it is running on a 60Hz or a 120Hz display. Or a video application may want to update at a rate closest to 24Hz, by detecting the closest possible presentation rate and then computing the needed pulldown/pullup algorithm, e.g. to align a 24Hz source video stream to the actual presentation rate.
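Today the only way to approximate such a cap is to skip rAF() callbacks, which again silently assumes a known base rate:

```js
function renderFrame() { /* hypothetical draw */ }

// Workaround for a 1/2 rate cap: render on every other rAF callback.
// Only yields 30 Hz if rAF really runs at 60 Hz, which is exactly the
// information there is no API to query.
let frame = 0;
function onFrame() {
  if (frame++ % 2 === 0) renderFrame();
  requestAnimationFrame(onFrame);
}
requestAnimationFrame(onFrame);
```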

Since there is no API to query the rAF()/vsync rate, one currently needs to benchmark it. But while benchmarking, one cannot do much rendering, because heavy rendering would cause missed vsync intervals, resulting in noisy/incorrect benchmark estimates. Because of c) above (an effect that is definitely desired), one cannot just measure the rAF() rate once at page load time, but needs to occasionally keep remeasuring in case the rate has changed.

So because the rAF() rate can change, and measuring it prevents actual rendering, this becomes a type of exploration-vs-exploitation problem: one needs to explore what the rAF() rate is at suitable times, while also maximizing the time spent actually presenting at that rate, leading to heuristic juggling of when to pause rendering and re-benchmark.
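A sketch of such a measurement pass; note that it occupies a stretch of rAF() callbacks during which rendering must stay paused:

```js
// Estimate the rAF rate by timing idle callbacks and taking the median
// delta (the median resists occasional missed frames). Must be re-run
// periodically, since the rate can change at runtime.
function measureRafRate(samples = 60) {
  return new Promise(resolve => {
    const times = [];
    function cb(t) {
      times.push(t);
      if (times.length < samples) { requestAnimationFrame(cb); return; }
      const deltas = times.slice(1)
        .map((t2, i) => t2 - times[i])
        .sort((a, b) => a - b);
      const median = deltas[deltas.length >> 1];
      resolve(Math.round(1000 / median)); // e.g. ~60, ~90, ~120, ~144
    }
    requestAnimationFrame(cb);
  });
}
```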

To get rid of all of the above, it would be great to have an explicit API to ask what the current refresh rate of the rAF()/other presentation mechanism is, and have that be an API one can keep referring to, in order to notice if/when the rate changes. Something like a canvas.verticalSyncRate property (the vsync rate of the display that the canvas is currently on), or something effectively similar that is multimonitor aware.

4. Rendering decoupled from vsync:

Sometimes, to minimize latency, one wants to disable vsync and push frames as fast as possible. Other times, one would like to utilize Adaptive-Sync, FreeSync or G-Sync, which offer more advanced vsync control. The explicit .commit() function would fit these cases well, because it naturally scales to pushing frames as fast as possible, with minimal latency.

Rendering without vsync is desirable mostly in fullscreen presentation modes. In windowed mode, the browser has to composite the canvas with the rest of the page content, and I understand it might not be possible to keep compositing the other page content with vsync while presenting just the canvas without waiting for vsync. Nevertheless, it would be good if the API for presenting without vsync were decoupled from fullscreen presentation, since some platforms might be able to do it, and having to exit and re-enter fullscreen just to toggle vsync would be poor UX. In the native world, the vsync synchronization choice can be made separately for every single present - there are no inherent "mode changes" for the display or GPU involved - so preserving something similar would be nice.

5. Rendering by setting a custom vsync rate:

Expanding on what was touched on above, applications commonly want to specify the vsync presentation interval in use. This allows the application to scale resources appropriately, avoiding rendering too often or opting in to more frequent rendering. There are two ways applications want to control vsync: I) by setting the vsync rate from the list of rates supported by the display, and/or II) by applying a decimation factor (1/2, 1/3, 1/4, ...) to a specified vsync rate.

Some example cases for these needs were referred to above: a source animation (video) authored at 24Hz might want to configure the vsync rate to 120Hz with a decimation factor of 1/5 if the display supports 120Hz, and if not, set 60Hz and do 3:2 pulldown.

Method I) can be fundamentally incompatible with compositing the rest of the web page, so it would best be restricted to the Fullscreen API and the VR display presentation API, where a given canvas is the only fullscreen element on a particular display device. Method II) is implemented in native applications by specifying a swap interval to the native swap/present calls, and a similar item is imaginable in .commit({swapInterval: 4}) or rAF({swapInterval: 4}) calls. To parallel the native world, perhaps .commit({swapInterval: 0}) might present without waiting for vsync (no sleeping), .commit({swapInterval: 1}) could present with 1:1 vsync (sleep/block until a new buffer is free), and .commit({swapInterval: 2}) could present with 1:2 vsync (rate halved).
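To make that concrete, usage could look something like the following; none of these options exist today, this is purely the hypothetical API sketched above:

```js
// Hypothetical semantics for the proposed option (not an existing API):
gl.commit({ swapInterval: 0 }); // present immediately, no vsync wait
gl.commit({ swapInterval: 1 }); // block until the next vsync (1:1)
gl.commit({ swapInterval: 2 }); // present every 2nd vsync (rate halved)

// 24 Hz content: prefer a 120 Hz mode with 1/5 decimation...
gl.commit({ swapInterval: 5 });
// ...else fall back to 60 Hz and a 3:2 pulldown cadence, alternately
// holding each source frame for 3 and 2 vsyncs (24 * (3+2)/2 = 60).
```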

My understanding is that the need to operate at a custom presentation rate is what led the WebVR API to propose its own rAF()-like machinery. It would indeed be great to have a symmetric API for all of this, e.g. by allowing requestFullscreen() to customize which display to take the target element fullscreen on (current browser display vs. VR display), while setting the vsync rate as part of the fullscreen request. The vsync rate decimation could then be paired with a .commit({swapInterval: 4}) or rAF({swapInterval: 4}) API. One API trouble there is that the requestFullscreen API is currently hardcoded to allow exactly one fullscreen activity at a time, whereas with multiple displays and VR displays one might want to go fullscreen on two displays simultaneously (a different canvas on each).

The above aspect is especially important for VR, since desktop VR applications have been moving in a direction where what the headset renders is not a mirrored copy of what the desktop display shows: one might want to render a non-ocular-warped regular 3D view of the scene for other observers to enjoy, plus some 2D control UI that is not visible in the headset display itself.

In summary, there are a few different scenarios here; some of the above don't specifically relate to .commit(), and some definitely won't get resolved in the scope of OffscreenCanvas, but I thought to do a bit more thorough writeup to illustrate the types of hurdles that people targeting the web from native codebases currently face around this topic. We can (and do) emulate and work around a lot, but that has various drawbacks in performance and corner cases. The offscreen FBO hack can be used even if OffscreenCanvas does not get a .commit(), though at the cost of a fillrate hit.

