Re: [w3ctag/design-reviews] Review OffscreenCanvas, including ImageBitmapRenderingContext (#141)

Apologies for the uber-post I'm dropping in here!

> **TL;DR:** I'm proposing that `requestAnimationFrame` and `cancelAnimationFrame` be abstracted into an interface which `window` implements, and which can subsequently be implemented by any other object that needs to surface a display cadence. This is to formalize how rAF-like functionality is exposed to the web and prevent multiple similar but incompatible interfaces from emerging.

After joining the TAG call on Tuesday and talking with Alex Russell separately later that day, I think that at the very least he's got a better understanding of how the WebVR community group arrived at the interface it did in [our explainer](https://github.com/w3c/webvr/blob/master/explainer.md). Key to that clarification was highlighting that we use our rAF variant not only to control timing but also to deliver pose data in sync with those animation frames. In WebVR's case we additionally intend to deliver VR controller updates in sync with those frames to enable smooth tracking.

Given that understanding, it seems the primary concern on Alex's part became preventing the web from growing multiple similar but incompatible rAF-like interfaces. There are still concerns around having multiple loops running at different speeds, but that seems semi-unavoidable and not as big a concern in the long run?

So with that in mind I talked through the issue with some other colleagues, and we came up with an approach that could potentially pave the way for new rAF-style interfaces. I'll sketch out some rough IDL first and then go into more detail:

```webidl
// Standard Window rAF

callback FrameRequestCallback = void (DOMHighResTimeStamp time, FrameRequestData frameData);

interface FrameRequestData {
  // Not clear what would be useful here.
};

interface AnimationFrameProvider {
  unsigned long requestAnimationFrame(FrameRequestCallback callback);
  void cancelAnimationFrame(unsigned long handle);
};

Window implements AnimationFrameProvider;

// WebVR rAF variant

VRSession implements AnimationFrameProvider;

// This would replace the current VRPresentationFrame in the WebVR Explainer
interface VRFrameRequestData : FrameRequestData {
  readonly attribute VRSession session;
  readonly attribute FrozenArray<VRView> views;

  VRDevicePose? getDevicePose(VRCoordinateSystem coordinateSystem);
};

// For Video

HTMLVideoElement implements AnimationFrameProvider;

interface HTMLVideoElementFrameRequestData : FrameRequestData {
  // Useful to report some playback state here? (Already on element)
  readonly attribute double currentTime;
  readonly attribute unsigned long videoWidth;
  readonly attribute unsigned long videoHeight;
};

// For rAF in Workers

partial interface Window {
  // Terrible name alert! Ideally something more palatable.
  TransferrableAnimationFrameProvider getTransferrableAnimationFrameProvider();
};

[Transferrable]
interface TransferrableAnimationFrameProvider : AnimationFrameProvider {
  // Anything else useful/necessary here?
};
```

The first thing to note is that this approach maintains compatibility with existing `window.requestAnimationFrame` semantics, extending it in a way that should be invisible to existing pages.
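For example, a page written against today's API keeps working untouched; a minimal sketch, assuming the extended callback just passes an extra argument that existing code never looks at:

```js
// Existing usage: the callback still receives a timestamp first, so code
// that ignores the new frameData argument behaves exactly as before.
window.requestAnimationFrame((time) => {
  console.log(`Frame at ${time}ms`);
  // ... render as usual ...
});
```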

For an interface that wants to expose a rAF loop running at a different rate than the document's rAF, such as a WebVR session, it could implement the same interface but provide a custom data structure to the callback. This would enable things like WebVR's desire to expose device pose data in sync with the frame loop. Usage in WebVR would look like so:

```js
function onDrawFrame(time, vrFrameData) {
  let pose = vrFrameData.getDevicePose(vrFrameOfRef);
  // Render into the VR layer's framebuffer rather than the default one.
  gl.bindFramebuffer(gl.FRAMEBUFFER, vrSession.baseLayer.framebuffer);

  // Draw once per view (e.g. once per eye).
  for (let view of vrFrameData.views) {
    let viewport = view.getViewport(vrSession.baseLayer);
    gl.viewport(viewport.x, viewport.y, viewport.width, viewport.height);
    drawScene(view, pose);
  }

  // Request the next VR callback
  vrSession.requestAnimationFrame(onDrawFrame);
}

vrSession.requestAnimationFrame(onDrawFrame);
```

This is almost exactly what the explainer already shows, with a couple of tweaks: the rAF function is now named `requestAnimationFrame` instead of the `requestFrame` the explainer proposes, and the callback now provides a timestamp along with the VR frame data.

Another potential use for this pattern that's not well served today, and that other teams are trying to reason about: enabling videos to do processing as new frames are decoded, rather than simply re-uploading the frame on every rAF as, for example, most WebGL video apps do now. Pretty much any WebGL-based video playback today does something like this:

```js
function drawFrame(time) {
  window.requestAnimationFrame(drawFrame);

  // Update video texture
  gl.bindTexture(gl.TEXTURE_2D, videoTex);
  gl.texImage2D(gl.TEXTURE_2D, ..., videoElement);

  // Other GL setup ...

  // Draw video mesh
  gl.drawArrays(gl.TRIANGLES, 0, 6);
}

window.requestAnimationFrame(drawFrame);
```

This is problematic because the video may only update at 24-30Hz, which means we're wasting work on the texture copy each display frame. But under the above rAF proposal it could become:

```js
function drawFrame(time) {
  window.requestAnimationFrame(drawFrame);

  // Other GL setup...

  // Draw video mesh
  gl.drawArrays(gl.TRIANGLES, 0, 6);
}

window.requestAnimationFrame(drawFrame);

function copyVideoFrame(time) {
  videoElement.requestAnimationFrame(copyVideoFrame);

  // Update video texture
  gl.bindTexture(gl.TEXTURE_2D, videoTex);
  gl.texImage2D(gl.TEXTURE_2D, ..., videoElement);
}

videoElement.requestAnimationFrame(copyVideoFrame);
```

This reduces the texture copies to the actual video framerate and creates (in my opinion) a cleaner separation of concerns. Of course, video is *complicated*, so it's not 100% clear to me that we could get the latching behavior we want out of this, but quick polls of coworkers make it sound feasible.

It's worth noting that there's a WebGL extension, [WEBGL_video_texture](https://www.khronos.org/registry/webgl/extensions/proposals/WEBGL_video_texture/), that's also attempting to tackle this (as well as lowering the total texture copy cost). But talking with one of our WebGL devs it sounds like this rAF proposal might actually serve that need better?

Finally, for the case of OffscreenCanvas in a worker, we could create a transferrable implementation of the interface that would likely be produced by the window. This creates a clear connection between the two and communicates exactly what the worker's rAF will be aligned to. This does NOT provide the nice `while(await)` pattern that has been discussed by the OffscreenCanvas team, but it seems like it would be trivial to write a promise-emitting wrapper around the rAF callbacks if that's needed (rough sketch below).
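To illustrate, here's a minimal sketch of such a wrapper, assuming the hypothetical `getTransferrableAnimationFrameProvider()` from the IDL above and a stand-in `drawScene()` that renders to the OffscreenCanvas:

```js
// main.js: hand the worker a provider tied to the window's frame cadence.
const worker = new Worker('render-worker.js');
const provider = window.getTransferrableAnimationFrameProvider();
worker.postMessage({ provider }, [provider]);

// render-worker.js: wrap the callback-based rAF in a promise so a
// while(await) loop works on top of it.
function nextFrame(provider) {
  return new Promise((resolve) => provider.requestAnimationFrame(resolve));
}

self.onmessage = async (event) => {
  const provider = event.data.provider;
  while (true) {
    const time = await nextFrame(provider);
    drawScene(time); // Render to the OffscreenCanvas here.
  }
};
```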

Anyway, I'm sure there are quirks to work out here, but I wanted to get this up to start a conversation about whether this moves us in a positive direction. I'll say that from the WebVR perspective I think we could easily accommodate this type of model, with the primary concern being that we don't want to get stuck in spec limbo if coming to an agreement on a change like this is going to take another 6+ months.

Thoughts?
