- From: Klaus Weidner <klausw@google.com>
- Date: Wed, 12 Jul 2017 14:19:47 -0700
- To: Florian Bösch <pyalot@gmail.com>
- Cc: Brandon Jones <bajones@google.com>, John Foliot <john.foliot@deque.com>, "Bassbouss, Louay" <louay.bassbouss@fokus.fraunhofer.de>, "public-webvr@w3.org" <public-webvr@w3.org>, "Pham, Stefan" <stefan.pham@fokus.fraunhofer.de>
- Message-ID: <CAFU2V802av_st8H2JQaEq02-3qQVom8-fF9qxk9U+OeKx8r1PA@mail.gmail.com>
On Wed, Jul 12, 2017 at 1:33 PM, Florian Bösch <pyalot@gmail.com> wrote:

> On Wed, Jul 12, 2017 at 10:19 PM, Klaus Weidner <klausw@google.com> wrote:
>>
>> That would be a change to the (as yet hypothetical) protected compositing
>> context implementation in the browser.
>
> Like I said. A retardation of the programmable pipeline lacking anything
> the UA doesn't put in, which will be most things. Unless you want to
> include a byzantine fixed-function API with a plethora of options, it's in
> no way an adequate replacement for a programmable pipeline. This will show
> in applications. There's very solid reasons why we're doing programmable
> pipelines now, and don't have fixed-function pipelines anymore.

I agree with you that this does have similarities to a fixed-function pipeline, but can we back up a step and look at the current state and future options?

Current WebVR lets applications set up a video media HTML element and connect it to a texture via a canvas; see for example https://threejs.org/examples/webvr_video.html. This works and offers a fully programmable pipeline: you can use this texture however you like. In the end, the single GL context does all the content drawing and scene composition and produces the frame's output image. In typical implementations that's not the final image, though; it gets passed on to a headset-specific distortion/reprojection step that transforms it. That last step is currently an extremely inflexible fixed-function (or rather single-function) pipeline.

My understanding of the layer proposal being floated here is that we'd want to consider supporting multiple layers as input to that last distortion/reprojection step, expanding the functionality from its current single-function approach. For example, instead of rendering the video element in WebVR's WebGL context via a textured quad, we'd provide it directly to the composition step.

This would enable new possibilities. For example, WebVR drawing could happen at a different resolution or color depth from video decoding (e.g. rgb565/truecolor on mobile, or truecolor/HDR on desktop). Currently, mobile WebVR renders at roughly half of the ideal 1:1 resolution and uses multisampling (MSAA) to smooth out edges, but multisampling doesn't help at all for texture content such as video. Being able to tune the rendering mode to the content on a per-layer basis would be very helpful. Another example would be implementing a head-locked layer that doesn't get reprojected; this would reduce jitter for reticles or similar overlays. Finally, this could potentially also be used for drawing high-resolution readable text within a VR scene; see for example OpenVR's HighQualityOverlay API <https://github.com/ValveSoftware/openvr/wiki/IVROverlay::SetHighQualityOverlay>.

Note that multiple layers would be useful even without DRM, and in that case there's no need for a protected context.

It won't solve all use cases, of course. I agree that it is worth thinking about making this even more flexible. For example, in principle the WebVR application could provide a "composition shader" that gives it more control over the final composition step on a per-pixel basis. I'd be in favor of adding something like that, but I feel that the intermediate step of a slightly more flexible layer would already be a big improvement over what we have now, and that getting experience with a less-flexible layer approach would help inform how to generalize it going forward.
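To make the current video-to-texture path concrete, here's a rough sketch of the approach described above, with shader, viewport, and quad setup omitted; drawTexturedQuad is a placeholder helper, not a real API (the linked three.js example shows a complete version):

```js
// Minimal sketch of today's approach: sample the <video> element into a WebGL
// texture every frame and draw it as a quad inside the WebVR render loop.
const video = document.querySelector('video');
const canvas = document.querySelector('canvas');
const gl = canvas.getContext('webgl');

const videoTexture = gl.createTexture();
gl.bindTexture(gl.TEXTURE_2D, videoTexture);
// LINEAR + CLAMP_TO_EDGE so non-power-of-two video works without mipmaps.
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.LINEAR);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);

navigator.getVRDisplays().then(([vrDisplay]) => {
  // requestPresent typically needs to be triggered from a user gesture.
  vrDisplay.requestPresent([{ source: canvas }]).then(() => {
    const frameData = new VRFrameData();

    function onVRFrame() {
      vrDisplay.requestAnimationFrame(onVRFrame);
      vrDisplay.getFrameData(frameData);

      // Upload whatever video frame is current right now; from here on it goes
      // through the app's own programmable pipeline like any other texture.
      gl.bindTexture(gl.TEXTURE_2D, videoTexture);
      gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, video);

      // Draw both eye views, then hand the single composed image to the
      // headset's fixed-function distortion/reprojection step.
      drawTexturedQuad(gl, videoTexture,
                       frameData.leftViewMatrix, frameData.leftProjectionMatrix);
      drawTexturedQuad(gl, videoTexture,
                       frameData.rightViewMatrix, frameData.rightProjectionMatrix);
      vrDisplay.submitFrame();
    }
    vrDisplay.requestAnimationFrame(onVRFrame);
  });
});
```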
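For comparison, here's a purely hypothetical sketch of what layer-based submission might look like. VRVideoLayer, VRWebGLLayer, the headLocked option, and passing layer objects to requestPresent are made-up names to illustrate the idea, not anything in the current spec:

```js
// Hypothetical multi-layer submission: the video bypasses the app's GL context
// entirely and is handed to the compositor's distortion/reprojection step.
const videoLayer = new VRVideoLayer({        // hypothetical
  source: video,                   // decoded frames go straight to the compositor
  shape: 'quad',                   // positioned as a quad in the scene
  transform: videoPlacementMatrix, // app-supplied placement (assumed to exist)
});

const sceneLayer = new VRWebGLLayer({        // hypothetical
  source: glCanvas,                // app-rendered content, e.g. rgb565 + MSAA on mobile
});

const reticleLayer = new VRWebGLLayer({      // hypothetical
  source: reticleCanvas,
  headLocked: true,                // not reprojected, reducing jitter for reticles/HUDs
});

// The compositor would blend the layers per frame, each at its own resolution
// and color depth, before the distortion/reprojection pass.
vrDisplay.requestPresent([sceneLayer, videoLayer, reticleLayer]);
```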
For timing/synchronization, I think it would be a reasonable expectation that a media element's currentTime attribute should accurately represent the frame that'll be shown alongside the current rAF callback when it's used as a separate layer. That roughly works already with current WebVR using the media-to-canvas-to-texture approach. I'm not sure how useful frame-encoded metadata would be in practice; doing any type of current-frame pixel readback within an animation loop tends to cause pipeline stalls and poor performance, so I think it would be preferable to provide such a metadata stream separately in any case.
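As an illustration, here's a rough sketch of how I'd expect that to look inside the animation loop; metadataTrack and applySceneMetadata are hypothetical placeholders for however the separate metadata stream gets delivered and consumed:

```js
// Sketch of the timing expectation: inside the VR rAF callback, look up
// per-frame metadata by the media element's currentTime instead of reading
// pixels back from the current video frame.
function onVRFrame() {
  vrDisplay.requestAnimationFrame(onVRFrame);

  // Expectation: currentTime corresponds to the video frame the compositor
  // will actually show alongside this callback's rendering.
  const meta = metadataTrack.lookup(video.currentTime);  // hypothetical side channel
  if (meta) {
    applySceneMetadata(meta);  // hypothetical: e.g. adjust layer placement
  }

  // ...render and submit as usual...
  vrDisplay.submitFrame();
}

// What to avoid: gl.readPixels() on the just-uploaded video texture to decode
// frame-embedded metadata forces a GPU/CPU sync and tends to stall the pipeline.
```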