- From: Klaus Weidner <klausw@google.com>
- Date: Wed, 12 Jul 2017 14:19:47 -0700
- To: Florian Bösch <pyalot@gmail.com>
- Cc: Brandon Jones <bajones@google.com>, John Foliot <john.foliot@deque.com>, "Bassbouss, Louay" <louay.bassbouss@fokus.fraunhofer.de>, "public-webvr@w3.org" <public-webvr@w3.org>, "Pham, Stefan" <stefan.pham@fokus.fraunhofer.de>
- Message-ID: <CAFU2V802av_st8H2JQaEq02-3qQVom8-fF9qxk9U+OeKx8r1PA@mail.gmail.com>
On Wed, Jul 12, 2017 at 1:33 PM, Florian Bösch <pyalot@gmail.com> wrote:

> On Wed, Jul 12, 2017 at 10:19 PM, Klaus Weidner <klausw@google.com> wrote:
>>
>> That would be a change to the (as yet hypothetical) protected compositing
>> context implementation in the browser.
>
> Like I said. A retardation of the programmable pipeline lacking anything
> the UA doesn't put in, which will be most things. Unless you want to
> include a byzantine fixed-function API with a plethora of options, it's in
> no way an adequate replacement for a programmable pipeline. This will show
> in applications. There's very solid reasons why we're doing programmable
> pipelines now, and don't have fixed-function pipelines anymore.

I agree with you that this does have similarities to a fixed-function pipeline, but can we back up a step and look at the current state and future options?

Current WebVR lets applications set up a video media HTML element and connect it to a texture via a canvas; see for example https://threejs.org/examples/webvr_video.html. This works and offers a fully programmable pipeline: you can use this texture however you like. In the end, the single GL context does all the content drawing and scene composition and produces the frame's output image. In typical implementations that's not the final image, though; it gets passed on to a headset-specific distortion/reprojection step that transforms it. That last step is currently an extremely inflexible fixed-function (or rather single-function) pipeline.

My understanding of the layer proposal being floated here is that we'd want to consider supporting multiple layers as input to that last distortion/reprojection step, expanding the functionality from its current single-function approach. For example, instead of rendering the video element in WebVR's WebGL context via a textured quad, we'd provide it directly to the composition step.

This would enable new possibilities. For example, WebVR drawing could happen at a different resolution or color depth from video decoding (e.g. rgb565/truecolor on mobile, or truecolor/HDR on desktop). Currently, mobile WebVR renders at roughly half of the ideal 1:1 resolution and uses multisampling (MSAA) to smooth out edges, but multisampling doesn't help at all for texture content such as video. Being able to tune the rendering mode to the content on a per-layer basis would be very helpful. Another example would be implementing a head-locked layer that doesn't get reprojected; this would reduce jitter for reticles or similar overlays. Finally, this could potentially also be used for drawing high-resolution readable text within a VR scene; see for example OpenVR's HighQualityOverlay API <https://github.com/ValveSoftware/openvr/wiki/IVROverlay::SetHighQualityOverlay>.

Note that multiple layers would be useful even without DRM, and in that case there's no need for a protected context.

It won't solve all use cases, of course. I agree that it is worth thinking about making this even more flexible. For example, in principle the WebVR application could provide a "composition shader" that gives it more control over the final composition step on a per-pixel basis. I'd be in favor of adding something like that, but I feel that the intermediate step of a slightly more flexible layer would already be a big improvement over what we have now, and that getting experience with a less-flexible layer approach would help inform how to generalize it going forward.
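To make the current video-to-texture path concrete, here's a rough sketch of the approach described above, with shader, viewport, and quad setup omitted; drawTexturedQuad is a placeholder helper, not a real API (the linked three.js example shows a complete version):

```js
// Minimal sketch of today's approach: sample the <video> element into a WebGL
// texture every frame and draw it as a quad inside the WebVR render loop.
const video = document.querySelector('video');
const canvas = document.querySelector('canvas');
const gl = canvas.getContext('webgl');

const videoTexture = gl.createTexture();
gl.bindTexture(gl.TEXTURE_2D, videoTexture);
// LINEAR + CLAMP_TO_EDGE so non-power-of-two video works without mipmaps.
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.LINEAR);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);

navigator.getVRDisplays().then(([vrDisplay]) => {
  // requestPresent typically needs to be triggered from a user gesture.
  vrDisplay.requestPresent([{ source: canvas }]).then(() => {
    const frameData = new VRFrameData();

    function onVRFrame() {
      vrDisplay.requestAnimationFrame(onVRFrame);
      vrDisplay.getFrameData(frameData);

      // Upload whatever video frame is current right now; from here on it goes
      // through the app's own programmable pipeline like any other texture.
      gl.bindTexture(gl.TEXTURE_2D, videoTexture);
      gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, video);

      // Draw both eye views, then hand the single composed image to the
      // headset's fixed-function distortion/reprojection step.
      drawTexturedQuad(gl, videoTexture,
                       frameData.leftViewMatrix, frameData.leftProjectionMatrix);
      drawTexturedQuad(gl, videoTexture,
                       frameData.rightViewMatrix, frameData.rightProjectionMatrix);
      vrDisplay.submitFrame();
    }
    vrDisplay.requestAnimationFrame(onVRFrame);
  });
});
```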
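For comparison, here's a purely hypothetical sketch of what layer-based submission might look like. VRVideoLayer, VRWebGLLayer, the headLocked option, and passing layer objects to requestPresent are made-up names to illustrate the idea, not anything in the current spec:

```js
// Hypothetical multi-layer submission: the video bypasses the app's GL context
// entirely and is handed to the compositor's distortion/reprojection step.
const videoLayer = new VRVideoLayer({        // hypothetical
  source: video,                   // decoded frames go straight to the compositor
  shape: 'quad',                   // positioned as a quad in the scene
  transform: videoPlacementMatrix, // app-supplied placement (assumed to exist)
});

const sceneLayer = new VRWebGLLayer({        // hypothetical
  source: glCanvas,                // app-rendered content, e.g. rgb565 + MSAA on mobile
});

const reticleLayer = new VRWebGLLayer({      // hypothetical
  source: reticleCanvas,
  headLocked: true,                // not reprojected, reducing jitter for reticles/HUDs
});

// The compositor would blend the layers per frame, each at its own resolution
// and color depth, before the distortion/reprojection pass.
vrDisplay.requestPresent([sceneLayer, videoLayer, reticleLayer]);
```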
For timing/synchronization, I think it would be a reasonable expectation that a media element's currentTime attribute should accurately represent the frame that'll be shown alongside the current rAF callback when it's used as a separate layer. That roughly works already with current WebVR using the media-to-canvas-to-texture approach. I'm not sure how useful frame-encoded metadata would be in practice; doing any type of current-frame pixel readback within an animation loop tends to cause pipeline stalls and poor performance, so I think it would be preferable to provide such a metadata stream separately in any case.
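As an illustration, here's a rough sketch of how I'd expect that to look inside the animation loop; metadataTrack and applySceneMetadata are hypothetical placeholders for however the separate metadata stream gets delivered and consumed:

```js
// Sketch of the timing expectation: inside the VR rAF callback, look up
// per-frame metadata by the media element's currentTime instead of reading
// pixels back from the current video frame.
function onVRFrame() {
  vrDisplay.requestAnimationFrame(onVRFrame);

  // Expectation: currentTime corresponds to the video frame the compositor
  // will actually show alongside this callback's rendering.
  const meta = metadataTrack.lookup(video.currentTime);  // hypothetical side channel
  if (meta) {
    applySceneMetadata(meta);  // hypothetical: e.g. adjust layer placement
  }

  // ...render and submit as usual...
  vrDisplay.submitFrame();
}

// What to avoid: gl.readPixels() on the just-uploaded video texture to decode
// frame-embedded metadata forces a GPU/CPU sync and tends to stall the pipeline.
```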