Re: Proposal Virtual Reality "View Lock" Spec

> Brandon Jones:

> So there's a few things to consider regarding this. For one, I think your ViewEvent
> structure would need to look more like this:

>

>interface ViewEvent : UIEvent {
>    readonly attribute Quaternion orientation; // Where Quaternion is 4 floats. Prevents gimbal lock.
>    readonly attribute float offsetX; // offset X from the calibrated center 0 in millimeters
>    readonly attribute float offsetY; // offset Y from the calibrated center 0 in millimeters
>    readonly attribute float offsetZ; // offset Z from the calibrated center 0 in millimeters
>    readonly attribute float accelerationX; // Acceleration along X axis in m/s^2
>    readonly attribute float accelerationY; // Acceleration along Y axis in m/s^2
>    readonly attribute float accelerationZ; // Acceleration along Z axis in m/s^2
>}
>
>
> You have to deal with explicit units for a case like this and not
> clamped/normalized values. What would a normalized offset of 1.0 mean?
> Am I slightly off center? At the other end of the room? It's meaningless
> without a frame of reference. Same goes for acceleration. You can argue
> that you can normalize to 1.0 == 9.8 m/s^2 but the accelerometers will
> happily report values outside that range, and at that point you might as
> well just report in a standard unit.


I could see having explicit units for translation if the device could
output them. The idea of normalized values (not talking about clamping)
is to let the user set what they feel is the maximum movement they want
the device to detect, separate from the specific application. So for
moving left and right you might calibrate the device such that -1 and 1
correspond to leaning 15 cm either way from the center. Any program you
then load from the web would interpret those ranges the same way.
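
A minimal sketch of that calibration idea; the raw millimeter offset and the user-set maximum are hypothetical names, just for illustration:

// Hypothetical calibration: the user decides what physical range maps to [-1, 1].
var calibration = { maxOffsetMM: 150 }; // user says a 15 cm lean equals full deflection

function normalizeOffset(rawOffsetMM) {
    // Not clamped: leaning past the calibrated range yields values outside [-1, 1].
    return rawOffsetMM / calibration.maxOffsetMM;
}

normalizeOffset(75);   //  0.5   halfway to the calibrated maximum
normalizeOffset(-150); // -1.0   fully left
normalizeOffset(200);  //  1.33  past the calibrated range, still reported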

As for quaternions for orientation, I don't see an advantage over Euler angles there; you can build a quaternion from the Euler angles whenever you need one. I do have one question though. Would the ViewEvent need orientation, angular velocity, and angular acceleration? What about a translation velocity?
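
For reference, the conversion is cheap. Here's a sketch of building a quaternion from Euler angles, assuming radians and a Z-Y-X yaw/pitch/roll order (the spec would have to pin the order down):

// Sketch: Euler angles (radians, Z-Y-X yaw/pitch/roll order assumed) to a unit quaternion.
function eulerToQuaternion(yaw, pitch, roll) {
    var cy = Math.cos(yaw / 2),   sy = Math.sin(yaw / 2);
    var cp = Math.cos(pitch / 2), sp = Math.sin(pitch / 2);
    var cr = Math.cos(roll / 2),  sr = Math.sin(roll / 2);
    return {
        w: cr * cp * cy + sr * sp * sy,
        x: sr * cp * cy - cr * sp * sy,
        y: cr * sp * cy + sr * cp * sy,
        z: cr * cp * sy - sr * sp * cy
    };
}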


> As for things like eye position and such, you'd want to query that
> separately (no sense in sending it with every event), along with other
> information about the device capabilities (screen resolution, FOV, lens
> distortion factors, etc.). And you'll want to account for the scenario
> where there is more than one device connected to the browser.


Seems sensible if there's a lot of data or different update frequencies. So something like an eye event. That could use pixels rather than a normalization, and let the pixel value go outside of the screen; I don't know why I used normalized coordinates there, since the screen resolution is known. You mention lens distortion factors. Are there well-known variables for describing the lenses that an HMD could provide to the user or to the browser, to automate the distortions without custom shader logic provided by a driver?


interface EyeEvent : UIEvent {
    long leftX;  // pixels, but not clamped to the screen resolution
    long leftY;
    long rightX;
    long rightY;
}
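
A sketch of how a page might consume such an event; the 'eyemove' event name is made up here, and nothing like this exists in any browser today:

// Hypothetical: neither the event name nor the interface exists yet.
window.addEventListener('eyemove', function (e) {
    // Values are in pixels and may fall outside the screen if the gaze leaves it.
    var onScreen = e.leftX >= 0 && e.leftX < screen.width &&
                   e.leftY >= 0 && e.leftY < screen.height;
    console.log('left eye at', e.leftX, e.leftY, onScreen ? '(on screen)' : '(off screen)');
});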


> Also, if this is going to be a high quality experience you'll want to be
> able to target rendering to the HMD directly and not rely on OS mirroring
> to render the image. This is a can of worms in and of itself: How do you
> reference the display? Can you manipulate a DOM tree on it, or is it
> limited to WebGL/Canvas2D? If you can render HTML there how do the
> appropriate distortions get applied, and how do things like depth get
> communicated? Does this new rendering surface share the same Javascript
> scope as the page that launched it? If the HMD refreshes at 90hz and
> your monitor refreshes at 60hz, when does requestAnimationFrame fire?
> These are not simple questions, and need to be considered carefully to
> make sure that any resulting API is useful.


You hit on why this is a view lock: requestAnimationFrame would fire at the rate of the device. The distortion for an HTML page would require some special consideration. If you allow locking onto any DOM element (like the Fullscreen spec does) to send it to the HMD, you then have to define the distortion and transformation that fit it to the device.
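
To make that concrete, here's a rough sketch of what requesting such a lock might look like, modeled on the Fullscreen API; requestViewLock, its parameters, and the viewlockchange event are all made up for illustration:

// Entirely hypothetical API, sketched by analogy with element.requestFullscreen().
var scene = document.getElementById('scene');

// The page supplies the projection parameters the HMD compositor would need.
scene.requestViewLock({
    fieldOfView: 1.92,        // vertical FOV in radians (assumed input)
    distance: 1000,           // distance from the viewer to the element's plane, in pixels
    applyLensDistortion: true // let the browser/driver apply the device's distortion
});

document.addEventListener('viewlockchange', function () {
    // While locked, requestAnimationFrame would fire at the HMD's refresh rate.
});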


I think a good DOM element case to focus on would be someone making a rotating CSS3 cube using the 3D transforms. Say the user has an event (or request) for device info that returns the screen resolutions and offset information. They could use that to make the DOM element the size of the view plane if they know the FoV; they'd need to feed in that FoV and the distance to the DOM element (in pixels?). These variables could be passed in when a lock is requested. This kind of overlaps with what Lars was talking about with making this an extension of another API, and I think that API might be the Fullscreen API. The behavior might need to differ for canvas elements, though, which would handle their own distortions?
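
For the "size of the view plane" part, this is the relationship I have in mind; it's just standard perspective geometry, nothing device-specific:

// If an element of height heightPx should exactly fill a vertical field of view
// fovY (radians), the viewer distance in the same pixel units is:
function viewDistanceForFOV(heightPx, fovY) {
    return (heightPx / 2) / Math.tan(fovY / 2);
}

// Conversely, the element height that fills the view plane at a given distance:
function viewPlaneHeight(distancePx, fovY) {
    return 2 * distancePx * Math.tan(fovY / 2);
}

viewDistanceForFOV(1080, Math.PI / 2); // 540 px away for a 90-degree vertical FOV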


> Finally, it's worth considering that for a VR experience to be effective
> it needs to be pretty low latency. Put bluntly: Browsers suck at this.
> Optimizing for scrolling large pages of flat content, text, and images
> is very different from optimizing for realtime, super low latency I/O.
> If you were to take an Oculus Rift and plug it into one of the existing
> browser/Rift demos with Chrome, you'll probably find that in the best
> case the rendering lags behind your head movement by about 4 frames.
> Even if your code is rendering at a consistent 60hz that means you're
> seeing ~67ms of lag, which will result in a motion-sickness-inducing
> "swimming" effect where the world is constantly catching up to your head
> position. And that's not even taking into account the question of how
> well Javascript/WebGL can keep up with rendering two high resolution
> views of a moderately complex scene, something that even modern gaming
> PCs can struggle with.

Basically, the idea is to set the groundwork, start prototyping, and find these issues before an implementation is created. Also, I think it's safe to assume that browsers are becoming more and more GPU accelerated. VR is for the future, so keeping future hardware in mind seems sensible. (Remember, sites of the future will look very similar to https://www.youtube.com/watch?v=8wXBe2jTdx4 ).

> That's an awful lot of work for technology that, right now, does not
> have a large user base and for which the standards and conventions are
> still being defined. I think that you'll have a hard time drumming up
> support for such an API until the technology becomes a little more
> widespread.

Yeah, I assume it'll take around a year of discussion to get things moving, or to find implementers that aren't busy. The idea, though, is that when support is here the discussions will have mostly been done, and a rough draft spec will be waiting for implementers to move to an experimental stage.



> Lars:
> I think it could make sense to put stuff like this as an extension
> on top of WebGL and WebAudio as they are the only two current APIs close
> enough to the bare metal/low latency/high performance to get a decent
> experience. Also - I seem to remember that some earlier generation VR
> glasses solved the game support problem by providing their own GL and
> Joystick drivers (today - probably device orientation events) so many
> games didn't have to bother (too much) with the integration.


Associating it with WebAudio doesn't make sense; you'd just be using the
orientation information to change a few variables to make the positional
audio examples work. For WebGL it's just a distortion shader, give or take,
that uses the orientation as input to a view matrix uniform. I think the
head tracking is generic enough not to be part of either spec. And since
browsers could be fully GPU accelerated in the future, making a separate
spec seems ideal.
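
A sketch of what I mean on the WebGL side, assuming an existing WebGLRenderingContext gl, a linked program, and that the Quaternion above exposes w/x/y/z fields (the uniform name is just illustrative):

// Turn the head-orientation quaternion into a view rotation and upload it.
function quatToMat4(q) {
    var w = q.w, x = q.x, y = q.y, z = q.z;
    // Column-major 4x4 rotation matrix, as WebGL expects.
    return new Float32Array([
        1 - 2 * (y * y + z * z), 2 * (x * y + w * z),     2 * (x * z - w * y),     0,
        2 * (x * y - w * z),     1 - 2 * (x * x + z * z), 2 * (y * z + w * x),     0,
        2 * (x * z + w * y),     2 * (y * z - w * x),     1 - 2 * (x * x + y * y), 0,
        0,                       0,                       0,                       1
    ]);
}

function onViewEvent(e) {
    // For a view matrix, use the conjugate (inverse) of the head orientation.
    var inv = { w: e.orientation.w, x: -e.orientation.x, y: -e.orientation.y, z: -e.orientation.z };
    var loc = gl.getUniformLocation(program, 'uViewRotation');
    gl.uniformMatrix4fv(loc, false, quatToMat4(inv));
}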

The closest spec that I think this could be an extension of is the Fullscreen spec: rendering a DOM element to an HMD at a separate rate from the normal page, with a FOV, a distance from the page, eye offsets, and a distortion. If anyone knows all the variables required, or the ideal method, that would be useful. I think all head-mounted displays use a parallel-frustum method with perspective matrices, which might simplify the inputs.
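
By "parallel frustum" I mean something like the following sketch: two perspective frusta with parallel view directions, each shifted sideways rather than toed in. All the numbers are illustrative, and the real shift would come from the device's lens/eye geometry:

// Standard OpenGL-style frustum matrix, column-major.
function frustum(left, right, bottom, top, near, far) {
    return new Float32Array([
        2 * near / (right - left), 0, 0, 0,
        0, 2 * near / (top - bottom), 0, 0,
        (right + left) / (right - left), (top + bottom) / (top - bottom), -(far + near) / (far - near), -1,
        0, 0, -2 * far * near / (far - near), 0
    ]);
}

// Per-eye projection: skew the frustum sideways on the near plane instead of
// rotating the cameras (left eye negative shift, right eye positive).
function eyeProjection(fovY, aspect, eyeShift, near, far) {
    var top = near * Math.tan(fovY / 2);
    var right = top * aspect;
    return frustum(-right + eyeShift, right + eyeShift, -top, top, near, far);
}

var leftEye  = eyeProjection(1.92, 640 / 800, -0.02, 0.1, 1000.0);
var rightEye = eyeProjection(1.92, 640 / 800, +0.02, 0.1, 1000.0);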


So what information does the user need to be able to request from any HMD? The size of each screen in pixels (where 0x0 would mean no screen) for the left and right eye, the offset from the center for each eye, and then maybe a preferred field of view?

{
    leftWidth;   // pixels
    leftHeight;  // pixels
    rightWidth;  // pixels
    rightHeight; // pixels
    leftOffset;  // mm
    rightOffset; // mm
    preferredFieldOfView; // vertical FOV in radians?
}
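
For example, treating that structure as a plain object returned by some hypothetical getHMDInfo() call (the function name is made up; it just stands in for however the browser would expose it):

// Hypothetical accessor for the structure sketched above.
var hmd = getHMDInfo();

if (hmd.leftWidth === 0 && hmd.leftHeight === 0) {
    // 0x0 means this device has no left screen.
}

// Size a canvas to render both eyes side by side.
var canvas = document.createElement('canvas');
canvas.width  = hmd.leftWidth + hmd.rightWidth;
canvas.height = Math.max(hmd.leftHeight, hmd.rightHeight);
var aspectPerEye = hmd.leftWidth / hmd.leftHeight;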

Sorry if this seems like a lot of questions. I promise to go through and collect all the useful pieces into a summary post once they're answered.

Received on Thursday, 27 March 2014 06:51:08 UTC