- From: Brandon Andrews <warcraftthreeft@sbcglobal.net>
- Date: Wed, 26 Mar 2014 23:50:39 -0700 (PDT)
- To: Lars Knudsen <larsgk@gmail.com>, Brandon Jones <bajones@google.com>
- Cc: "public-webapps@w3.org" <public-webapps@w3.org>
> Brandon Jones:
> So there's a few things to consider regarding this. For one, I think your ViewEvent structure would need to look more like this:
>
> interface ViewEvent : UIEvent {
>     readonly attribute Quaternion orientation; // Where Quaternion is 4 floats. Prevents gimbal lock.
>     readonly attribute float offsetX; // offset X from the calibrated center 0 in millimeters
>     readonly attribute float offsetY; // offset Y from the calibrated center 0 in millimeters
>     readonly attribute float offsetZ; // offset Z from the calibrated center 0 in millimeters
>     readonly attribute float accelerationX; // Acceleration along X axis in m/s^2
>     readonly attribute float accelerationY; // Acceleration along Y axis in m/s^2
>     readonly attribute float accelerationZ; // Acceleration along Z axis in m/s^2
> }
>
> You have to deal with explicit units for a case like this and not clamped/normalized values. What would a normalized offset of 1.0 mean? Am I slightly off center? At the other end of the room? It's meaningless without a frame of reference. Same goes for acceleration. You can argue that you can normalize to 1.0 == 9.8 m/s^2 but the accelerometers will happily report values outside that range, and at that point you might as well just report in a standard unit.

I could see having explicit units for translation if the device could output them. The idea of normalized values (not talking about clamping) is to let the user set what they feel is the maximum movement they want the device to detect, separate from the specific application. So for moving left and right you might calibrate the device such that -1 and 1 mean leaning 15 cm either way from the center. Any program you then load from the web would interpret those ranges the same way.

As for quaternions for orientation, I don't see an advantage over Euler angles there; you can always build a quaternion from the Euler angles when you need one. I do have one question though. Would the ViewEvent need orientation, angular velocity, and angular acceleration? What about a translation velocity?
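To illustrate the Euler-to-quaternion point, here is a minimal sketch of the standard conversion (the function name and the yaw/pitch/roll argument order are just mine for illustration, not anything specced); it only shows that reporting Euler angles and exposing a Quaternion attribute aren't mutually exclusive:

// Build a unit quaternion from yaw (about Z), pitch (about Y), roll (about X), in radians.
function eulerToQuaternion(yaw, pitch, roll) {
    var cy = Math.cos(yaw / 2),   sy = Math.sin(yaw / 2);
    var cp = Math.cos(pitch / 2), sp = Math.sin(pitch / 2);
    var cr = Math.cos(roll / 2),  sr = Math.sin(roll / 2);
    return {
        w: cr * cp * cy + sr * sp * sy,
        x: sr * cp * cy - cr * sp * sy,
        y: cr * sp * cy + sr * cp * sy,
        z: cr * cp * sy - sr * sp * cy
    };
}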
> As for things like eye position and such, you'd want to query that separately (no sense in sending it with every device), along with other information about the device capabilities (Screen resolution, FOV, Lens distortion factors, etc, etc.) And you'll want to account for the scenario where there is more than one device connected to the browser.

That seems sensible if there's a lot of data or different update frequencies, so something like an eye event. It could use pixels there rather than a normalization and let the pixel value go outside of the screen; I don't know why I used normalized coordinates since the screen resolution is known. You mention lens distortion factors. Are there well-known variables for defining the lenses that a VR device could provide to the user or to the browser, to automate the distortions without custom shader logic provided by a driver?

interface EyeEvent : UIEvent {
    long leftX; // pixels, but not clamped to the screen resolution
    long leftY;
    long rightX;
    long rightY;
}

> Also, if this is going to be a high quality experience you'll want to be able to target rendering to the HMD directly and not rely on OS mirroring to render the image. This is a can of worms in and of itself: How do you reference the display? Can you manipulate a DOM tree on it, or is it limited to WebGL/Canvas2D? If you can render HTML there how do the appropriate distortions get applied, and how do things like depth get communicated? Does this new rendering surface share the same Javascript scope as the page that launched it? If the HMD refreshes at 90hz and your monitor refreshes at 60hz, when does requestAnimationFrame fire? These are not simple questions, and need to be considered carefully to make sure that any resulting API is useful.

You hit on why this is a view lock: requestAnimationFrame would fire at the rate of the device. Regarding the distortion for an HTML page, that would require some special consideration. If you allow locking onto any DOM element (like the fullscreen spec does) to send it to the HMD, you then have to define the distortion and transformation to fit it to the device. I think a good DOM element case to focus on would be someone making a rotating CSS3 cube using 3D transforms. Say the user has an event to request device info that returns the screen resolutions and offset information. They could use this to make the DOM element the size of the view plane if they know the FoV. They'd need to feed in that FoV and a distance to the DOM element (in pixels?); these variables could be passed in when a lock is requested. This overlaps with what Lars was talking about, making this an extension of another API, and I think that API might be the fullscreen API. The behavior might need to differ for canvas elements, though, which would handle their own distortion?
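To make that concrete, something like the following, modeled on the fullscreen API; to be clear, every name here (requestViewLock, viewlockchange, the option fields) is invented for illustration and isn't part of any existing spec:

// Purely hypothetical: a fullscreen-style lock that sends one DOM element to the HMD.
var scene = document.getElementById("scene"); // e.g. a CSS3-transformed cube or a canvas

if (scene.requestViewLock) {
    scene.requestViewLock({
        verticalFieldOfView: Math.PI / 2, // radians the element should span vertically
        distance: 1000,                   // assumed distance from the eye to the element's plane, in pixels
        applyDistortion: true             // let the browser apply the HMD's lens distortion to HTML content
    });
}

document.addEventListener("viewlockchange", function () {
    // Like fullscreenchange: while locked, requestAnimationFrame follows the HMD's refresh rate.
});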
> Finally, it's worth considering that for a VR experience to be effective it needs to be pretty low latency. Put bluntly: Browsers suck at this. Optimizing for scrolling large pages of flat content, text, and images is very different from optimizing for realtime, super low latency I/O. If you were to take an Oculus Rift and plug it into one of the existing browser/Rift demos with Chrome, you'll probably find that in the best case the rendering lags behind your head movement by about 4 frames. Even if your code is rendering at a consistent 60hz that means you're seeing ~67ms of lag, which will result in a motion-sickness-inducing "swimming" effect where the world is constantly catching up to your head position. And that's not even taking into account the question of how well Javascript/WebGL can keep up with rendering two high resolution views of a moderately complex scene, something that even modern gaming PCs can struggle with.

The goal here is basically to set the groundwork to start prototyping and to find these issues before an implementation is created. Also, I think it's safe to assume that browsers are becoming more and more GPU accelerated. VR is for the future, so keeping future hardware in mind seems sensible. (Remember, sites of the future will look very similar to https://www.youtube.com/watch?v=8wXBe2jTdx4 ).

> That's an awful lot of work for technology that, right now, does not have a large user base and for which the standards and conventions are still being defined. I think that you'll have a hard time drumming up support for such an API until the technology becomes a little more widespread.

Yeah, I assume it'll take around a year of discussion to get things moving, or to find implementers that aren't busy. The idea, though, is that by the time support is here the discussion will mostly be done and a rough draft spec will be waiting for implementers to take to an experimental stage.

> Lars:
> I think it could make sense to put stuff like this as an extension on top of WebGL and WebAudio as they are the only two current APIs close enough to the bare metal/low latency/high performance to get a decent experience. Also - I seem to remember that some earlier generation VR glasses solved the game support problem by providing their own GL and Joystick drivers (today - probably device orientation events) so many games didn't have to bother (too much) with the integration.

Associating it with WebAudio doesn't make sense to me. You'd just be using the orientation information to change a few variables to make the positional audio examples work. For WebGL it's just a distortion shader, give or take, plus using the orientation as input to a view matrix uniform. I think the head tracking is generic enough not to be part of either spec; in the future browsers could be fully GPU accelerated, so a separate spec seems ideal. The closest spec this could be an extension of is, I think, the fullscreen spec: rendering a DOM element to an HMD at a different rate than the normal page, with a FOV, distance from the page, eye offsets, and distortion. If anyone knows all the variables required, or the ideal method, that would be useful. I think all head-mounted displays use parallel frusta with perspective matrices, which might simplify the inputs.

So what information does the user need to be able to request from any HMD? The size of each screen in pixels for the left and right eye (where 0x0 would mean no screen), the offset from the center for each eye, and maybe a preferred field of view?

{
    leftWidth;            // pixels
    leftHeight;           // pixels
    rightWidth;           // pixels
    rightHeight;          // pixels
    leftOffset;           // mm
    rightOffset;          // mm
    preferredFieldOfView; // vertical FOV in radians?
}
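To tie the WebAudio and view-matrix points above together, here is a rough sketch of how a page might consume the orientation quaternion; the event shape ({x, y, z, w}) and the function names are my assumptions for illustration, only uniformMatrix4fv and AudioListener.setOrientation are existing APIs:

// Sketch only: turn a head-orientation quaternion into a column-major mat4 for a
// WebGL view uniform, and into forward/up vectors for the WebAudio listener.
function quatToMat4(q) {
    var x = q.x, y = q.y, z = q.z, w = q.w;
    return new Float32Array([
        1 - 2 * (y * y + z * z), 2 * (x * y + w * z),     2 * (x * z - w * y),     0, // column 0
        2 * (x * y - w * z),     1 - 2 * (x * x + z * z), 2 * (y * z + w * x),     0, // column 1
        2 * (x * z + w * y),     2 * (y * z - w * x),     1 - 2 * (x * x + y * y), 0, // column 2
        0,                       0,                       0,                       1  // column 3
    ]);
}

function applyOrientation(orientation, gl, viewMatrixLocation, audioCtx) {
    var head = quatToMat4(orientation);

    // The view matrix is the inverse of the head rotation; for a pure rotation
    // that's just the transpose of the upper 3x3.
    var view = new Float32Array(head);
    view[1] = head[4]; view[4] = head[1];
    view[2] = head[8]; view[8] = head[2];
    view[6] = head[9]; view[9] = head[6];
    gl.uniformMatrix4fv(viewMatrixLocation, false, view);

    // Point the listener where the head is facing: forward is the rotated -Z axis,
    // up is the rotated +Y axis (columns 2 and 1 of the head matrix).
    audioCtx.listener.setOrientation(
        -head[8], -head[9], -head[10],
         head[4],  head[5],  head[6]);
}

The distortion shader itself would still be driven by whatever lens parameters the device reports, per the question above.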
Sorry if this seems like a lot of questions. I promise to go through and collect all the useful pieces into a summary post once they're answered.

Received on Thursday, 27 March 2014 06:51:08 UTC