- From: Brandon Andrews <warcraftthreeft@sbcglobal.net>
- Date: Wed, 26 Mar 2014 23:50:39 -0700 (PDT)
- To: Lars Knudsen <larsgk@gmail.com>, Brandon Jones <bajones@google.com>
- Cc: "public-webapps@w3.org" <public-webapps@w3.org>
>Brandon Jones:
>So there's a few things to consider regarding this. For one, I think your
>ViewEvent structure would need to look more like this:
>
>interface ViewEvent : UIEvent {
> readonly attribute Quaternion orientation; // Where Quaternion is 4 floats. Prevents gimbal lock.
> readonly attribute float offsetX; // offset X from the calibrated center 0 in millimeters
> readonly attribute float offsetY; // offset Y from the calibrated center 0 in millimeters
> readonly attribute float offsetZ; // offset Z from the calibrated center 0 in millimeters
> readonly attribute float accelerationX; // Acceleration along X axis in m/s^2
> readonly attribute float accelerationY; // Acceleration along Y axis in m/s^2
> readonly attribute float accelerationZ; // Acceleration along Z axis in m/s^2
>}
>
>
>You have to deal with explicit units for a case like this and not
>clamped/normalized values. What would a normalized offset of 1.0 mean?
>Am I slightly off center? At the other end of the room? It's meaningless
>without a frame of reference. Same goes for acceleration. You can argue
>that you can normalize to 1.0 == 9.8 m/s^2 but the accelerometers will
>happily report values outside that range, and at that point you might as
>well just report in a standard unit.
I could see having explicit units for translation if the device could
output them. The idea of normalized values (not talking about clamping)
is to let the user set what they feel is the maximum movement they want
the device to detect, separately from any specific application. So for
moving left and right you might calibrate the device such that -1 and 1
correspond to leaning 15 cm either way from the center. Any program you
then load from the web would interpret those ranges the same way.
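To make that concrete, a rough sketch of the calibration idea (the 150 mm range and the helper function are only illustrative, not part of any proposed API):

// Hypothetical calibration: the user decides that leaning 150 mm either
// way from the calibrated center should map to -1 and 1.
var calibratedRangeMm = 150;

// Convert a raw offset reported in millimeters to the normalized range.
// Values outside the calibrated range simply fall outside [-1, 1].
function normalizeOffset(offsetMm) {
  return offsetMm / calibratedRangeMm;
}

// e.g. leaning 75 mm to the right reads as 0.5 in every application.
console.log(normalizeOffset(75)); // 0.5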
As for quaternions for orientation, I don't see an advantage to them over Euler angles; you can use the Euler angles to build a quaternion when you need one. I do have one question though. Would the ViewEvent need orientation, angular velocity, and angular acceleration? What about a translational velocity?
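For example, a standard conversion (the intrinsic yaw/pitch/roll order here is an assumption; a spec would need to pick a convention):

// Build a quaternion {x, y, z, w} from Euler angles in radians,
// assuming an intrinsic yaw (Z), pitch (Y), roll (X) rotation order.
function quaternionFromEuler(yaw, pitch, roll) {
  var cy = Math.cos(yaw / 2),   sy = Math.sin(yaw / 2);
  var cp = Math.cos(pitch / 2), sp = Math.sin(pitch / 2);
  var cr = Math.cos(roll / 2),  sr = Math.sin(roll / 2);
  return {
    w: cr * cp * cy + sr * sp * sy,
    x: sr * cp * cy - cr * sp * sy,
    y: cr * sp * cy + sr * cp * sy,
    z: cr * cp * sy - sr * sp * cy
  };
}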
> As for things like eye position and such, you'd want to query that
>separately (no sense in sending it with every device), along with other
>information about the device capabilities (Screen resolution, FOV, Lens
>distortion factors, etc, etc.) And you'll want to account for the
>scenario where more than one device is connected to the browser.
Seems sensible if there's a lot of data or different update frequencies. So something like an eye event. It could use pixels rather than normalized values and let the pixel value go outside of the screen; I don't know why I used normalized coordinates there, since the screen resolution is known. You mention lens distortion factors. Are there well-known variables for defining the lenses that a VR device could provide to the user or to the browser, so the distortion could be automated without custom shader logic provided by a driver?
interface EyeEvent : UIEvent {
  readonly attribute long leftX; // pixels, but not clamped to the screen resolution
  readonly attribute long leftY;
  readonly attribute long rightX;
  readonly attribute long rightY;
}
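Purely hypothetical usage, assuming an "eyemove" event carrying those fields:

// Hypothetical: listen for eye position updates and note when the
// reported pixel position falls outside the screen bounds.
window.addEventListener('eyemove', function (e) {
  var onScreen = e.leftX >= 0 && e.leftX < screen.width &&
                 e.leftY >= 0 && e.leftY < screen.height;
  console.log('left eye at', e.leftX, e.leftY,
              onScreen ? '(on screen)' : '(off screen)');
});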
>Also, if this is going to be a high quality experience you'll want to be
>able to target rendering to the HMD directly and not rely on OS mirroring
>to render the image. This is a can of worms in and of itself: How do you
>reference the display? Can you manipulate a DOM tree on it, or is it
>limited to WebGL/Canvas2D? If you can render HTML there how do the
>appropriate distortions get applied, and how do things like depth get
>communicated? Does this new rendering surface share the same Javascript
>scope as the page that launched it? If the HMD refreshes at 90hz and
>your monitor refreshes at 60hz, when does requestAnimationFrame fire?
>These are not simple questions, and need to be considered carefully to
>make sure that any resulting API is useful.
You hit on why this is a view lock: requestAnimationFrame would fire at the rate of the device. Regarding the distortion for an HTML page, that would require some special considerations. If you allow locking onto any DOM element (like the fullscreen spec does) to send it to the HMD, you then have to define the distortion and transformation to fit it to the device.
I think a good DOM element case to focus on would be someone making a rotating CSS3 cube using the 3D transformations. Say the user has an event to request device info that returns the screen resolutions and offset information. They could use this to make the DOM element the size of the view plane if they know the FoV. They'd need to feed in this FoV and a distance to the DOM element (in pixels?); these variables could be passed in when a lock is requested, as sketched below. This kind of overlaps with what Lars was talking about with making this an extension of another API, and I think that API might be the fullscreen API. The behavior might need to differ for canvas elements, though, which would handle their own distortions?
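Something like this, purely hypothetical (requestViewLock and its options don't exist anywhere; it's just modeled on requestFullscreen for discussion):

// Hypothetical API, modeled on element.requestFullscreen(); the name
// requestViewLock and its options are assumptions for discussion only.
var cube = document.getElementById('css3-cube');
cube.requestViewLock({
  fieldOfView: Math.PI / 2, // vertical FOV in radians
  viewDistance: 800,        // distance from the viewer to the element, in pixels
  applyDistortion: true     // let the browser apply the HMD's lens distortion
});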
>Finally, it's worth considering that for a VR experience to be effective
>it needs to be pretty low latency. Put bluntly: Browsers suck at this.
>Optimizing for scrolling large pages of flat content, text, and images
>is very different from optimizing for realtime, super low latency I/O.
>If you were to take an Oculus Rift and plug it into one of the existing
>browser/Rift demos with Chrome, you'll probably find that in the best
>case the rendering lags behind your head movement by about 4 frames.
>Even if your code is rendering at a consistent 60hz that means you're
>seeing ~67ms of lag, which will result in a motion-sickness-inducing
>"swimming" effect where the world is constantly catching up to your head
>position. And that's not even taking into account the question of how
>well Javascript/WebGL can keep up with rendering two high resolution
>views of a moderately complex scene, something that even modern gaming
>PCs can struggle with.
This is basically setting the groundwork to start prototyping and finding these issues before an implementation is created. Also, I think it's safe to assume that most browsers are becoming more and more GPU accelerated. VR is for the future, so it seems sensible to keep future hardware in mind. (Remember, sites of the future will look very similar to https://www.youtube.com/watch?v=8wXBe2jTdx4 .)
>
>That's an awful lot of work for technology that, right now, does not
>have a large user base and for which the standards and conventions are
>still being defined. I think that you'll have a hard time drumming up
>support for such an API until the technology becomes a little more
>widespread.
Yeah, I assume it'll take around a year of discussion to get things moving or to find implementers that aren't busy. The idea, though, is that when support is here, the discussions will mostly be done and a rough draft spec will be waiting for implementers to move to an experimental stage.
>Lars:
>I think it could make sense to put stuff like this as an extension
>on top of WebGL and WebAudio as they are the only two current APIs close
>enough to the bare metal/low latency/high performance to get a decent
>experience. Also - I seem to remember that some earlier generation VR
>glasses solved the game support problem by providing their own GL and
>Joystick drivers (today - probably device orientation events) so many
>games didn't have to bother (too much) with the integration.
Associating it with WebAudio doesn't make sense. You'd just be using the
orientation information to change a few variables to make the positional
audio examples work. For WebGL it's just a distortion shader, give or
take, that uses the orientation as input to a view matrix uniform. I
think the head tracking is generic enough not to be part of either spec.
In the future browsers could be fully GPU accelerated anyway, so making a
separate spec is ideal.
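To make that concrete, a rough sketch of both uses. Everything here is assumed: a "viewchange" event shaped like the ViewEvent above, a gl-matrix style mat4/vec3 library, and existing gl, viewMatrixLocation, and audioCtx objects.

// Sketch only: drive a WebGL view matrix and a WebAudio listener from
// the hypothetical ViewEvent orientation quaternion.
window.addEventListener('viewchange', function (e) {
  var q = [e.orientation.x, e.orientation.y, e.orientation.z, e.orientation.w];

  // WebGL: the view rotation is just the inverse of the head orientation.
  var view = mat4.create();
  mat4.fromQuat(view, q);
  mat4.invert(view, view);
  gl.uniformMatrix4fv(viewMatrixLocation, false, view);

  // WebAudio: point the listener the same way the head is facing.
  var forward = vec3.transformQuat(vec3.create(), [0, 0, -1], q);
  var up = vec3.transformQuat(vec3.create(), [0, 1, 0], q);
  audioCtx.listener.setOrientation(forward[0], forward[1], forward[2],
                                   up[0], up[1], up[2]);
});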
The closest spec that I think this would be an extension for is the fullscreen spec. That is, rendering a DOM element to an HMD at a separate rate from the normal page, with a FOV, distance from the page, eye offsets, and distortion. If anyone knows all the variables required, or the ideal method, that would be useful. I think all head mounted displays use a parallel frustum method with perspective matrices, which might simplify the inputs.
So what information does the user need to be able to request from any HMD? The size of each screen in pixels (where 0x0 would mean no screen) for the left and right eye, the offset from the center for each eye, and then something like a preferred field of view? Maybe something like this, with a sketch of how it could be consumed after it:
{
leftWidth; // pixels
leftHeight; // pixels
rightWidth; // pixels
rightHeight; // pixels
leftOffset; // mm
rightOffset; // mm
preferredFieldOfView; // vertical FOV in radians?
}
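For example, a sketch of how a renderer might consume those fields, assuming symmetric perspective frusta with parallel axes and a gl-matrix style mat4 (the function name and units conversion are only illustrative):

// Sketch: derive per-eye rendering parameters from the structure above,
// assuming symmetric perspective frusta whose axes are parallel.
function perEyeParameters(info, near, far) {
  var aspect = info.leftWidth / info.leftHeight;
  return {
    // Same projection for both eyes under the parallel-frustum assumption.
    projection: mat4.perspective(mat4.create(),
                                 info.preferredFieldOfView, aspect, near, far),
    // Eye positions relative to the head center, converted from mm to meters;
    // each eye's view matrix is the head view translated by the negated offset.
    leftEye:  [-info.leftOffset  / 1000, 0, 0],
    rightEye: [ info.rightOffset / 1000, 0, 0]
  };
}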
Sorry if this seems like a lot of questions. I promise to go through and collect all the useful pieces into a summary post once they're answered.
Received on Thursday, 27 March 2014 06:51:08 UTC