
Re: Proposal Virtual Reality "View Lock" Spec

From: Rob Manson <robman@mob-labs.com>
Date: Thu, 27 Mar 2014 06:25:29 +1100
Message-ID: <533329A9.9090807@mob-labs.com>
To: public-webapps@w3.org
Hi,

We've already implemented an open source library that integrates WebGL, 
WebAudio, DeviceOrientation and gUM to create an easy-to-use VR/AR 
framework that runs on the Rift, Glass, mobile, tablet, PC, etc.

Here's an overview of our API.

   https://buildar.com/awe/tutorials/intro_to_awe.js/index.html

And the project is in our GitHub repo.

   https://github.com/buildar/awe.js

Hope that's relevant.

PS: We're also working on a Depth Stream Extension proposal to add depth 
camera support to gUM.

roBman


On 27/03/14 5:58 AM, Lars Knudsen wrote:
> I think it could make sense to put stuff like this as an extension on 
> top of WebGL and WebAudio as they are the only two current APIs close 
> enough to the bare metal/low latency/high performance to get a decent 
> experience.  Also - I seem to remember that some earlier generation VR 
> glasses solved the game support problem by providing their own GL and 
> Joystick drivers (today - probably device orientation events) so many 
> games didn't have to bother (too much) with the integration.
>
> In theory - we could:
>
>  - extend (if needed at all) WebGL to provide stereo vision
>  - hook up WebAudio as is (as it supports audio objects, Doppler 
> effect, etc., similar to OpenAL)
>  - hook up DeviceOrientation/Motion in Desktop browsers if a WiiMote, 
> HMD or other is connected
>  - hook up getUserMedia as is to the potential VR camera
>
> ..and make it possible to do low latency paths/hooks between them if 
> needed.
>
> It seems that all (or at least most) of the components are already 
> present - but proper hooks need to be made for desktop browsers at 
> least (afaik .. it's been a while ;))
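>
> As a rough illustration of how those pieces could be wired together 
> with today's APIs (updateCamera/drawScene are placeholder functions, 
> and the alpha/beta/gamma mapping below is simplified), something like:
>
>   var audioCtx = new (window.AudioContext || window.webkitAudioContext)();
>
>   window.addEventListener('deviceorientation', function (e) {
>     // alpha/beta/gamma arrive in degrees; convert to radians
>     var yaw   = e.alpha * Math.PI / 180;
>     var pitch = e.beta  * Math.PI / 180;
>     var roll  = e.gamma * Math.PI / 180;
>     updateCamera(yaw, pitch, roll); // placeholder: feed the WebGL camera
>
>     // keep the WebAudio listener facing the same way so panning matches
>     audioCtx.listener.setOrientation(-Math.sin(yaw), 0, -Math.cos(yaw),
>                                      0, 1, 0);
>   });
>
>   function render() {
>     drawScene();                    // placeholder WebGL draw
>     requestAnimationFrame(render);
>   }
>   requestAnimationFrame(render);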
>
> - Lars
>
>
> On Wed, Mar 26, 2014 at 7:18 PM, Brandon Jones <bajones@google.com 
> <mailto:bajones@google.com>> wrote:
>
>     So there's a few things to consider regarding this. For one, I
>     think your ViewEvent structure would need to look more like this:
>
>     interface ViewEvent : UIEvent {
>         readonly attribute Quaternion orientation; // Quaternion is 4 floats; prevents gimbal lock
>         readonly attribute float offsetX;          // offset X from the calibrated center 0, in millimeters
>         readonly attribute float offsetY;          // offset Y from the calibrated center 0, in millimeters
>         readonly attribute float offsetZ;          // offset Z from the calibrated center 0, in millimeters
>         readonly attribute float accelerationX;    // acceleration along the X axis in m/s^2
>         readonly attribute float accelerationY;    // acceleration along the Y axis in m/s^2
>         readonly attribute float accelerationZ;    // acceleration along the Z axis in m/s^2
>     }
>
>     You have to deal with explicit units for a case like this and not
>     clamped/normalized values. What would a normalized offset of 1.0
>     mean? Am I slightly off center? At the other end of the room? It's
>     meaningless without a frame of reference. Same goes
>     for acceleration. You can argue that you can normalize to 1.0 ==
>     9.8 m/s^2 but the accelerometers will happily report values
>     outside that range, and at that point you might as well just
>     report in a standard unit.
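>
>     For example, a consumer of an event in explicit units might do
>     something like this (the 'viewchange' event name and camera.setPose
>     helper are made up for illustration; the quaternion is assumed to
>     be {x, y, z, w}):
>
>     elem.addEventListener('viewchange', function (e) {
>       var q = e.orientation;
>       // standard quaternion -> row-major 3x3 rotation matrix
>       var rot = [
>         1 - 2*(q.y*q.y + q.z*q.z), 2*(q.x*q.y - q.z*q.w),     2*(q.x*q.z + q.y*q.w),
>         2*(q.x*q.y + q.z*q.w),     1 - 2*(q.x*q.x + q.z*q.z), 2*(q.y*q.z - q.x*q.w),
>         2*(q.x*q.z - q.y*q.w),     2*(q.y*q.z + q.x*q.w),     1 - 2*(q.x*q.x + q.y*q.y)
>       ];
>       // millimeters in the event, meters in the scene
>       var pos = [e.offsetX / 1000, e.offsetY / 1000, e.offsetZ / 1000];
>       camera.setPose(rot, pos);    // placeholder camera helper
>     });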
>
>     As for things like eye position and such, you'd want to query that
>     separately (no sense in sending it with every event), along with
>     other information about the device capabilities (screen
>     resolution, FOV, lens distortion factors, etc.). And you'll
>     want to account for the scenario where more than one device is
>     connected to the browser.
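>
>     Something like the following is entirely hypothetical, but it
>     sketches the shape of a separate capability query with multiple
>     devices:
>
>     navigator.getViewDevices().then(function (devices) {  // made-up API
>       devices.forEach(function (d) {
>         console.log(d.deviceName, d.resolutionX + 'x' + d.resolutionY,
>                     d.fieldOfView, d.distortionCoefficients);
>       });
>       var hmd = devices[0]; // the page (or the user) picks one of several
>     });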
>
>     Also, if this is going to be a high quality experience you'll want
>     to be able to target rendering to the HMD directly and not rely on
>     OS mirroring to render the image. This is a can of worms in and of
>     itself: How do you reference the display? Can you manipulate a DOM
>     tree on it, or is it limited to WebGL/Canvas2D? If you can render
>     HTML there how do the appropriate distortions get applied, and how
>     do things like depth get communicated? Does this new rendering
>     surface share the same Javascript scope as the page that launched
>     it? If the HMD refreshes at 90hz and your monitor refreshes at
>     60hz, when does requestAnimationFrame fire? These are not simple
>     questions, and need to be considered carefully to make sure that
>     any resulting API is useful.
>
>     Finally, it's worth considering that for a VR experience to be
>     effective it needs to be pretty low latency. Put bluntly: Browsers
>     suck at this. Optimizing for scrolling large pages of flat
>     content, text, and images is very different from optimizing for
>     realtime, super low latency I/O. If you were to take an Oculus
>     Rift and plug it into one of the existing browser/Rift demos
>     <https://github.com/Instrument/oculus-bridge> with Chrome, you'll
>     probably find that in the best case the rendering lags behind your
>     head movement by about 4 frames. Even if your code is rendering at
>     a consistent 60hz that means you're seeing ~67ms of lag, which
>     will result in a motion-sickness-inducing "swimming" effect where
>     the world is constantly catching up to your head position. And
>     that's not even taking into account the question of how well
>     Javascript/WebGL can keep up with rendering two high resolution
>     views of a moderately complex scene, something that even modern
>     gaming PCs can struggle with.
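>
>     (For what it's worth, drawing the two views is the easy part and
>     can be done today by splitting one WebGL canvas into left/right
>     viewports; the latency and lens distortion are where it gets hard.
>     A minimal sketch, with drawScene and the per-eye view matrices as
>     placeholders:)
>
>     function renderStereo(gl, canvas) {
>       var half = canvas.width / 2;
>       gl.viewport(0, 0, half, canvas.height);     // left eye
>       drawScene(leftEyeViewMatrix);
>       gl.viewport(half, 0, half, canvas.height);  // right eye
>       drawScene(rightEyeViewMatrix);
>     }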
>
>     That's an awful lot of work for technology that, right now, does
>     not have a large user base and for which the standards and
>     conventions are still being defined. I think that you'll have a
>     hard time drumming up support for such an API until the technology
>     becomes a little more widespread.
>
>     (Disclaimer: I'm very enthusiastic about current VR research. If I
>     sound negative it's because I'm being practical, not because I
>     don't want to see this happen)
>
>     --Brandon
>
>
>     On Wed, Mar 26, 2014 at 12:34 AM, Brandon Andrews
>     <warcraftthreeft@sbcglobal.net
>     <mailto:warcraftthreeft@sbcglobal.net>> wrote:
>
>         I searched, but I can't find anything relevant in the
>         archives. Since pointer lock is now well supported, I think
>         it's time to begin thinking about virtual reality APIs. Since
>         this is a complex topic I think any spec should start simple.
>         With that I'm proposing we have a discussion on adding head
>         tracking. This should be very generic, with just position and
>         orientation information, so no matter whether the data is
>         coming from a webcam, a VR headset, or a pair of glasses with
>         eye tracking in the future, the interface would be the same.
>         This event would be similar to mouse move with a high sample
>         rate (which is why head tracking and eye tracking are in the
>         same event, representing a user's total view).
>
>         interface ViewEvent : UIEvent {
>             readonly attribute float roll;      // radians, positive is slanting the head to the right
>             readonly attribute float pitch;     // radians, positive is looking up
>             readonly attribute float yaw;       // radians, positive is looking to the right
>             readonly attribute float offsetX;   // offset X from the calibrated center 0, in the range -1 to 1
>             readonly attribute float offsetY;   // offset Y from the calibrated center 0, in the range -1 to 1
>             readonly attribute float offsetZ;   // offset Z from the calibrated center 0, in the range -1 to 1; 0 if not supported
>             readonly attribute float leftEyeX;  // left eye X position in screen coordinates from -1 to 1 (not clamped); 0 if not supported
>             readonly attribute float leftEyeY;  // left eye Y position in screen coordinates from -1 to 1 (not clamped); 0 if not supported
>             readonly attribute float rightEyeX; // right eye X position in screen coordinates from -1 to 1 (not clamped); 0 if not supported
>             readonly attribute float rightEyeY; // right eye Y position in screen coordinates from -1 to 1 (not clamped); 0 if not supported
>         }
>
>         Then, like the pointer lock spec, the user would be able to
>         request view lock to begin sampling head tracking data from
>         the selected source. There would thus be a view lock change
>         event. (It's not clear how the browser would list which
>         sources to let the user choose from. So if they had a webcam
>         method that the browser offered and an Oculus Rift, then both
>         would show and the user would need to choose.)
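>
>         Hypothetical usage, modelled on the Pointer Lock API (the
>         requestViewLock, viewLockElement, viewlockchange and viewmove
>         names are made up):
>
>         canvas.requestViewLock();
>
>         document.addEventListener('viewlockchange', function () {
>           if (document.viewLockElement === canvas) {
>             canvas.addEventListener('viewmove', function (e) {
>               // placeholder: feed the head pose into the app's camera
>               updateCamera(e.yaw, e.pitch, e.roll,
>                            e.offsetX, e.offsetY, e.offsetZ);
>             });
>           }
>         });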
>
>         Now for discussion. Are there any features missing from the
>         proposed head tracking API or features that VR headsets offer
>         that need to be included from the beginning? Also I'm not sure
>         what it should be called. I like "view lock", but it was my
>         first thought so "head tracking" or something else might fit
>         the scope of the problem better.
>
>         Some justifications. The offset and head orientation are
>         self-explanatory and calibrated by the device. The eye offsets
>         would be more for a UI that selects or highlights things as
>         the user moves their eyes around. Examples would be a
>         web-enabled HUD on VR glasses and a laptop with a precision
>         webcam. The user calibrates with their device software, which
>         reports the range (-1, -1) to (1, 1) in screen space. The
>         values are not clamped, so the user can look beyond the
>         calibrated ranges. Separate left and right eye values enable
>         precision and versatility, since most hardware supporting eye
>         tracking will have raw values for each eye.
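>
>         As a sketch of the eye-driven highlighting use case: map the
>         normalized eye coordinates to pixels and hit-test the page
>         (the 'viewmove' event name is hypothetical, and +Y is assumed
>         to be up in the event but down in CSS coordinates):
>
>         document.addEventListener('viewmove', function (e) {
>           var x = (e.leftEyeX + e.rightEyeX) / 2;  // average both eyes
>           var y = (e.leftEyeY + e.rightEyeY) / 2;
>           var px = (x + 1) / 2 * window.innerWidth;
>           var py = (1 - y) / 2 * window.innerHeight;
>           var target = document.elementFromPoint(px, py);
>           if (target) target.classList.add('gaze-highlight');
>         });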
>
>
>
>

-- 
Rob

Check out my new book "Getting started with WebRTC" - it's a 5 star hit
on Amazon http://www.amazon.com/dp/1782166300/?tag=packtpubli-20

CEO & co-founder
http://MOB-labs.com

Chair of the W3C Augmented Web Community Group
http://www.w3.org/community/ar

Invited Expert with the ISO, Khronos Group & W3C
Received on Wednesday, 26 March 2014 19:25:56 UTC
