Re: Proposal Virtual Reality "View Lock" Spec

So there are a few things to consider regarding this. For one, I think your
ViewEvent structure would need to look more like this:

interface ViewEvent : UIEvent {
    readonly attribute Quaternion orientation; // Quaternion is 4 floats; avoids gimbal lock
    readonly attribute float offsetX; // offset X from the calibrated center 0, in millimeters
    readonly attribute float offsetY; // offset Y from the calibrated center 0, in millimeters
    readonly attribute float offsetZ; // offset Z from the calibrated center 0, in millimeters
    readonly attribute float accelerationX; // acceleration along the X axis in m/s^2
    readonly attribute float accelerationY; // acceleration along the Y axis in m/s^2
    readonly attribute float accelerationZ; // acceleration along the Z axis in m/s^2
}
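
Just to make that shape concrete, here's a rough sketch of how a page might
consume such an event. Everything here is illustrative: the "viewchange"
event name and the three.js-style camera object are assumptions, not part of
any spec.

// Illustrative only: "viewchange" and the three.js-style camera are assumptions.
window.addEventListener('viewchange', function (event) {
  var MM_TO_METERS = 0.001; // explicit units convert unambiguously into world space

  // Head position offsets arrive in millimeters and map into a meters-based scene.
  camera.position.set(event.offsetX * MM_TO_METERS,
                      event.offsetY * MM_TO_METERS,
                      event.offsetZ * MM_TO_METERS);

  // The quaternion drops straight into the camera orientation, no Euler
  // conversion (and no gimbal lock).
  camera.quaternion.set(event.orientation.x,
                        event.orientation.y,
                        event.orientation.z,
                        event.orientation.w);
});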

For a case like this you have to deal in explicit units, not
clamped/normalized values. What would a normalized offset of 1.0 mean? Am I
slightly off center? At the other end of the room? It's meaningless without
a frame of reference. The same goes for acceleration: you can argue for
normalizing so that 1.0 == 9.8 m/s^2, but accelerometers will happily
report values outside that range, and at that point you might as well just
report in a standard unit.
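
A small illustration of why that matters (the handler and its wiring are
hypothetical, just to show the point):

// With standard units, readings past 1g need no special casing.
function handleAcceleration(event) {
  var GRAVITY = 9.8; // m/s^2
  var magnitude = Math.sqrt(event.accelerationX * event.accelerationX +
                            event.accelerationY * event.accelerationY +
                            event.accelerationZ * event.accelerationZ);
  // Still meaningful well above 9.8 m/s^2; a normalized scale would have to
  // either clamp this or leave its meaning undefined.
  console.log('head acceleration: ' + (magnitude / GRAVITY).toFixed(2) + 'g');
}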

As for things like eye position and such, you'd want to query that
separately (no sense in sending it with every event), along with other
information about the device capabilities (screen resolution, FOV, lens
distortion factors, etc.). And you'll want to account for the scenario
where more than one device is connected to the browser.
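
Something along these lines, maybe; every name below is made up to sketch
the shape of such a query, not taken from any existing API:

// Hypothetical API shape -- none of these names exist in any spec today.
navigator.getViewDevices().then(function (devices) {
  // More than one HMD (or webcam tracker) may be connected; enumerate them all.
  devices.forEach(function (device) {
    console.log(device.name);
    console.log(device.eyeParameters.left.offset);  // per-eye position, queried once
    console.log(device.renderResolution);           // screen resolution
    console.log(device.fieldOfView);                // FOV
    console.log(device.distortionCoefficients);     // lens distortion factors
  });
});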

Also, if this is going to be a high-quality experience you'll want to be
able to target rendering to the HMD directly and not rely on OS mirroring
to render the image. This is a can of worms in and of itself: How do you
reference the display? Can you manipulate a DOM tree on it, or is it
limited to WebGL/Canvas2D? If you can render HTML there, how do the
appropriate distortions get applied, and how do things like depth get
communicated? Does this new rendering surface share the same JavaScript
scope as the page that launched it? If the HMD refreshes at 90Hz and your
monitor refreshes at 60Hz, when does requestAnimationFrame fire? These are
not simple questions, and they need to be considered carefully to make sure
that any resulting API is useful.
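
To make that last question concrete, here's the kind of ambiguity an
implementation would have to resolve. drawScene is a stand-in and the
per-device hmd object is hypothetical:

// Which callback drives rendering for a 90Hz HMD attached to a 60Hz desktop?
function renderLoop() {
  drawScene(); // assume this renders both eye views

  window.requestAnimationFrame(renderLoop); // fires at the monitor's 60Hz...
  // hmd.requestAnimationFrame(renderLoop); // ...or would a per-device callback
                                            // be needed to hit the HMD's 90Hz?
}
renderLoop();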

Finally, it's worth considering that for a VR experience to be effective it
needs to be pretty low latency. Put bluntly: browsers suck at this.
Optimizing for scrolling large pages of flat content, text, and images is
very different from optimizing for realtime, super-low-latency I/O. If you
take an Oculus Rift and plug it into one of the existing
browser/Rift demos <https://github.com/Instrument/oculus-bridge> with
Chrome, you'll probably find that in the best case the rendering lags
behind your head movement by about 4 frames. Even if your code is rendering
at a consistent 60Hz, that means you're seeing ~67ms of lag, which will
result in a motion-sickness-inducing "swimming" effect where the world is
constantly catching up to your head position. And that's not even taking
into account the question of how well JavaScript/WebGL can keep up with
rendering two high-resolution views of a moderately complex scene,
something that even modern gaming PCs can struggle with.
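
For reference, the arithmetic behind that ~67ms figure:

// ~4 frames of pipeline lag at a 60Hz refresh rate:
var frameTimeMs = 1000 / 60;            // ~16.7ms per frame
var motionToPhotonMs = 4 * frameTimeMs; // ~67ms between head motion and the
                                        // matching pixels reaching your eyes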

That's an awful lot of work for technology that, right now, does not have a
large user base and for which the standards and conventions are still being
defined. I think that you'll have a hard time drumming up support for such
an API until the technology becomes a little more widespread.

(Disclaimer: I'm very enthusiastic about current VR research. If I sound
negative it's because I'm being practical, not because I don't want to see
this happen)

--Brandon


On Wed, Mar 26, 2014 at 12:34 AM, Brandon Andrews <
warcraftthreeft@sbcglobal.net> wrote:

> I searched, but I can't find anything relevant in the archives. Since
> pointer lock is now well supported, I think it's time to begin thinking
> about virtual reality APIs. Since this is a complex topic I think any spec
> should start simple. With that in mind, I'm proposing we have a discussion on
> adding head tracking. This should be very generic, with just position and
> orientation information, so that whether the data is coming from a webcam,
> a VR headset, or a pair of glasses with eye tracking in the future, the
> interface would be the same. This event would be similar to mouse move with
> a high sample rate (which is why the head tracking and eye tracking are in
> the same event, representing a user's total view).
>
> interface ViewEvent : UIEvent {
>     readonly attribute float roll; // radians, positive is slanting the head to the right
>     readonly attribute float pitch; // radians, positive is looking up
>     readonly attribute float yaw; // radians, positive is looking to the right
>     readonly attribute float offsetX; // offset X from the calibrated center 0 in the range -1 to 1
>     readonly attribute float offsetY; // offset Y from the calibrated center 0 in the range -1 to 1
>     readonly attribute float offsetZ; // offset Z from the calibrated center 0 in the range -1 to 1, and 0 if not supported
>     readonly attribute float leftEyeX; // left eye X position in screen coordinates from -1 to 1 (but not clamped) where 0 is the default if not supported
>     readonly attribute float leftEyeY; // left eye Y position in screen coordinates from -1 to 1 (but not clamped) where 0 is the default if not supported
>     readonly attribute float rightEyeX; // right eye X position in screen coordinates from -1 to 1 (but not clamped) where 0 is the default if not supported
>     readonly attribute float rightEyeY; // right eye Y position in screen coordinates from -1 to 1 (but not clamped) where 0 is the default if not supported
> }
>
> Then like the pointer lock spec the user would be able to request view
> lock to begin sampling head tracking data from the selected source. There
> would thus be a view lock change event.
> (It's not clear how the browser would list which sources to let the user
> choose from. So if they had both a webcam-based method offered by the browser
> and an Oculus Rift, both would show up and the user would need to choose.)
>
> Now for discussion. Are there any features missing from the proposed head
> tracking API or features that VR headsets offer that need to be included
> from the beginning? Also I'm not sure what it should be called. I like
> "view lock", but it was my first thought so "head tracking" or something
> else might fit the scope of the problem better.
>
> Some justifications. The offset and head orientation are self-explanatory
> and calibrated by the device. The eye offsets would be more for a UI that
> selects or highlights things as the user moves their eyes around. Examples
> would be a web-enabled HUD on VR glasses and a laptop with a precision
> webcam. The user calibrates with their device software, which reports the
> range (-1, -1) to (1, 1) in screen space. The values are not clamped, so the
> user can look beyond the calibrated ranges. Separate left and right eye
> values enable precision and versatility, since most hardware supporting eye
> tracking will have raw values for each eye.
>
>
>
