
Re: Proposal Virtual Reality "View Lock" Spec

From: Florian Bösch <pyalot@gmail.com>
Date: Thu, 27 Mar 2014 08:42:34 +0100
Message-ID: <CAOK8ODgXBcttd11bNJPJbaafoVmgq=7eW1dw4kFe0t6AFg799w@mail.gmail.com>
To: Brandon Jones <bajones@google.com>
Cc: Brandon Andrews <warcraftthreeft@sbcglobal.net>, "public-webapps@w3.org" <public-webapps@w3.org>
I replied to Brandon Jones directly but not to the public list; reposted below.

On Wed, Mar 26, 2014 at 7:18 PM, Brandon Jones <bajones@google.com> wrote:
>
> As for things like eye position and such, you'd want to query that
> separately (no sense in sending it with every device), along with other
> information about the device capabilities (Screen resolution, FOV, Lens
> distortion factors, etc, etc.) And you'll want to account for the scenario
> where there are more than one device connected to the browser.
>

There isn't an easy way to describe the FOV and lens distortion in a few
numbers. People are starting to experiment with 220° Fresnel lens optics,
which will have really weird distortion characteristics. I think you'll
want to have a device identifier so you always have a fallback you can hack
together manually for devices that can't describe themselves with a few
"distortion factors".
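To make that concrete, a device-identifier fallback could look something like this (a rough sketch; the profile shapes, the coefficients, and the fresnel entry are all illustrative, not published specs):

```javascript
// Sketch: per-device distortion profiles keyed by an identifier string.
// All names and numbers here are illustrative, not published specs.
const DISTORTION_PROFILES = {
  // A simple radial polynomial works for early Rift-style optics.
  "oculus-dk1": { type: "polynomial", k: [1.0, 0.22, 0.24, 0.0], fovDegrees: 110 },
  // A 220° fresnel device may need a measured warp mesh instead of a few factors.
  "fresnel-220-prototype": { type: "mesh", meshUrl: "/profiles/fresnel220.json", fovDegrees: 220 },
};

function distortionProfileFor(deviceId) {
  // Manual fallback for devices that can't describe themselves in
  // "distortion factors": a conservative polynomial with no distortion.
  return DISTORTION_PROFILES[deviceId] ||
         { type: "polynomial", k: [1.0, 0, 0, 0], fovDegrees: 90 };
}
```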


> Also, if this is going to be a high quality experience you'll want to be
> able to target rendering to the HMD directly and not rely on OS mirroring
> to render the image. This is a can of worms in and of itself: How do you
> reference the display? Can you manipulate a DOM tree on it, or is it
> limited to WebGL/Canvas2D? If you can render HTML there how do the
> appropriate distortions get applied, and how do things like depth get
> communicated? Does this new rendering surface share the same Javascript
> scope as the page that launched it? If the HMD refreshes at 90hz and your
> monitor refreshes at 60hz, when does requestAnimationFrame fire? These are
> not simple questions, and need to be considered carefully to make sure that
> any resulting API is useful.
>

The OS-mirrored split-screen solution has a lot going for it.
Stereoscopic/multi-head rendering is pretty much a driver nightmare, and
it's slow as hell most of the time.
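With the mirrored approach, stereo is just two viewports on one canvas; a minimal sketch (the render-loop names below, like drawScene and the per-eye matrices, are placeholders):

```javascript
// Sketch: carve a single mirrored canvas into side-by-side eye viewports.
function eyeViewports(width, height) {
  return {
    left:  { x: 0,         y: 0, width: width / 2, height: height },
    right: { x: width / 2, y: 0, width: width / 2, height: height },
  };
}

// In a WebGL render loop (gl is a WebGLRenderingContext):
//   const { left, right } = eyeViewports(gl.drawingBufferWidth, gl.drawingBufferHeight);
//   gl.viewport(left.x, left.y, left.width, left.height);
//   drawScene(leftEyeViewMatrix);
//   gl.viewport(right.x, right.y, right.width, right.height);
//   drawScene(rightEyeViewMatrix);
```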


> Even if your code is rendering at a consistent 60hz that means you're
> seeing ~67ms of lag, which will result in a motion-sickness-inducing
> "swimming" effect where the world is constantly catching up to your head
> position. And that's not even taking into account the question of how well
> Javascript/WebGL can keep up with rendering two high resolution views of a
> moderately complex scene, something that even modern gaming PCs can
> struggle with.
>
I believe the lag to be substantially higher than 67ms in browsers.
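(The ~67ms figure is just four frames of pipeline depth at 60Hz; the arithmetic, as a sketch:)

```javascript
// Lag contributed by N frames of pipeline depth at a given refresh rate.
function pipelineLagMs(framesOfLag, refreshHz) {
  return framesOfLag * 1000 / refreshHz;
}
// pipelineLagMs(4, 60) ≈ 66.7ms — before any extra browser compositing delay.
```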


> That's an awful lot of work for technology that, right now, does not have
> a large user base and for which the standards and conventions are still
> being defined. I think that you'll have a hard time drumming up support for
> such an API until the technology becomes a little more widespread.
>
The devkit 1 sold around 7,000 units over its campaign, at a rate of about
one order every 3-4 minutes. The devkit 2, which only went on sale 7 days
ago, was announced the day before yesterday to have sold 75,000 units, a
rate of around one every 5-6 seconds. I believe that's more devices than
Google Glass shifted in its entire multi-year history so far...


On Wed, Mar 26, 2014 at 7:18 PM, Brandon Jones <bajones@google.com> wrote:

> So there's a few things to consider regarding this. For one, I think your
> ViewEvent structure would need to look more like this:
>
> interface ViewEvent : UIEvent {
>     readonly attribute Quaternion orientation; // Where Quaternion is 4
> floats. Prevents gimbal lock.
>     readonly attribute float offsetX; // offset X from the calibrated
> center 0 in millimeters
>     readonly attribute float offsetY; // offset Y from the calibrated
> center 0 in millimeters
>     readonly attribute float offsetZ; // offset Z from the calibrated
> center 0 in millimeters
>     readonly attribute float accelerationX; // Acceleration along X axis
> in m/s^2
>     readonly attribute float accelerationY; // Acceleration along Y axis
> in m/s^2
>     readonly attribute float accelerationZ; // Acceleration along Z axis
> in m/s^2
> }
>
> You have to deal with explicit units for a case like this and not
> clamped/normalized values. What would a normalized offset of 1.0 mean? Am I
> slightly off center? At the other end of the room? It's meaningless without
> a frame of reference. Same goes for acceleration. You can argue that you
> can normalize to 1.0 == 9.8 m/s^2 but the accelerometers will happily
> report values outside that range, and at that point you might as well just
> report in a standard unit.
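For what it's worth, consuming such a quaternion on the page side needs no Euler angles at all. A minimal sketch, assuming an [x, y, z, w] component layout (the event above is a proposal, not a shipped API):

```javascript
// Sketch: applying a unit quaternion q = [x, y, z, w] to a vector, e.g. to
// aim the camera's forward axis. No Euler angles, so no gimbal lock.
function rotateByQuaternion(q, v) {
  // v' = v + w*t + (q.xyz × t), where t = 2 * (q.xyz × v)
  const cross = (a, b) => [
    a[1] * b[2] - a[2] * b[1],
    a[2] * b[0] - a[0] * b[2],
    a[0] * b[1] - a[1] * b[0],
  ];
  const qv = [q[0], q[1], q[2]];
  const t = cross(qv, v).map(c => 2 * c);
  const u = cross(qv, t);
  return [v[0] + q[3] * t[0] + u[0],
          v[1] + q[3] * t[1] + u[1],
          v[2] + q[3] * t[2] + u[2]];
}

// Example: a 90° yaw about +Y, q = [0, sin 45°, 0, cos 45°], turns the
// "forward" vector (0, 0, -1) into approximately (-1, 0, 0).
```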
>
> As for things like eye position and such, you'd want to query that
> separately (no sense in sending it with every device), along with other
> information about the device capabilities (Screen resolution, FOV, Lens
> distortion factors, etc, etc.) And you'll want to account for the scenario
> where there are more than one device connected to the browser.
>
> Also, if this is going to be a high quality experience you'll want to be
> able to target rendering to the HMD directly and not rely on OS mirroring
> to render the image. This is a can of worms in and of itself: How do you
> reference the display? Can you manipulate a DOM tree on it, or is it
> limited to WebGL/Canvas2D? If you can render HTML there how do the
> appropriate distortions get applied, and how do things like depth get
> communicated? Does this new rendering surface share the same Javascript
> scope as the page that launched it? If the HMD refreshes at 90hz and your
> monitor refreshes at 60hz, when does requestAnimationFrame fire? These
> are not simple questions, and need to be considered carefully to make sure
> that any resulting API is useful.
>
> Finally, it's worth considering that for a VR experience to be effective
> it needs to be pretty low latency. Put bluntly: Browsers suck at this.
> Optimizing for scrolling large pages of flat content, text, and images is
> very different from optimizing for realtime, super low latency I/O. If you
> were to take an Oculus Rift and plug it into one of the existing
> browser/Rift demos <https://github.com/Instrument/oculus-bridge> with
> Chrome, you'll probably find that in the best case the rendering lags
> behind your head movement by about 4 frames. Even if your code is rendering
> at a consistent 60hz that means you're seeing ~67ms of lag, which will
> result in a motion-sickness-inducing "swimming" effect where the world is
> constantly catching up to your head position. And that's not even taking
> into account the question of how well Javascript/WebGL can keep up with
> rendering two high resolution views of a moderately complex scene,
> something that even modern gaming PCs can struggle with.
>
> That's an awful lot of work for technology that, right now, does not have
> a large user base and for which the standards and conventions are still
> being defined. I think that you'll have a hard time drumming up support for
> such an API until the technology becomes a little more widespread.
>
> (Disclaimer: I'm very enthusiastic about current VR research. If I sound
> negative it's because I'm being practical, not because I don't want to see
> this happen)
>
> --Brandon
>
>
> On Wed, Mar 26, 2014 at 12:34 AM, Brandon Andrews <
> warcraftthreeft@sbcglobal.net> wrote:
>
>> I searched, but I can't find anything relevant in the archives. Since
>> pointer lock is now well supported, I think it's time to begin thinking
>> about virtual reality APIs. Since this is a complex topic I think any spec
>> should start simple. With that, I'm proposing we have a discussion on
>> adding head tracking. This should be very generic, with just position and
>> orientation information. So no matter if the data is coming from a webcam,
>> a VR headset, or a pair of glasses with eye tracking in the future the
>> interface would be the same. This event would be similar to mouse move with
>> a high sample rate (which is why in the event the head tracking and eye
>> tracking are in the same event representing a user's total view).
>>
>> interface ViewEvent : UIEvent {
>>     readonly attribute float roll; // radians, positive is slanting the
>> head to the right
>>     readonly attribute float pitch; // radians, positive is looking up
>>     readonly attribute float yaw; // radians, positive is looking to the
>> right
>>     readonly attribute float offsetX; // offset X from the calibrated
>> center 0 in the range -1 to 1
>>     readonly attribute float offsetY; // offset Y from the calibrated
>> center 0 in the range -1 to 1
>>     readonly attribute float offsetZ; // offset Z from the calibrated
>> center 0 in the range -1 to 1, and 0 if not supported
>>     readonly attribute float leftEyeX; // left eye X position in screen
>> coordinates from -1 to 1 (but not clamped) where 0 is the default if not
>> supported
>>     readonly attribute float leftEyeY; // left eye Y position in screen
>> coordinates from -1 to 1 (but not clamped) where 0 is the default if not
>> supported
>>     readonly attribute float rightEyeX; // right eye X position in screen
>> coordinates from -1 to 1 (but not clamped) where 0 is the default if not
>> supported
>>     readonly attribute float rightEyeY; // right eye Y position in screen
>> coordinates from -1 to 1 (but not clamped) where 0 is the default if not
>> supported
>> }
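As a sketch of how a page might consume the proposed fields: with the sign conventions in the comments above (positive pitch looks up, positive yaw looks right) and an assumed right-handed, y-up, forward = -z frame, pitch and yaw map to a gaze direction like so:

```javascript
// Sketch: gaze direction from the proposed pitch/yaw fields (radians).
// Assumed frame: right-handed, y up, forward = -z; positive yaw looks
// right, positive pitch looks up, matching the interface comments.
function gazeDirection(pitch, yaw) {
  return [
    Math.sin(yaw) * Math.cos(pitch),
    Math.sin(pitch),
    -Math.cos(yaw) * Math.cos(pitch),
  ];
}

// gazeDirection(0, 0) is straight ahead; a positive yaw swings the
// direction toward +x, a positive pitch toward +y.
```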
>>
>> Then like the pointer lock spec the user would be able to request view
>> lock to begin sampling head tracking data from the selected source. There
>> would thus be a view lock change event.
>> (It's not clear how the browser would list which sources to let the user
>> choose from. So if they had a webcam method that the browser offered and an
>> Oculus Rift then both would show and the user would need to choose).
>>
>> Now for discussion. Are there any features missing from the proposed head
>> tracking API or features that VR headsets offer that need to be included
>> from the beginning? Also I'm not sure what it should be called. I like
>> "view lock", but it was my first thought so "head tracking" or something
>> else might fit the scope of the problem better.
>>
>> Some justifications. The offset and head orientation are self explanatory
>> and calibrated by the device. The eye offsets would be more for a UI that
>> selects or highlights things as the user moves their eyes around. Examples
>> would be a web enabled HUD on VR glasses and a laptop with a precision
>> webcam. The user calibrates with their device software which reports the
>> range (-1, -1) to (1, 1) in screen space. The values are not clamped so the
>> user can look beyond the calibrated ranges. Separate left and right eye
>> values enable precision and versatility since most hardware supporting eye
>> tracking will have raw values for each eye.
>>
>>
>>
>
Received on Thursday, 27 March 2014 07:43:02 UTC
