- From: Florian Bösch <pyalot@gmail.com>
- Date: Thu, 27 Mar 2014 08:42:34 +0100
- To: Brandon Jones <bajones@google.com>
- Cc: Brandon Andrews <warcraftthreeft@sbcglobal.net>, "public-webapps@w3.org" <public-webapps@w3.org>
- Message-ID: <CAOK8ODgXBcttd11bNJPJbaafoVmgq=7eW1dw4kFe0t6AFg799w@mail.gmail.com>
I replied to Brandon Jones but not to the public list; reposted below.

On Wed, Mar 26, 2014 at 7:18 PM, Brandon Jones <bajones@google.com> wrote:

> As for things like eye position and such, you'd want to query that
> separately (no sense in sending it with every device), along with other
> information about the device capabilities (screen resolution, FOV, lens
> distortion factors, etc., etc.). And you'll want to account for the
> scenario where there is more than one device connected to the browser.

There isn't an easy way to describe the FOV and lens distortion in a few
numbers. People are starting to experiment with 220° Fresnel lens optics;
those will have distortion that's going to be really weird. I think you'll
want to have a device identifier so you always have a fallback you can hack
together manually for devices that couldn't spec themselves into
"distortion factors" (a rough sketch of such a fallback follows below).

> Also, if this is going to be a high quality experience you'll want to be
> able to target rendering to the HMD directly and not rely on OS mirroring
> to render the image. This is a can of worms in and of itself: How do you
> reference the display? Can you manipulate a DOM tree on it, or is it
> limited to WebGL/Canvas2D? If you can render HTML there, how do the
> appropriate distortions get applied, and how do things like depth get
> communicated? Does this new rendering surface share the same Javascript
> scope as the page that launched it? If the HMD refreshes at 90hz and your
> monitor refreshes at 60hz, when does requestAnimationFrame fire? These are
> not simple questions, and they need to be considered carefully to make
> sure that any resulting API is useful.

The OS-mirror split-screen solution has a lot going for it.
Stereoscopic/multi-head rendering is pretty much a driver nightmare, and
it's slow as hell most of the time.

> Even if your code is rendering at a consistent 60hz that means you're
> seeing ~67ms of lag, which will result in a motion-sickness-inducing
> "swimming" effect where the world is constantly catching up to your head
> position. And that's not even taking into account the question of how well
> Javascript/WebGL can keep up with rendering two high resolution views of a
> moderately complex scene, something that even modern gaming PCs can
> struggle with.

I believe the lag to be substantially higher than 67ms in browsers.

> That's an awful lot of work for technology that, right now, does not have
> a large user base and for which the standards and conventions are still
> being defined. I think that you'll have a hard time drumming up support
> for such an API until the technology becomes a little more widespread.

The devkit 1 sold around 7,000 units over its campaign, at a rate of about
one order every 3-4 minutes. The devkit 2, which only went on sale 7 days
ago, was announced the day before yesterday to have sold 75,000 units, so a
rate of around one every 5-6 seconds. I believe that's more devices than
Google Glass shifted in its entire multi-year history so far...
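To make the device-identifier fallback concrete, here's a rough sketch. None
of this is an existing API: getVRDevices(), deviceId and
distortionCoefficients are made-up names standing in for "some way to
enumerate devices, read a stable identifier and ask for distortion factors",
and the coefficient values are placeholders.

    // Hypothetical sketch only: neither getVRDevices() nor the deviceId /
    // distortionCoefficients fields exist; they stand in for "some way to
    // enumerate devices and read a stable identifier".
    var FALLBACK_DISTORTION = {
      // Hand-tuned profiles for devices that can't describe their own
      // optics. Coefficient values here are placeholders.
      'oculus-dk1': { k: [1.0, 0.22, 0.24, 0.0] },
      'fictional-220deg-fresnel-hmd': { k: [1.0, 0.45, 0.60, 0.10] }
    };

    function pickDistortion(device) {
      if (device.distortionCoefficients) {
        // Device managed to spec itself into "distortion factors".
        return { k: device.distortionCoefficients };
      }
      // Otherwise fall back to a profile keyed on the device identifier.
      return FALLBACK_DISTORTION[device.deviceId] ||
             FALLBACK_DISTORTION['oculus-dk1'];
    }

    navigator.getVRDevices().then(function (devices) {
      devices.forEach(function (device) {
        var distortion = pickDistortion(device);
        console.log(device.deviceId, distortion.k);
      });
    });

The point is only that a stable identifier gives content an escape hatch
when a device can't, or won't, describe its own optics.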
On Wed, Mar 26, 2014 at 7:18 PM, Brandon Jones <bajones@google.com> wrote:

> So there's a few things to consider regarding this. For one, I think your
> ViewEvent structure would need to look more like this:
>
> interface ViewEvent : UIEvent {
>   readonly attribute Quaternion orientation; // Where Quaternion is 4 floats. Prevents gimbal lock.
>   readonly attribute float offsetX; // offset X from the calibrated center 0 in millimeters
>   readonly attribute float offsetY; // offset Y from the calibrated center 0 in millimeters
>   readonly attribute float offsetZ; // offset Z from the calibrated center 0 in millimeters
>   readonly attribute float accelerationX; // acceleration along X axis in m/s^2
>   readonly attribute float accelerationY; // acceleration along Y axis in m/s^2
>   readonly attribute float accelerationZ; // acceleration along Z axis in m/s^2
> }
>
> You have to deal with explicit units for a case like this and not
> clamped/normalized values. What would a normalized offset of 1.0 mean? Am I
> slightly off center? At the other end of the room? It's meaningless without
> a frame of reference. Same goes for acceleration. You can argue that you
> can normalize to 1.0 == 9.8 m/s^2, but the accelerometers will happily
> report values outside that range, and at that point you might as well just
> report in a standard unit.
>
> As for things like eye position and such, you'd want to query that
> separately (no sense in sending it with every device), along with other
> information about the device capabilities (screen resolution, FOV, lens
> distortion factors, etc., etc.). And you'll want to account for the
> scenario where there is more than one device connected to the browser.
>
> Also, if this is going to be a high quality experience you'll want to be
> able to target rendering to the HMD directly and not rely on OS mirroring
> to render the image. This is a can of worms in and of itself: How do you
> reference the display? Can you manipulate a DOM tree on it, or is it
> limited to WebGL/Canvas2D? If you can render HTML there, how do the
> appropriate distortions get applied, and how do things like depth get
> communicated? Does this new rendering surface share the same Javascript
> scope as the page that launched it? If the HMD refreshes at 90hz and your
> monitor refreshes at 60hz, when does requestAnimationFrame fire? These are
> not simple questions, and they need to be considered carefully to make
> sure that any resulting API is useful.
>
> Finally, it's worth considering that for a VR experience to be effective
> it needs to be pretty low latency. Put bluntly: browsers suck at this.
> Optimizing for scrolling large pages of flat content, text, and images is
> very different from optimizing for realtime, super low latency I/O. If you
> were to take an Oculus Rift and plug it into one of the existing
> browser/Rift demos <https://github.com/Instrument/oculus-bridge> with
> Chrome, you'll probably find that in the best case the rendering lags
> behind your head movement by about 4 frames. Even if your code is rendering
> at a consistent 60hz, that means you're seeing ~67ms of lag, which will
> result in a motion-sickness-inducing "swimming" effect where the world is
> constantly catching up to your head position. And that's not even taking
> into account the question of how well Javascript/WebGL can keep up with
> rendering two high resolution views of a moderately complex scene,
> something that even modern gaming PCs can struggle with.
>
> That's an awful lot of work for technology that, right now, does not have
> a large user base and for which the standards and conventions are still
> being defined. I think that you'll have a hard time drumming up support
> for such an API until the technology becomes a little more widespread.
>
> (Disclaimer: I'm very enthusiastic about current VR research. If I sound
> negative it's because I'm being practical, not because I don't want to see
> this happen.)
>
> --Brandon
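To illustrate why explicit units matter in practice, here is a minimal
sketch that turns an event shaped like the interface quoted above into a
WebGL view matrix. The "viewevent" name is hypothetical, the Quaternion is
assumed to expose x/y/z/w fields, and the math uses the gl-matrix library;
none of that is implied by any existing spec.

    // Sketch only: "viewevent" and its fields follow the interface quoted
    // above; the event itself is hypothetical. Uses gl-matrix (mat4/quat/vec3).
    var MM_TO_METERS = 0.001;
    var headPose = mat4.create();
    var viewMatrix = mat4.create();

    window.addEventListener('viewevent', function (e) {
      // Orientation is already a quaternion, so there is no gimbal lock and
      // no yaw/pitch/roll order ambiguity to argue about.
      var q = quat.fromValues(e.orientation.x, e.orientation.y,
                              e.orientation.z, e.orientation.w);
      // Offsets are millimeters from the calibrated center; convert to the
      // meters the scene is modelled in. A normalized -1..1 value would give
      // no way to do this conversion.
      var position = vec3.fromValues(e.offsetX * MM_TO_METERS,
                                     e.offsetY * MM_TO_METERS,
                                     e.offsetZ * MM_TO_METERS);
      mat4.fromRotationTranslation(headPose, q, position);
      mat4.invert(viewMatrix, headPose); // camera/view matrix for WebGL
    });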
> On Wed, Mar 26, 2014 at 12:34 AM, Brandon Andrews
> <warcraftthreeft@sbcglobal.net> wrote:
>
>> I searched, but I can't find anything relevant in the archives. Since
>> pointer lock is now well supported, I think it's time to begin thinking
>> about virtual reality APIs. Since this is a complex topic, I think any
>> spec should start simple. With that, I'm proposing we have a discussion on
>> adding head tracking. This should be very generic, with just position and
>> orientation information, so that whether the data is coming from a webcam,
>> a VR headset, or a pair of glasses with eye tracking in the future, the
>> interface would be the same. This event would be similar to mouse move
>> with a high sample rate (which is why head tracking and eye tracking are
>> in the same event, representing a user's total view).
>>
>> interface ViewEvent : UIEvent {
>>   readonly attribute float roll;  // radians, positive is slanting the head to the right
>>   readonly attribute float pitch; // radians, positive is looking up
>>   readonly attribute float yaw;   // radians, positive is looking to the right
>>   readonly attribute float offsetX; // offset X from the calibrated center 0 in the range -1 to 1
>>   readonly attribute float offsetY; // offset Y from the calibrated center 0 in the range -1 to 1
>>   readonly attribute float offsetZ; // offset Z from the calibrated center 0 in the range -1 to 1, and 0 if not supported
>>   readonly attribute float leftEyeX;  // left eye X position in screen coordinates from -1 to 1 (but not clamped), 0 if not supported
>>   readonly attribute float leftEyeY;  // left eye Y position in screen coordinates from -1 to 1 (but not clamped), 0 if not supported
>>   readonly attribute float rightEyeX; // right eye X position in screen coordinates from -1 to 1 (but not clamped), 0 if not supported
>>   readonly attribute float rightEyeY; // right eye Y position in screen coordinates from -1 to 1 (but not clamped), 0 if not supported
>> }
>>
>> Then, like the pointer lock spec, the user would be able to request view
>> lock to begin sampling head tracking data from the selected source. There
>> would thus be a view lock change event. (It's not clear how the browser
>> would list which sources to let the user choose from. So if the browser
>> offered a webcam method and the user also had an Oculus Rift, both would
>> show and the user would need to choose.)
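For comparison, here is roughly what that pointer-lock-style flow could look
like in script. requestViewLock(), document.viewLockElement, "viewlockchange"
and "viewevent" are hypothetical names modelled on the existing Pointer Lock
API (requestPointerLock / "pointerlockchange"); no browser implements them.

    // Sketch only: the view-lock names below are hypothetical, modelled on
    // the real Pointer Lock API.
    var canvas = document.getElementById('scene');

    canvas.addEventListener('click', function () {
      // Like pointer lock, this would prompt the browser/user to pick a
      // tracking source (webcam, HMD, ...) and grant access.
      canvas.requestViewLock();
    });

    document.addEventListener('viewlockchange', function () {
      if (document.viewLockElement === canvas) {
        document.addEventListener('viewevent', onView);
      } else {
        document.removeEventListener('viewevent', onView);
      }
    });

    function onView(e) {
      // roll/pitch/yaw in radians, offsets normalized per the proposal above.
      console.log(e.yaw, e.pitch, e.roll, e.offsetX, e.offsetY, e.offsetZ);
    }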
>> Now for discussion: are there any features missing from the proposed head
>> tracking API, or features that VR headsets offer that need to be included
>> from the beginning? Also, I'm not sure what it should be called. I like
>> "view lock", but it was my first thought, so "head tracking" or something
>> else might fit the scope of the problem better.
>>
>> Some justifications. The offset and head orientation are self-explanatory
>> and calibrated by the device. The eye offsets would be more for a UI that
>> selects or highlights things as the user moves their eyes around; examples
>> would be a web-enabled HUD on VR glasses or a laptop with a precision
>> webcam. The user calibrates with their device software, which reports the
>> range (-1, -1) to (1, 1) in screen space. The values are not clamped, so
>> the user can look beyond the calibrated ranges. Separate left and right
>> eye values enable precision and versatility, since most hardware
>> supporting eye tracking will have raw values for each eye.
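And a small sketch of the gaze-highlighting use case described above, under
the same caveat that "viewevent" is hypothetical; averaging the two eyes and
the -1..1 to viewport-pixel mapping are illustrative assumptions only.

    // Sketch only: the "viewevent" listener and its eye fields follow the
    // proposal above; the averaging and the screen-space mapping are
    // illustrative assumptions, not part of any spec.
    var lastHighlight = null;

    document.addEventListener('viewevent', function (e) {
      // Average the two eyes to get one gaze point in normalized screen
      // space, where (-1,-1)..(1,1) is the calibrated screen area.
      var gx = (e.leftEyeX + e.rightEyeX) / 2;
      var gy = (e.leftEyeY + e.rightEyeY) / 2;

      // Map to CSS pixels; assume +Y is up in the event, so flip it.
      var px = (gx + 1) / 2 * window.innerWidth;
      var py = (1 - gy) / 2 * window.innerHeight;

      var el = document.elementFromPoint(px, py); // null if gaze is off-screen
      if (el !== lastHighlight) {
        if (lastHighlight) lastHighlight.classList.remove('gazed');
        if (el) el.classList.add('gazed');
        lastHighlight = el;
      }
    });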
Received on Thursday, 27 March 2014 07:43:02 UTC