Browser Latency "50ms" Target Discussion

Hello,
[Starting a new thread for the long-term technical "latency considerations
of 50ms" discussion]

On Mon, Mar 26, 2018 at 5:03 PM, Todd Reifsteck <toddreif@microsoft.com>
wrote:

> My translation of the feedback being given on this specification:
>
>
>    - Could a site enable this at < 50 ms when using this for high speed
>    animations/games on well-tuned sites?
>       - Yes, but it may trigger a lot more often than many web sites want
>       to measure. Perhaps this could default to 50 ms, but allow a site to opt in
>       to < 50 ms for scenarios such as what Mark is describing?
>    - How do we measure the duration between the event updating the DOM
>    and the display actually showing it?
>       - Tim?
>
It's not directly related, except in an "input lag chain" problem:

The whole input lag chain is a complex beast (source-side such as input
reads, and destination-side such as display).

There are TVs with more than 50ms of input lag, so 50ms + 50ms = 100ms of
input lag.  That, obviously, becomes horrible...

I just want to add that it's often hugely desirable to reduce browser-side
latency to much less than 50ms, to compensate for a very laggy display.

It gets extremely bad when you combine slow input and a slow display -- the
delays are additive.  (And I haven't even gone into the middlemen, such as
extra buffer queues in certain graphics drivers...)

Some TVs and monitors have really low input lag (less than 10ms, essentially
real-time scanout).  So we have to consider that the 50ms target should have
some flex depending on the display and vendor.

For an AC-powered Xbox, PlayStation, or home theater PC, for a
browser-powered set-top box or SmartTV browser firmware -- or when using
laggier computer monitors -- it is not always ideal to settle for a
lowest-common-denominator 50ms, since there are no real power-management
concerns when it costs only one or two extra watts to reduce lag by more
than 50%.

Also, for videogame players, "throwing away frames" to reduce 50ms of lag to
10ms (in an "NVIDIA Fast Sync" style implementation) is sometimes a
legitimate software-developer goal.  Not for a phone browser, but for an
AC-powered box it can be a useful consideration.

So in this situation it's a tradeoff: "Do I want to consume extra power in
order to save a lot of lag, because I'm an AC-powered box connected to a
potentially laggy consumer display?"  In this case, yes, let's definitely
throw away frames: it's worth it if it reduces lag by a huge amount,
according to our graphs:
https://www.blurbusters.com/wp-content/uploads/2017/06/blur-busters-gsync-101-vsync-off-w-fps-limits-60Hz.png
(from Page 9 of our G-SYNC 101 tests).  Yes, on the surface it seems
wasteful from a power-consumption perspective, but if we're an AC-powered
box connected to a laggy television set, and it only consumes a tiny bit
more power briefly during the keypress/mousemove/joypad input, then why
not?  That graph shows 46ms of lag at 60Hz without throwing away frames,
dropping to less than 30ms once I began "throwing away frames" -- a
latency-versus-waste tradeoff.  There are 14 pages containing 49 graphs in
the G-SYNC 101 series, but I highlight this one because it is a 60 Hertz
test and a demo of the more-lag-versus-more-waste effect.

Also, since the PointerEvents API runs independently of
requestAnimationFrame(), an app can simply bookmark only the freshest mouse
pointer or touchscreen input.  A well-written JavaScript app can record
input separately from rendering, and then, inside the animation callback
(requestAnimationFrame), use the freshest most-recently-remembered input.
The coalescing happening now adds input lag, but it does have the advantage
of forcing power savings.
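
A minimal sketch of that pattern, assuming a page element with
id="cursorDot" to move around (the element id is purely illustrative):

  // Record only the freshest pointer sample; render it once per frame.
  let freshestPointer = null;

  window.addEventListener('pointermove', (e) => {
    freshestPointer = { x: e.clientX, y: e.clientY };
  });

  const dot = document.getElementById('cursorDot');  // assumed element

  function animate() {
    if (freshestPointer) {
      // Each frame consumes the most recently remembered input,
      // no matter how many pointer events arrived in between.
      dot.style.transform =
        `translate(${freshestPointer.x}px, ${freshestPointer.y}px)`;
    }
    requestAnimationFrame(animate);
  }
  requestAnimationFrame(animate);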

On the other hand, there are many other tricks that reduce input lag with
no extra power consumption and no extra frames rendered:

(A) Just-in-time rendering.
Refresh cycles at 60Hz are 16.7ms.  If you know your rendering takes only
2-3ms (fast computers + fast compositing), you can delay your rendering
until roughly 4-5ms before the next refresh cycle.  That can reduce input
lag by quite a lot -- almost one refresh cycle -- since input reads often
occur right at the VSYNC time interval, while the frame is buffered for the
NEXT refresh cycle after the one currently being displayed (which you
rendered a refresh cycle ago but which was forced to wait for VSYNC).
Some games use this technique to reduce input lag by more than 10ms.
Unfortunately, it also adds quite a lot of jitter/microstutter if you have
rendering surges (e.g. a frame that took 5ms to render).  So it's a
double-edged sword.
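
A rough browser-side sketch of the idea, assuming a 60Hz display and a
render that reliably finishes within a few milliseconds (the constants,
the <canvas> element, and the render body are illustrative assumptions):

  const FRAME_MS = 1000 / 60;    // assumed refresh period
  const RENDER_BUDGET_MS = 4;    // safety margin before the next vsync

  const canvas = document.querySelector('canvas');   // assumed element
  const ctx = canvas.getContext('2d');
  let latestInput = { x: 0, y: 0 };
  window.addEventListener('pointermove', (e) => {
    latestInput = { x: e.clientX, y: e.clientY };
  });

  function render(input) {
    ctx.clearRect(0, 0, canvas.width, canvas.height);
    ctx.fillRect(input.x - 5, input.y - 5, 10, 10);  // simple cursor box
  }

  function frame(vsyncTime) {
    // Delay rendering until ~4ms before the estimated next vsync, then
    // sample the freshest input.  setTimeout granularity makes this
    // approximate, and a render surge past the budget causes the
    // microstutter mentioned above.
    const elapsed = performance.now() - vsyncTime;
    const delay = Math.max(0, FRAME_MS - RENDER_BUDGET_MS - elapsed);
    setTimeout(() => render(latestInput), delay);
    requestAnimationFrame(frame);
  }
  requestAnimationFrame(frame);

Whether the late render actually lands in the intended frame depends on the
browser's compositing pipeline, so treat this as an illustration of the
timing math rather than a guaranteed win.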

(B) Software-triggered refresh cycles via a variable-refresh display:
FreeSync, VESA Adaptive-Sync, HDMI VRR, G-SYNC.
Variable refresh rate displays avoid buffering, and their refresh cycles
automatically begin at the instant of a Direct3D Present() or OpenGL
glutSwapBuffers().  So they avoid the mandatory latency of a
fixed-refresh-cycle workflow: instead of the software being forced to wait
for the next display refresh cycle, the display is asynchronous and
actually waits for you, which shortens the time between the input read and
the refresh cycle being displayed.  (For old-timer CRT signal nuts: the
display is "held" in VBLANK, and Direct3D Present() causes the first
scanline to *immediately* begin scanning out of the video output -- right
on the spot, software-triggered.  It's essentially a variable-thickness
blanking interval acting as a spacer between refresh cycles.)

(C) Extra refresh rate
If you have ever swiped on a 120Hz iPad, you can clearly see the benefits
of a 120Hz refresh rate.  But the Safari browser doesn't take advantage of
that for JavaScript animations (only scrolling is smoother -- e.g.
www.testufo.com runs at only 60 frames per second on a 120Hz iPad).  This
means there's a little more input lag for animations.  To Apple's credit,
they chose a low-latency LCD display for their tablets, and their priority
on saving battery (lower power consumption) is probably the basis for
limiting JavaScript logic to 60Hz, adding slight input lag to HTML games.
This is not noticeable for most users, but I can certainly (as an annoyed
gamer) notice when something is artificially limited to 60Hz, and the
increase in latency from an artificial framerate limit.  But power
management is extremely important, and the lag of the iPad browser is
"good enough".
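
For reference, a quick way to check whether a browser caps
requestAnimationFrame() below the display refresh rate is simply to count
callbacks per second -- a minimal sketch (nothing here is iPad-specific):

  // Estimate how often requestAnimationFrame() callbacks actually fire.
  // On a 120Hz display whose browser caps JavaScript animation at 60Hz,
  // this logs roughly 60 instead of roughly 120.
  let frames = 0;
  let windowStart = performance.now();

  function tick(now) {
    frames++;
    if (now - windowStart >= 1000) {
      console.log(`rAF rate: ~${frames} callbacks/second`);
      frames = 0;
      windowStart = now;
    }
    requestAnimationFrame(tick);
  }
  requestAnimationFrame(tick);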



> Having the point data at much higher precision than 1/frame should allow
> all movement data to be used when processing. If the screen is only updated
> at 60 Hz, any updates to the actual graphics occurring more often than 60
> Hz can often cause “double resource usage” when animations are thrown away
> multiple times a frame.

Many game developers use techniques like these to reduce lag in those
situations:
- Postpone rendering until the very last minute before a refresh cycle:
instead of traditionally beginning to render a new frame right after
returning from VSYNC, hold off until just before the next refresh (the
just-in-time rendering described in (A) above).
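
And on the quoted point about making a single paint use all of the input
data: that is already reachable from script via the (real)
PointerEvent.getCoalescedEvents() API where supported.  A minimal sketch,
with an illustrative <canvas> polyline standing in for the "processing":

  const canvas = document.querySelector('canvas');   // assumed element
  const ctx = canvas.getContext('2d');
  const stroke = [];

  window.addEventListener('pointermove', (e) => {
    // getCoalescedEvents() returns the intermediate samples that were
    // merged into this one per-frame event (fall back to the event itself).
    const samples = e.getCoalescedEvents ? e.getCoalescedEvents() : [e];
    for (const s of samples) stroke.push({ x: s.clientX, y: s.clientY });
  });

  function paint() {
    if (stroke.length > 1) {
      // One paint consumes every sample accumulated since the last frame,
      // so no movement data is lost and nothing is rendered twice.
      ctx.beginPath();
      ctx.moveTo(stroke[0].x, stroke[0].y);
      for (const p of stroke) ctx.lineTo(p.x, p.y);
      ctx.stroke();
      stroke.splice(0, stroke.length - 1);  // keep last point to continue
    }
    requestAnimationFrame(paint);
  }
  requestAnimationFrame(paint);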


Long term, I'd like to see the browser world adopt a permission mechanism
for "This page would like High Performance Mode.  Your system power
consumption may increase dramatically and shorten battery life.   Allow /
Deny?"  (A purely hypothetical sketch of what opting in could look like
follows the list below.)

Clicking Allow would enable atomic events and an ultralow-lag mode:
-- non-coalesced realtime events at 1000Hz (1000 separate events per
second)
-- a virtual-reality-optimized mode
-- lowest latency at all costs, in an "every single millisecond matters"
perspective
-- full-precision (unfiltered) timers at 0.1us precision without filtering
algorithms (at least on Spectre/Meltdown-proof systems)
-- animation rate always running at full frame rates if performance allows
-- optional unthrottled animations (similar to chrome --disable-gpu-vsync
...) via a JavaScript flag enabling a vsync-off mode that runs JavaScript
WebGL/animations/etc at framerates higher than the refresh rate, as much as
CPU/GPU performance allows.  The VSYNC OFF mode also bypasses graphics
driver buffering lag, so the uncapped/unsynchronized framerate mode can
actually be lower lag even at 59-61fps unsynchronized than at 60fps
synchronized, since the process of synchronization itself adds lag.  And
the unsynchronized mode has the benefit of allowing higher framerates if
performance allows.  Related Blur Busters articles:
https://www.blurbusters.com/faq/benefits-of-frame-rate-above-refresh-rate/
and http://www.blurbusters.com/human-reflex ...
-- Etc.
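
Purely as an illustration of the shape such an opt-in could take -- neither
the permission name nor navigator.requestHighPerformanceMode() exist in any
browser today -- something like:

  // Hypothetical only: sketches the proposed "High Performance Mode"
  // opt-in.  No such API exists; the method and option names are invented
  // here for illustration.
  async function enterHighPerformanceMode() {
    // Imagined prompt: "This page would like High Performance Mode.
    // Your system power consumption may increase dramatically and
    // shorten battery life.  Allow / Deny?"
    const granted = await navigator.requestHighPerformanceMode?.({
      uncoalescedInput: true,   // raw ~1000Hz events, no coalescing
      unthrottledRaf: true,     // rAF at full display refresh rate
      highResTimers: true       // unfiltered high-precision timers
    });
    if (!granted) {
      console.log('Falling back to default 50ms-oriented behavior.');
    }
  }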

Also, some seasoned competition gamers can easily tell apart 15ms versus
25ms of latency -- and that is button-to-pixels (not just the software
stage alone).

Also... VR scientist John Carmack at Oculus confirms sub-10ms is needed
for VR.  This would be a high-performance mode specific to full-screen
WebGL 3D games and WebVR graphics, as well as other applications that have
much higher latency demands.  VR requires sub-10ms latency from input to
photons -- which means input processing and JavaScript processing in less
than 5ms, because other things (e.g. display processing, the USB cable, the
DisplayPort cable) have their own latency.  5ms is obviously not possible
yet, but it shows that 50ms is sometimes a Volkswagen Beetle for VR
applications.  :)

50ms is perfectly fine for recreational web browsing on already-low-lag
computer monitors and low-lag tablet displays, but it won't be adequate on
slow monitors and slow televisions.  (I've seen a slow DELL computer
monitor that added 25ms of lag, for example, so monitors are not immune,
alas.)

Just some thoughtful considerations... for long-term standardization thinking.

Thanks,
Mark Rejhon
Founder, Blur Busters / TestUFO


With regard to the general statements made:
>
> Having the point data at much higher precision than 1/frame should allow
> all movement data to be used when processing. If the screen is only updated
> at 60 Hz, any updates to the actual graphics occurring more often than 60
> Hz can often cause “double resource usage” when animations are thrown away
> multiple times a frame. The purpose of allowing input callbacks to receive
> all input data 1/paint loop is to allow the “best of both worlds” by
> allowing the site to ensure the single paint uses all input data when
> calculating the UI updates. If this is not true in real world usage, I
> think the input and rendering teams would be interested to hear exactly how
> BUT that is not the focus of this specification. (Lets please start a new
> thread if that is a topic we’d like to discuss.)
>
>
>
> Hope that helps!
>
> Todd
>
>
>
> *From:* blurbusters@gmail.com <blurbusters@gmail.com> * On Behalf Of *Mark
> Rejhon
> *Sent:* Monday, March 26, 2018 1:47 PM
> *To:* Timothy Dresser <tdresser@chromium.org>
> *Cc:* Ilya Grigorik <igrigorik@google.com>; public-web-perf@w3.org
> *Subject:* Re: Minimal event timing proposal
>
>
>
> Regarding First Input Delay, we do lots of input latency measurements at
> Blur Busters, albeit from other points of view (e.g. gaming latency).
> Even 5ms makes a big difference in certain contexts.  Over the long term,
> 1000Hz mice should be streamed directly in an atomic manner to JavaScript
> -- instead of always being permanently coalesced into the PointerEvents API.
>
>
>
> Microsoft Research has found that realtime processing of 1000Hz input makes
> a huge difference:
>
> https://www.youtube.com/watch?v=vOvQCPLkPt4
>
>
>
> Even on a 60Hz display it can still have large benefits.
>
>
>
> That said, understandably, while battery power can be a concern, Safari
> has been limiting lots of processing to 60Hz even on the new 120Hz iPads,
> certainly should be fleshed out better in all of this standardization
> work.  Many things like requestAnimationFrame() in Safari only runs at
> 60fps even on the 120Hz iPads, even though Section 7.1.4.2 of HTML 5.2
> recommends it running at full frame rates on higher-Hz displays (like in
> other browsers, Chrome, FireFox, Opera, and even some versions of Edge).
>  These add unwanted input lag to touchscreen events on Safari.   These are
> major input lag considerations, adding +8ms of input lag to browser apps
> rendered in <canvas> by running rAF() at 60fps instead of 120fps on the
> 120Hz iPads.   The display-side equation is part of the lag arithmetic too,
> even though there are also power-consumption-versus-lag tradeoffs.
>
>
>
>
>
> On Mon, Mar 26, 2018 at 4:26 PM, Timothy Dresser <tdresser@chromium.org>
> wrote:
>
> I've updated the proposal on WICG here
> <https://github.com/WICG/event-timing/blob/master/README.md>.
> This proposal requires only dispatching PerformanceEventTiming entries if
> the duration between event startTime and when event processing is finished
> is > 50ms.
>
>
>
> Tim
>
> On Fri, Feb 16, 2018 at 1:43 PM Ilya Grigorik <igrigorik@google.com>
> wrote:
>
> Tim, thanks for drafting this! I like where this is headed.
>
>
>
> Left a few questions in the doc and added this to our agenda for next
> design call
> <https://docs.google.com/document/d/10dz_7QM5XCNsGeI63R864lF9gFqlqQD37B4q8Q46LMM/edit#heading=h.ljsbtcikd3cl> (date
> TBD).
>
>
>
> On Thu, Feb 15, 2018 at 1:53 PM, Timothy Dresser <tdresser@chromium.org>
> wrote:
>
> Based on our discussion on First Input Delay
> <https://docs.google.com/document/d/1Tnobrn4I8ObzreIztfah_BYnDkbx3_ZfJV5gj2nrYnY/edit> at
> the last WG meeting, I've put together a minimal proposal
> <https://docs.google.com/document/d/10CdRCrUQzQF1sk8uHmhEPG7F_jcZ2S3l9Zm40lp3qYk/edit#heading=h.fbdd8nwxr7v4> for
> an event timing API.
>
>
>
> The extensions to the DOM spec are fairly straight forward, and the API
> itself is pretty bare bones. The main question is whether or not
> dispatching an entry per DOM event is too expensive.
>
> If it is, we'll need to devise a method to only report a subset of events.
>
> I'd appreciate any feedback you have,
> Tim
>
>
>
>
>

Received on Monday, 26 March 2018 21:54:25 UTC