Largest Contentful Paint from Gilles Dubuc on 2019-04-12 (public-web-perf@w3.org from April 2019)

From: Gilles Dubuc <gilles@wikimedia.org>
Date: Fri, 12 Apr 2019 22:24:58 +0200
To: public-web-perf <public-web-perf@w3.org>
Message-ID: <CALac36Ux0=Ge4bGeON_KS8VKmp4r_FLaXMimV5=iYro-xi2x1g@mail.gmail.com>
Some extra (subjective!) feedback on today's presentation. First of all, I
didn't convey that during the call, but thanks for making yet another
attempt at creating a metric that gets closer to the user experience. I
point out the negatives I see, but I'm really happy to see that you're not
giving up on that quest.

Since I don't want to be the person that only points out issues and offers
no solutions... taking the concept as-is, I think a possible fix might be
to ignore user interaction. But that might pose challenges for the browser
with keeping track of things that are outside of the viewport after a user
scrolls. The API could then signify which portion of the "originally"
largest element before scroll is still visible at the time it's fully
loaded. You could also have 2 elements reported in that case: one that was
the biggest at the time the user scrolled away, and another that's the one
that would have been the biggest if they hadn't scrolled away.
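
To make that suggestion a bit more concrete, here is a rough sketch of what such a report could look like. Everything in it is hypothetical: the LargestElementReport shape, its field names, and the visibleFraction helper are made up for illustration and are not part of the proposal or of any browser API.

```typescript
// Hypothetical shapes, for illustration only; not part of any spec or browser.
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

interface LargestElementCandidate {
  id: string;         // hypothetical identifier for the element
  renderTime: number; // ms since navigation start when it was fully loaded
  rect: Rect;         // element box in viewport coordinates at renderTime
}

// Two candidates reported when the user scrolled before loading finished.
interface LargestElementReport {
  atScroll: LargestElementCandidate;       // biggest when the user scrolled away
  ignoringScroll: LargestElementCandidate; // biggest had they not scrolled
  visibleFractionAtLoad: number;           // share of ignoringScroll still visible
}

// Fraction (0..1) of the element's area that overlaps the viewport.
function visibleFraction(rect: Rect, viewport: Rect): number {
  const area = rect.width * rect.height;
  if (area <= 0) return 0;
  const overlapW = Math.max(
    0,
    Math.min(rect.x + rect.width, viewport.x + viewport.width) -
      Math.max(rect.x, viewport.x)
  );
  const overlapH = Math.max(
    0,
    Math.min(rect.y + rect.height, viewport.y + viewport.height) -
      Math.max(rect.y, viewport.y)
  );
  return (overlapW * overlapH) / area;
}
```

An element that is half scrolled out of the top of the viewport when it finishes loading would get a visible fraction of 0.5 here; one that is entirely off screen would get 0.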

Looking at the proposal without changes, I think the main weakness of this
metric is precisely that it tries to model the user psychology beyond
making a simple building block. I consider most, if not all, existing
performance APIs to surface simple building blocks that can be reused and
composed in different ways. Their usefulness usually goes beyond
performance. Making something that has a lot of rules inside of it,
blacklisting special cases, will on the other hand take us away from a
"building block" quality and into something that has to be taken as a
whole. You can't really do much with it besides considering it a
performance score, because it includes so many special cases that you can't
derive composable meaning from it. That would be fine if we were getting
closer to the holy grail and actually getting a metric that provably
correlated better to what real users feel.

But the problem is that this seems to be designed without end user
(web visitor) input. From a logical perspective, you can look at the
description and think "yes, that seems like something users would care
about". But have you asked them if they do care about it? Do they care
about these aspects of the page load combined this way more than things we
can already capture? Maybe they care more about completely different
aspects of the user experience that are a complete blind spot at the moment?

If the goal is to please developers with something that developers will
think is useful (users still not involved), then yes, I think it reaches
that goal. It makes sense from the point of view of an engineer or product
manager's mindset. Analytics providers can make customers happy by adding
the latest and greatest novelty. But it's a disappointment to me if that's
all we're aiming for.

In research I've done that will be presented/published next month at The
Web Conference <https://www2019.thewebconf.org/> (I can share the paper
privately with anyone who's interested) I saw that all existing performance
metrics correlate pretty poorly with user opinion about how fast the page
is. We asked users. I think you should too, when coming up with new metrics
like this.

I'm afraid that if we keep looking for new paint timings in the very early
page load timeframe, we won't get metrics that correlate any better to user
opinion. I have a lot of digging to do into our Element Timing for Images
data in the next couple of months to answer that very question about that
other API (we're still asking our users about their performance
perception), but I will be able to do that. It would be nice, in my
opinion, if the user was involved very early in the metric design. The
status quo is that we can only verify that much further along in the process, once a
form of that metric is already fully implemented in a browser. And maybe
the early design choices were so disconnected from the user perception that
in the end we're not getting something more valuable than existing cruder
metrics.

We might be wasting time and effort cutting this small part of the user
experience (above-the-fold timings in the early loading of the page) into
thinner slices that could very possibly not be any closer to user perceived
performance than existing metrics.

I'd like to see research showing that users care about this particular
slice of the user experience, to gain more confidence that this is actually
better than something like FCP. I think that the resulting metric would be
more attractive to developers if you could show something like X% of users
in the study were happier with the performance when that particular metric
was lower, all other things being equal, compared with Y% who were happier when
FCP was lower, all other things being equal. That would demonstrate that
the metric is measuring something users really perceive that's of higher
importance than existing metrics.
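
As a minimal sketch of how such an X% or Y% figure could be computed, assuming the study produces matched pairs of page loads that differ only in the metric of interest: the MatchedPair shape and the percentHappierWhenLower function below are hypothetical names, and the pairing itself (the "all other things being equal" part) is assumed to have been done upstream by the study design.

```typescript
// Hypothetical study record: one user rating two page loads that are
// assumed to differ only in the value of the metric under study.
interface MatchedPair {
  satisfiedWhenLower: boolean;  // user was happy when the metric was lower
  satisfiedWhenHigher: boolean; // user was happy when the metric was higher
}

// Percentage of pairs where the user was happier with the lower value,
// i.e. satisfied with the lower-metric load but not the higher-metric one.
function percentHappierWhenLower(pairs: MatchedPair[]): number {
  if (pairs.length === 0) return NaN; // no data, no meaningful percentage
  const happier = pairs.filter(
    (p) => p.satisfiedWhenLower && !p.satisfiedWhenHigher
  ).length;
  return (100 * happier) / pairs.length;
}
```

Running the same function over pairs built for the new metric and over pairs built for FCP would give the two percentages to compare. A real study would also need sample sizes and confidence intervals, which this sketch leaves out.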
Received on Friday, 12 April 2019 20:25:38 UTC