Re: Largest Contentful Paint

Thanks for the feedback, and sorry for the delay.

I'd love to see that paper!

We have done studies correlating metrics with user opinions of speed, but
I'm a bit suspicious of them. User-reported speed may not correlate well
with the impact of speed on a user, and verifying this is tricky. I think
user behavior is a better indicator of performance's impact on user
experience, though it is hard to evaluate. We've done some of this
research via ablation studies, where we deliberately regress browser
performance and look at the impact on user behavior, but I don't see a
good way to apply that technique to evaluating metrics. We can look for
correlation between LCP and attributes of user behavior like abort rate,
but it's tricky to separate correlation from causation with that approach.
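
For illustration only, the kind of correlation check described above could
be sketched roughly as follows. Everything here is an assumption made for
the example: the sample data is invented, and the names `pearson`,
`abort_rate_by_bucket` and `SAMPLE_LOADS` are hypothetical, not part of any
real analysis pipeline. It groups page loads by LCP, computes the abort
rate per group, and computes a plain Pearson coefficient between LCP and
the abort flag.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def abort_rate_by_bucket(page_loads, bucket_ms=1000):
    """Group (lcp_ms, aborted) pairs into LCP buckets; return {bucket_start_ms: abort_rate}."""
    totals = {}
    aborts = {}
    for lcp_ms, aborted in page_loads:
        bucket = int(lcp_ms // bucket_ms) * bucket_ms
        totals[bucket] = totals.get(bucket, 0) + 1
        aborts[bucket] = aborts.get(bucket, 0) + (1 if aborted else 0)
    return {b: aborts[b] / totals[b] for b in sorted(totals)}


# Invented sample: (LCP in milliseconds, whether the user aborted the load).
SAMPLE_LOADS = [
    (800, False), (900, False), (1500, False), (1700, True),
    (2400, False), (2600, True), (3300, True), (3900, True),
]

rates = abort_rate_by_bucket(SAMPLE_LOADS)
r = pearson(
    [lcp for lcp, _ in SAMPLE_LOADS],
    [1.0 if aborted else 0.0 for _, aborted in SAMPLE_LOADS],
)
print(rates)
print(round(r, 3))
```

A positive coefficient here only says that slower LCP and aborts tend to
occur together in the sample. It says nothing about which one causes the
other, or whether a third factor (a slow network, say) drives both.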

Research based on user-reported speed is definitely better than nothing
though. I'm working to figure out what we can share of our analyses in this
space.

Re: LCP not being a building block: from my perspective, Element Timing is
the building block, and LCP aims to be the prefabricated solution. It won't
be as good as what you can build yourself via Element Timing, but it should
provide significantly more value than FCP.

Re: User input: we definitely need to put some more thought into this.
Thanks for your ideas!

Tim

On Fri, Apr 12, 2019 at 4:26 PM Gilles Dubuc <gilles@wikimedia.org> wrote:

> Some extra (subjective!) feedback on today's presentation. First of all, I
> didn't convey that during the call, but thanks for making yet another
> attempt at creating a metric that gets closer to the user experience. I
> point out the negatives I see, but I'm really happy to see that you're not
> giving up on that quest.
>
> Since I don't want to be the person that only points out issues and offers
> no solutions... taking the concept as-is, I think a possible fix might be
> to ignore user interaction. But that might pose challenges for the browser
> with keeping track of things that are outside of the viewport after a user
> scrolls. The API could then signify which portion of the "originally"
> largest element before scroll is still visible at the time it's fully
> loaded. You could also have 2 elements reported in that case: one that was
> the biggest at the time the user scrolled away, and another that's the one
> that would have been the biggest if they hadn't scrolled away.
>
> Looking at the proposal without changes, I think the main weakness of this
> metric is precisely that it tries to model the user psychology beyond
> making a simple building block. I consider most, if not all, existing
> performance APIs to surface simple building blocks that can be reused and
> composed in different ways. Their usefulness usually goes beyond
> performance. Making something that has a lot of rules inside of it,
> blacklisting special cases, will on the other hand take us away from a
> "building block" quality and into something that has to be taken as a
> whole. You can't really do much with it besides considering it a
> performance score, because it includes so many special cases that you can't
> derive composable meaning from it. That would be fine if we were getting
> closer to the holy grail and actually getting a metric that provably
> correlated better to what real users feel.
>
> But the problem is that this seems to be designed without end user
> (web visitor) input. From a logical perspective, you can look at the
> description and think "yes, that seems like something users would care
> about". But have you asked them if they do care about it? Do they care
> about these aspects of the page load combined this way more than things we
> can already capture? Maybe they care more about completely different
> aspects of the user experience that are a complete blind spot at the moment?
>
> If the goal is to please developers with something that developers will
> think is useful (users still not involved), then yes, I think it reaches
> that goal. It makes sense from the point of view of an engineer or product
> manager's mindset. Analytics providers can make customers happy by adding
> the latest and greatest novelty. But it's a disappointment to me if that's
> all we're aiming for.
>
> In research I've done that will be presented/published next month at The
> Web Conference <https://www2019.thewebconf.org/> (I can share the paper
> privately with anyone who's interested) I saw that all existing performance
> metrics correlate pretty poorly with user opinion about how fast the page
> is. We asked users. I think you should too, when coming up with new metrics
> like this.
>
> I'm afraid that if we keep looking at new paint timings in the very early
> page load timeframe, we won't get metrics that correlate any better with user
> opinion. I have a lot of digging to do into our Element Timing for Images
> data in the next couple of months to answer that very question about that
> other API (we're still asking our users about their performance
> perception), but I will be able to do that. It would be nice, in my
> opinion, if the user was involved very early in the metric design. The
> status quo is that we can only verify that much further into the process, once a
> form of that metric is already fully implemented in a browser. And maybe
> the early design choices were so disconnected from the user perception that
> in the end we're not getting something more valuable than existing cruder
> metrics.
>
> We might be wasting time and effort cutting this small part of the user
> experience (above-the-fold timings in the early loading of the page) into
> thinner slices that could very possibly not be any closer to user perceived
> performance than existing metrics.
>
> I'd like to see research showing that users care about this particular
> slice of the user experience, to gain more confidence that this is actually
> better than something like FCP. I think that the resulting metric would be
> more attractive to developers if you could show something like X% of users
> in the study were happier with the performance when that particular metric
> was lower, all other things being equal, compared with Y% who were happier
> when FCP was lower, all other things being equal. That would demonstrate that
> the metric is measuring something users really perceive that's of higher
> importance than existing metrics.
>

Received on Wednesday, 17 April 2019 19:40:07 UTC