Re: WG design call - April 12th @ 11am PST

I have built a few RUM products at Alibaba Group and Ant Financial Group. When it comes to enterprise-level performance measurement and data analysis, I can really vouch for Philip's point: our users seldom understand what is being measured or how to interpret the figures; they just want to know whether it is fast or slow.

So I believe it's important to establish a mindset that lets those non-professionals understand how fast a webpage loads, even if it's not entirely accurate, and a dedicated metric for that is essential. `load` is well known but far from accurate for SPAs; FP/FCP are a good start but have their own limits; FMP alleviates this problem somewhat but is not officially supported.

In our own RUM products I always use `load` as the base metric and other metrics for comparison. In the future, the base metric could be LCP, and that's the mindset I'm talking about.
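As a sketch of what using LCP as a base metric could look like: the browser emits a new `largest-contentful-paint` entry each time a larger element paints, so a RUM script keeps the last candidate observed. The `reportToRum` call below is a hypothetical reporting hook, not part of any API.

```javascript
// Pick the final LCP candidate: the browser emits a new entry whenever a
// larger element paints, so the last entry observed is the page's LCP.
function finalLcpCandidate(entries) {
  return entries.length ? entries[entries.length - 1] : null;
}

// Browser wiring (sketch; only runs in a page context):
// new PerformanceObserver((list) => {
//   const entry = finalLcpCandidate(list.getEntries());
//   if (entry) reportToRum('lcp', entry.startTime); // reportToRum is hypothetical
// }).observe({ type: 'largest-contentful-paint', buffered: true });
```

In practice the final value is usually reported when the page is hidden or unloaded, since a larger candidate can still appear late in the load.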

From: Philip Tellis <ptellis@soasta.com>
Sent: Thursday, April 25, 2019 08:44
To: Fred Short <fshort3@gmail.com>
Cc: Tim Dresser <tdresser@google.com>; "Nicolás Peña" <npm@google.com>; Nic Jansma <nic@nicj.net>; Yoav Weiss <yoav@yoav.ws>; public-web-perf <public-web-perf@w3.org>
Subject: Re: WG design call - April 12th @ 11am PST

I think your experience matches ours.  Most web developers either don't know that methods exist to instrument their own timings, or aren't thinking about measuring their apps at all (in these cases, RUM tools come in through the marketing team or SEO team).
On Wed, Apr 24, 2019 at 5:12 PM Fred Short <fshort3@gmail.com> wrote:
While I agree that RUM “product” users generally want the instrumentation to just “figure it out” for them, in the LCP use case what is most relevant for your page/app is going to vary greatly. Trying to devise a generic way to represent that (heuristics, assumptions about image ‘importance’) isn’t going to result in something generally meaningful for that user, in my view.

For example, with the class of apps I’ve worked on, images are generally unimportant and the “biggest” images are part of the company banner. I don’t care when that is rendered TBH. In my app, the most meaningful element could be the drop-down that contains the list of clients since that drives everything else displayed on the page. I *definitely* want to know when that is first rendered since that indicates when the user first starts seeing content meaningful to them and could reasonably start interacting with the app. 

Maybe there is a default heuristic that could be used out of the box, but for this to be a useful metric, I feel it needs to be something devs could configure if the default doesn’t make sense for their app.

In terms of RUM users adding customizations, an example of this is User Timing, which we use a lot to further instrument specific parts of our app code. A big part of this is awareness: I can’t tell you how many “web developers” I’ve come across who aren’t even aware these timings exist, let alone want to add their own custom instrumentation.
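For anyone unfamiliar, the kind of User Timing instrumentation described above is just a pair of marks and a measure. The phase name 'clients-dropdown' is illustrative, echoing the drop-down example earlier in the thread:

```javascript
// Instrument a specific app phase with User Timing marks and a measure.
performance.mark('clients-dropdown:start');
// ... render the drop-down here ...
performance.mark('clients-dropdown:end');
performance.measure('clients-dropdown',
                    'clients-dropdown:start',
                    'clients-dropdown:end');

// The measure's duration is the time between the two marks, in milliseconds.
const [m] = performance.getEntriesByName('clients-dropdown');
console.log(m.duration);
```

RUM tools can then pick up these entries via a PerformanceObserver for the 'measure' entry type, which is exactly the hand-off the thread is describing.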

On Apr 24, 2019, at 11:23 AM, Philip Tellis <ptellis@soasta.com> wrote:
Nic can probably add more details, but yes, in our experience, very few users of RUM add custom instrumentation.  Most site owners want the RUM product to just "figure it out" for them, which is why we've ended up writing our own logic to detect SPA start/end that goes beyond what frameworks provide. We provide hooks to specify hero images, but only a few customers use it.  It's often easier to find out what the de-facto CSS classes in use are and just build that into our detection logic.
On Wed, Apr 24, 2019 at 10:51 AM <tdresser@google.com> wrote:
"if they are interested in measuring the user experience of their page, they will take the time to do the necessary annotation. "

I don't have concrete data on this, but anecdotally that doesn't appear to be the case.
Most pages use RUM analytics to gather data about their pages, but don't do any custom instrumentation. Perhaps +Nic Jansma would have some data on this?

"Nicolás Peña" wrote:
ElementTiming will support images and text. Not sure if you consider that to cover forms, but I think these two are the building blocks of any website.
On Wed, Apr 24, 2019 at 9:52 AM Fred Short <fshort3@gmail.com> wrote:
  I initially added this to the video comment section but sending to the public-web-perf list as well:

"Interesting conversation but I think Ryosuke largely has it right. I think element size as a proxy for importance to the user won't ultimately be useful or relevant since what is important will vary greatly from page to page (i.e. what is meaningful for a static web page will be completely different to what is meaningful for a business app). Trying to do something that "loosely models user experience" will result in something that is largely not meaningful to most. The annotation approach via ElementTiming is the better approach and will be more useful/accurate in the long run. The argument that devs won't take the time to annotate isn't a good one - if they are interested in measuring the user experience of their page, they will take the time to do the necessary annotation. A default, heuristic-based approach won't provide much useful information relative to their page, so they will need to fall back to ElementTiming to get something more meaningful for their page. One comment about the ElementTiming spec - if I'm reading this correctly, this is initially limited to annotating images within the page. This will need to be expanded to include other elements (e.g. form elements, for starters) before this is going to be truly useful for business applications in the wild."
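For context, the annotation approach discussed above is an attribute on the element plus an observer. The identifier "hero-image" below is illustrative, not from the thread:

```javascript
// Annotate in markup:  <img src="hero.png" elementtiming="hero-image">

// Keep only the annotated entries we care about.
function entriesFor(identifier, entries) {
  return entries.filter((e) => e.identifier === identifier);
}

// Browser wiring (sketch; only runs in a page context):
// new PerformanceObserver((list) => {
//   for (const e of entriesFor('hero-image', list.getEntries())) {
//     // renderTime may be 0 for cross-origin images without Timing-Allow-Origin,
//     // in which case loadTime is the fallback.
//     console.log(e.identifier, e.renderTime || e.loadTime);
//   }
// }).observe({ type: 'element', buffered: true });
```

As the thread notes, the spec initially covers images (with text support planned), so annotating form elements this way is not yet possible.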


On Apr 22, 2019, at 4:45 PM, Yoav Weiss <yoav@yoav.ws> wrote:
(Just realized this email was never sent after the April 12th call. Apologies!!)

Hey all,

Minutes and video from the call are now available. Copying them here for safekeeping.


WebPerfWG design meeting - April 12th 2019
Attendees: Steven Bougon, Gilles Dubuc, Philippe Le Hegaret, Nicolás Peña, Maxime Vaillancourt, Tim Dresser, Benjamin De Kosnik, Ryosuke Niwa, Andrew Comminos, Markus Stange, Nic Jansma, Todd Reifsteck, Yoav Weiss
Face to Face:
We need to notify everyone 8 weeks in advance.
No date nailed down yet.
Web Games workshop at the end of June
Next call dates:
Tuesday April 23rd @ 9:00am PST
Largest Contentful Paint: slides (Nicolás)
Improves on FCP
Looks at the largest text or image.
Stops updating on input or page unload.
Ignore removed elements.
Ignore background images attached to the body.
Most mobile pages have at least one large image.
May have more pages with large text elements.
Ryosuke: why is this important?
Tim: We have had lots of feedback that FCP isn’t sufficient. Element timing is strictly better than LCP, but not everyone will manually annotate for element timing.
Ryosuke: It seems like this is just heuristics. If developers don’t take the time to annotate their websites for ET, why would they care about this?
Yoav: This isn’t just a bunch of heuristics, we’re trying to find the largest element, under the assumption that it is meaningful for the user. There are heuristics related to size of text elements, meaningfulness of BG images, but eventually, we want to use size as a proxy for importance.
Todd: I think the question was “are people asking for this?”
Do people want this? Will people use this?
Nicolás: FCP is widely used and we get a lot of complaints that it’s not enough, so no doubt there’s a need
Tim: maybe we need to gather additional data to prove this fills that gap.
Safari doesn’t even paint until we think the majority of the web page has painted.
First contentful paint should be equivalent to largest contentful paint.
Nicolás: Is that also true for splash screens?
Ryosuke: won’t that be counted?
Nicolás: That’s why we ignore removed elements
Tim: maybe we should grab some filmstrips from Safari, and eyeball what this metric would do there?
Ryosuke: What Safari aims to do is avoid paint, keep the user looking at the previous screen, until we have enough to paint something meaningful. So this metric will be implementation specific with regards to when things are painted. I can see this being useful in Chrome, but not in Safari.
Todd: At MS, lots of things are built in React, with 3-5 phases of rendering, and only hit “useable” at the 4th phase. IIUC, this API intends to cover some of those scenarios as well.
Tim: yeah. Would going over Safari filmstrips be useful?
Ryosuke: It may be useful, but we’ll consider any difference between this metric and FCP as a bug. Safari wants to wait until the main content has painted.
Gilles: There’s no consensus on whether progressive rendering is better than waiting. Separately, regarding the exceptions around user interaction and scrolling, how much data do we discard?
Nicolás: we don’t discard, we just report the earlier example.
Gilles: How can we avoid bias? You’d get different measurements based on user behavior.
Tim: Hoping that e.g. 90%ile data will be clean, but we can do more research there.
Nicolás: We can discard those cases based on timing.
Gilles: what’s the attribution story?
Nicolás: We expose rects, intrinsic sizes, urls as reported by RT, id when it’s there. Wondering what attribution is required for this to be useful.
Tim: this is different from what Safari does because we can calculate it retroactively, where the browser can’t do that while painting. So I’d expect differences.
Ryosuke: For this to be useful to webdevs, we want a metric that is useful in all browsers. Usecase of wanting to identify when important things have painted makes sense. Suppose you’re writing an editor app, where you first paint a splash screen. The splash screen is a big image. The editor content is empty. What would the metric do?
Yoav: The page will not be a single element which is completely white. Ryosuke - IIUC, this metric and FCP will be very close in Safari?
Ryosuke: I think so.
Yoav: So Safari will still show this metric very close to Chrome’s implementation of it, it’s only FCP that will be significantly delayed, right?
Ryosuke: you can see it that way.
Tim: There’s certainly developer demand for something like this. What would be alternatives we can take?
Benjamin: What about last contentful paint?
Tim: We don’t want to penalize continuous updates and lazy loading. Motivated engineers can create their own metrics using Element Timing.
Yoav: and the goal with this metric is to gather that data for the majority who won’t annotate their elements.
Tim: Ryosuke, any alternatives?
Ryosuke: the idea to measure this has come up multiple times, and we always didn’t find a good algorithm for this. This is just a bunch of heuristics.
Tim: Element Timing as a primitive will help us reach a different outcome. FMP was a bag of heuristics. We should solve the problem even if the solution is not perfect.
Ryosuke: heuristics may change in the future.
Nicolás: the reason for the heuristics is to exclude some of the images. This is not a black box of heuristics. We want to use the size of the image as a proxy for its relevance.
Ryosuke: the correlation between size and importance is a heuristic in itself
Yoav: I’d be uncomfortable if we said that this is a problem we cannot solve. Developers continuously ask for this.
Gilles: maybe ET usage can help guide us towards the right solution. This is a lot of guesswork. It will probably not work for many other sites.
Steven: we sell a platform for our customers to create their pages, where ET will not work.
Gilles: yeah, but we should give more time for ET to be used. This is making a lot of assumptions regarding the user interaction model. This reminds me of above-the-fold synthetic metrics. Maybe the future is that people interact earlier and earlier. The input limitation is a problem, so we need a metric that captures that interaction as well.
Nicolás: We saw data saying that this is better than FCP
Steven: We sell a platform where customers add components, and want to measure it even if they don’t annotate.
Gilles: we still haven’t given developers a chance to experiment and find patterns that work, which can inform a high-level metric design. This is assuming a lot of the user interaction model and how users behave. This metric becomes less useful as people interact earlier. We need to also capture elements that are below the initial viewport.
Ryosuke: can imagine a webapp where users scroll early to the content they care about
Ryosuke: did you look at pages already using element timing, and seen how well this matches up with LCP?
Tim: very few websites use ET today, but we should look at that.
Yoav: from an analytics provider’s perspective, does this make sense?
Nic: we need something that doesn’t require manual annotation. Generally excited about this, but we’d need to think through the implications of user input stopping the updating of this metric. We’d need to log when interactions occurred as well. But onboard with getting something like this.
Yoav: what I hear from folks is:
Concern from folks around input and abort bias.
Heuristics: how comfortable are folks with a heuristic based approach where the intent is declared, but the heuristic calculation itself is UA defined?
Markus: ideally any heuristics would live in the page, and the page would do the annotation etc. This isn’t feasible though. We’ll need some heuristics in the browser.
Ryosuke: it would be better if the heuristics lived in the analytics provider.
Nicolás: we could relax the constraints and emit more ETs, so that analytics providers can calculate this retroactively.
<out of time>

On Fri, Apr 12, 2019 at 3:31 AM Yoav Weiss <yoav@yoav.ws> wrote:
Hey all,

Join us tomorrow for a WG call and talk about new feature designs.

On the agenda for tomorrow we currently have discussions about Element Timing and Largest Contentful Paint. Feel free to add more items to the agenda if there's something else you'd like to discuss.

The hangout for the call will be the usual one.

See y'all tomorrow,



Received on Thursday, 25 April 2019 02:50:20 UTC