Re: WG design call - April 12th @ 11am PST from Philip Tellis on 2019-04-24 (public-web-perf@w3.org from April 2019)

From: Philip Tellis <ptellis@soasta.com>
Date: Wed, 24 Apr 2019 11:23:20 -0400
To: tdresser@google.com
Cc: Nicolás Peña <npm@google.com>, Fred Short <fshort3@gmail.com>, Nic Jansma <nic@nicj.net>, Yoav Weiss <yoav@yoav.ws>, public-web-perf <public-web-perf@w3.org>
Message-ID: <CALU2pfgtg-gPh5msSWbien14_aA9zo9a+C28rX_x6PebGPY87Q@mail.gmail.com>
Nic can probably add more details, but yes, in our experience, very few
users of RUM add custom instrumentation.  Most site owners want the RUM
product to just "figure it out" for them, which is why we've ended up
writing our own logic to detect SPA start/end that goes beyond what
frameworks provide. We provide hooks to specify hero images, but only a few
customers use them.  It's often easier to find out what the de-facto CSS
classes in use are and just build that into our detection logic.
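
A minimal sketch of that kind of fallback, assuming a hypothetical
candidate list and made-up class names (nothing here is taken from a real
RUM product): explicit customer hints win, otherwise well-known class
names are matched.

```typescript
// Hypothetical shape for an image found on the page.
interface ImageCandidate {
  url: string;
  classNames: string[];
}

// Assumed examples of "de-facto" hero class names; a real list would be
// built by looking at what sites actually use.
const KNOWN_HERO_CLASSES: string[] = ["hero", "hero-image", "banner"];

// Pick hero images: customer-specified URLs take priority, otherwise fall
// back to matching the known class names.
function findHeroImages(
  candidates: ImageCandidate[],
  customerHeroUrls: string[] = [],
): ImageCandidate[] {
  if (customerHeroUrls.length > 0) {
    return candidates.filter((c) => customerHeroUrls.indexOf(c.url) !== -1);
  }
  return candidates.filter((c) =>
    c.classNames.some((name) => KNOWN_HERO_CLASSES.indexOf(name) !== -1),
  );
}
```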

On Wed, Apr 24, 2019 at 10:51 AM <tdresser@google.com> wrote:

> "if they are interested in measuring the user experience of their page,
> they will take the time to do the necessary annotation. "
>
> I don't have concrete data on this, but anecdotally that doesn't appear to
> be the case.
> Most pages use RUM analytics to gather data about their pages, but don't
> do any custom instrumentation. Perhaps +Nic Jansma <nic@nicj.net> would
> have some data on this?
>
> "Nicolás Peña" wrote:
>
>> ElementTiming will support images and text. Not sure if you consider that
>> to cover forms, but I think these two are the building blocks of any
>> website.
>>
>> On Wed, Apr 24, 2019 at 9:52 AM Fred Short <fshort3@gmail.com> wrote:
>>
>>> Hi,
>>>   I initially added this to the video comment section but sending to the
>>> public-web-perf list as well:
>>>
>>> "Interesting conversation but I think Ryosuke largely has it right. I
>>> think element size as a proxy for importance to the user won't ultimately
>>> be useful or relevant since what is important will vary greatly from page
>>> to page (e.g. what is meaningful for a static web page will be completely
>>> different to what is meaningful for a business app). Trying to do something
>>> that "loosely models user experience" will result in something that is
>>> largely not meaningful to most. The annotation approach via ElementTiming
>>> is the better approach and will be more useful/accurate in the long run.
>>> The argument that devs won't take the time to annotate isn't a good one -
>>> if they are interested in measuring the user experience of their page, they
>>> will take the time to do the necessary annotation. A default,
>>> heuristic-based approach, won't provide much useful information relative to
>>> their page so they will need to fall back to ElementTiming to get something
>>> more meaningful for their page. One comment about the ElementTiming spec -
>>> If I'm reading this correctly, this is initially limited to annotating
>>> images within the page. This will need to be expanded to include other
>>> elements (e.g. form elements for starters) before this is going to be truly
>>> useful for business applications in the wild."
>>>
>>> Thanks,
>>> Fred
>>>
>>> On Apr 22, 2019, at 4:45 PM, Yoav Weiss <yoav@yoav.ws> wrote:
>>>
>>> (Just realized this email was never sent after the April 12th call.
>>> Apologies!!)
>>>
>>> Hey all,
>>>
>>> Minutes
>>> <https://docs.google.com/document/d/e/2PACX-1vSakfqF726GBh3eXChMsE0LqaAivvnZs-3HEmhHEhYOZer4ZbRogxDF5sqd6FWI8Vfettp0sf_vaHuL/pub>
>>> and video <https://youtu.be/9Gu1V0KVOAk> from the call are now
>>> available. Copying them here for safe keeping.
>>>
>>> Cheers,
>>> Yoav
>>>
>>> WebPerfWG design meeting - April 12th 2019
>>> Participants:
>>> Steven Bougon, Gilles Dubuc, Philippe Le Hégaret, Nicolás Peña, Maxime
>>> Villancourt, Tim Dresser, Benjamin De Kosnik, Ryosuke Niwa, Andrew
>>> Comminos, Markus Stange, Nic Jansma, Todd Reifsteck, Yoav Weiss
>>>
>>> Face to Face:
>>>
>>>    - We need to notify everyone 8 weeks in advance.
>>>    - No date nailed down yet.
>>>    - Web Games workshop at the end of June
>>>
>>> Next call dates:
>>>
>>>    - Tuesday April 23rd @ 9:00am PST
>>>
>>> Largest Contentful Paint: slides
>>> <https://www.google.com/url?q=https://docs.google.com/presentation/d/1oEqw3AZwOjHUake1-MnYrUm4Zti64LRFbBHin16xtSs/edit%23slide%3Did.p&sa=D&ust=1555317971032000>
>>>  (Nicolás)
>>>
>>>    - Improves on FCP
>>>    - Looks at the largest text or image.
>>>    - Stop looking at input or page unload.
>>>    - Heuristics:
>>>       - Ignore removed elements.
>>>       - Ignore background images attached to the body.
>>>    - Mobile:
>>>       - Most mobile pages have at least one large image.
>>>    - Desktop:
>>>       - May have more pages with large text elements.
>>>    - Ryosuke: why is this important?
>>>    - Tim: We have had lots of feedback that FCP isn’t sufficient.
>>>    Element timing is strictly better than LCP, but not everyone will manually
>>>    annotate for element timing.
>>>    - Ryosuke: It seems like this is just heuristics. If developers
>>>    don’t take the time to annotate their websites for ET, why would they care
>>>    about this?
>>>    - Yoav: This isn’t just a bunch of heuristics, we’re trying to find
>>>    the largest element, under the assumption that it is meaningful for the
>>>    user. There are heuristics related to size of text elements,
>>>    meaningfulness of BG images, but eventually, we want to use size as a proxy
>>>    for importance.
>>>    - Todd: I think the question was “are people asking for this?”
>>>    - Ryosuke:
>>>       - Do people want this? Will people use this?
>>>    - Nicolás: FCP is widely used and we get a lot of complaints that
>>>    it’s not enough, so no doubt there’s a need
>>>    - Tim: maybe we need to gather additional data to prove this fills
>>>    that gap.
>>>    - Ryosuke:
>>>       - Safari doesn’t even paint until we think the majority of the web
>>>       page has painted.
>>>       - First contentful paint should be equivalent to largest contentful
>>>       paint.
>>>    - Nicolás: Is that also true for splash screens?
>>>    - Ryosuke: won’t that be counted?
>>>    - Nicolás: That’s why we ignore removed elements
>>>    - Tim: maybe we should grab some filmstrips from Safari, and eyeball
>>>    what this metric would do there?
>>>    - Ryosuke: What Safari aims to do is avoid paint, keep the user
>>>    looking at the previous screen, until we have enough to paint something
>>>    meaningful. So this metric will be implementation specific with regards to
>>>    when things are painted. I can see this being useful in Chrome, but not in
>>>    Safari.
>>>    - Todd: At MS, lots of things are built in React, with 3-5 phases of
>>>    rendering, and only hit “useable” at the 4th phase. IIUC, this API intends
>>>    to cover some of those scenarios as well.
>>>    - Tim: yeah. Would going over Safari filmstrips be useful?
>>>    - Ryosuke: It may be useful, but we’ll consider any difference
>>>    between this metric and FCP as a bug. Safari wants to wait until the main
>>>    content has painted.
>>>    - Gilles: There’s no conclusion if progressive rendering is better
>>>    than waiting. Otherwise, regarding exceptions around user interaction and
>>>    scrolling, how much data do we discard?
>>>    - Nicolás: we don’t discard, we just report the earlier example.
>>>    - Gilles: How can we avoid bias? You’d get different measurements
>>>    based on user behavior.
>>>    - Tim: Hoping that e.g. 90%ile data will be clean, but we can do
>>>    more research there.
>>>    - Nicolás: We can discard those cases based on timing.
>>>    - Gilles: what’s the attribution story?
>>>    - Nicolás: We expose rects, intrinsic sizes, urls as reported by RT,
>>>    id when it’s there. Wondering what attribution is required for this to be
>>>    useful.
>>>    - Tim: this is different from what Safari does because we can
>>>    calculate it retroactively, where the browser can’t do that while painting.
>>>    So I’d expect differences.
>>>    - Ryosuke: For this to be useful to webdevs, we want a metric that
>>>    is useful in all browsers. Usecase of wanting to identify when important
>>>    things have painted makes sense. Suppose you’re writing an editor app,
>>>    where you first paint a splash screen. The splash screen is a big image.
>>>    The editor content is empty. What would the metric do?
>>>    - Yoav: The page will not be a single element which is completely
>>>    white. Ryosuke - IIUC, this metric and FCP will be very close in Safari?
>>>    - Ryosuke: I think so.
>>>    - Yoav: So Safari will still show this metric very close to Chrome’s
>>>    implementation of it, it’s only FCP that will be significantly delayed, right?
>>>    - Ryosuke: you can see it that way.
>>>    - Tim: There’s certainly developer demand for something like this.
>>>    What would be alternatives we can take?
>>>    - Benjamin: What about last contentful paint?
>>>    - Tim: We don’t want to penalize continuous updates and lazy
>>>    loading. Motivated engineers can create their own metrics using Element
>>>    Timing.
>>>    - Yoav: and the goal with this metric is to gather that data for the
>>>    majority who won’t annotate their elements.
>>>    - Tim: Ryosuke, any alternatives?
>>>    - Ryosuke: the idea to measure this has come up multiple times, and
>>>    we never found a good algorithm for this. This is just a bunch of
>>>    heuristics.
>>>    - Tim: Element Timing as a primitive will help us reach a
>>>    different outcome. FMP was a bag of heuristics. We should solve the problem
>>>    even if the solution is not perfect.
>>>    - Ryosuke: heuristics may change in the future.
>>>    - Nicolás: the reason for the heuristics is to exclude some of the images.
>>>    This is not a black box of heuristics. We want to use the size of the image
>>>    as a proxy for its relevance.
>>>    - Ryosuke: the correlation between size and importance is a
>>>    heuristic in itself
>>>    - Yoav: I’d be uncomfortable if we said that this is a problem we
>>>    cannot solve. Developers continuously ask for this.
>>>    - Gilles: maybe ET usage can help guide us towards the right
>>>    solution. This is a lot of guesswork. It will probably not work for many
>>>    other sites.
>>>    - Steven: we sell a platform for our customers to create their
>>>    pages, where ET will not work.
>>>    - Gilles: yeah, but we should give more time for ET to be used. This
>>>    is making a lot of assumptions regarding the user interaction model. This
>>>    is reminiscent of above-the-fold synthetic metrics. Maybe the future is
>>>    that people interact earlier and earlier. Input limitation is a problem.
>>>    So we need a metric that captures that interaction as well.
>>>    - Nicolás: We saw data saying that this is better than FCP
>>>    - Steven: We sell a platform where customers add components, and
>>>    want to measure it even if they don’t annotate.
>>>    - Gilles: we still haven’t given developers a chance to experiment
>>>    and find patterns that work, which can inform a high-level metric design.
>>>    This is assuming a lot of the user interaction model and how users behave.
>>>    This metric becomes less useful as people interact earlier. We need to also
>>>    capture elements that are below the initial viewport.
>>>    - Ryosuke: can imagine a webapp where users scroll early to the
>>>    content they care about
>>>    - Ryosuke: did you look at pages already using element timing, and
>>>    see how well this matches up with LCP?
>>>    - Tim: very few websites use ET today, but we should look at that.
>>>    - Yoav: from an analytics provider’s perspective, does this make
>>>    sense?
>>>    - Nic: we need something that doesn’t require manual annotation.
>>>    Generally excited about this, but we’d need to think through the
>>>    implications of user input stopping the updating of this metric. We’d need
>>>    to log when interactions occurred as well. But onboard with getting
>>>    something like this.
>>>    - Yoav: what I hear from folks is:
>>>       - Concern from folks around input and abort bias.
>>>       - Heuristics: how comfortable are folks with a heuristic based
>>>       approach where the intent is declared, but the heuristic calculation
>>>       itself is UA defined?
>>>    - Markus: ideally any heuristics would live in the page, and the
>>>    page would do the annotation etc. This isn’t feasible though. We’ll need
>>>    some heuristics in the browser.
>>>    - Ryosuke: it would be better if the heuristics lived in the
>>>    analytics provider.
>>>    - Nicolás: we could relax the constraints and emit more ETs, so that
>>>    analytics providers can calculate this retroactively.
>>>
>>> <out of time>
>>>
>>>
>>> On Fri, Apr 12, 2019 at 3:31 AM Yoav Weiss <yoav@yoav.ws> wrote:
>>>
>>>> Hey all,
>>>>
>>>> Join us tomorrow for a WG call and talk about new feature designs.
>>>>
>>>> On the agenda
>>>> <https://docs.google.com/document/d/10dz_7QM5XCNsGeI63R864lF9gFqlqQD37B4q8Q46LMM/edit?pli=1#heading=h.4x6t5aexwllw>
>>>> for tomorrow we currently have discussions about Element Timing and Largest
>>>> Contentful Paint. Feel free to add more items to the agenda if there's
>>>> something else you'd like to discuss.
>>>>
>>>> The hangout for the call would be the usual one
>>>> <https://meet.google.com/nog-ttdz-myg?hs=122>.
>>>>
>>>> See y'all tomorrow,
>>>> Yoav
>>>>
>>>
>>>
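
The candidate selection described in the April 12th minutes above (largest
text or image, stop at first input, ignore removed elements and body
background images) can be sketched roughly as follows. This is an
illustration under stated assumptions, not the spec'd algorithm or any
browser's implementation: the PaintRecord shape and its field names are
made up for the example. It also shows how an analytics provider might
compute the value retroactively from a list of per-element paint records.

```typescript
// Assumed record shape for this sketch; field names are illustrative and
// not taken from the Element Timing or Largest Contentful Paint drafts.
interface PaintRecord {
  id: string;
  kind: "image" | "text";
  paintTime: number;       // ms since navigation start
  area: number;            // painted size in CSS pixels squared
  removed: boolean;        // element was later removed (e.g. a splash screen)
  bodyBackground: boolean; // background image attached to the body
}

// Return the largest eligible record painted before the first user input,
// or undefined if nothing qualifies. Ties keep the first record seen.
function largestContentfulPaint(
  records: PaintRecord[],
  firstInputTime: number = Infinity,
): PaintRecord | undefined {
  let largest: PaintRecord | undefined;
  for (const r of records) {
    // Heuristics from the minutes: skip removed elements and body backgrounds.
    if (r.removed || r.bodyBackground) continue;
    // Stop considering paints once the user has interacted with the page.
    if (r.paintTime >= firstInputTime) continue;
    if (largest === undefined || r.area > largest.area) largest = r;
  }
  return largest;
}
```

Passing a different firstInputTime for the same records changes the
result, which is the input bias raised in the discussion: the reported
element depends on when the user first interacts.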

--
Received on Wednesday, 24 April 2019 15:23:57 UTC