Re: WebPerfWG call - March 17th 2022 @ 10am PT - A/B testing from Yoav Weiss on 2022-03-18 (public-web-perf@w3.org from March 2022)

From: Yoav Weiss <yoavweiss@google.com>
Date: Fri, 18 Mar 2022 17:44:30 +0100
To: Carine Bournez <carine@w3.org>
Cc: public-web-perf <public-web-perf@w3.org>
Message-ID: <CAL5BFfXr5yeedrCVKoRFTt_mR6g0AzUjxVQov+f_9avdiy6umw@mail.gmail.com>
Minutes
<https://w3c.github.io/web-performance/meetings/2022/2022-03-17/index.html>
and presentation recording <https://youtu.be/oBl5JLjadYM> from yesterday's
meeting are now published. Copying them here for convenience:
WebPerfWG - March 17 2022
Participants

Alex N. Jose, Pat Meenan, Andrew Galloni, Nic Jansma, Alex Christensen,
Sean Feng, Ian Clelland, Marcel Duran, John Engebretson, Giacomo Zechini,
Nitish Mittal, Ankit Jain, Alon Kochba, Hadrien Raffalli, Michal Mocny,
Philip Walton, Noam Helfman, Yoav Weiss, Neil Craig, Tim Kadlec, Carine
Bournez, Philip Tellis, Xiaochen Hu
Minutes
A/B testing
<https://docs.google.com/presentation/d/1-cxHITwVtWJ5x3ev0__XzDtDtJn2cB9CAgN9Mkia3Ag/edit#slide=id.g11de5b0bf6b_0_304>
-
Alex N. Jose

Presentation recording
<https://youtu.be/oBl5JLjadYM>

   - *Alex*: Discussions last year about this
   - A/B testing is about applying changes to a web application in the
   browser - typically cosmetic changes
   - JS is used to perform those changes, which often results in a
   performance cost - the script is fetched, parsed and executed, and if that
   is done in a blocking way it incurs a penalty in FP/FCP
   - If we’re not blocking, the result is a flash of the pre-modification
   document
   - It’s a flexible solution that enables non-technical folks to run
   experiments and is very popular
   - Even for teams that do server-side A/B testing, client-side enables
   them to offload some cosmetic efforts
   - Discussed last year. Basically we want the outcomes of A/B testing
   without the performance penalties
   - Had ideas around standardizing a transformation language, using edge
   as the insertion point or to block just rendering in the browser without
   parser blocking
   - Took some of those ideas and explored how that would function,
   resulting in a spec sketch and a prototype. Hoping to get feedback on that.
   - Conceptually, A/B testing is composed of a control document and a set
   of transformations, expected to be idempotent, so if you apply them
   multiple times, there are no side effects. E.g. if you have a list of items
   and the user goes back to a page, the items won’t be duplicated.
   - We don’t define in the spec how these transformations are idempotent,
   only that they should be
   - We broadly classify the transformations into 2 types: pre_ua and
   on_ua
   - The pre_ua ones are equivalent to server-side transforms
   - The on_ua ones require client-side JS - e.g. for SPAs
   - Want to apply both without performance degradations
   - What would a transform look like?
   - Examples include color changes, title changes, different text,
   coloring the first item in a list, etc
   - Sequence diagram of a prototype through a CloudFlare worker
   - For the prototype I’m using a git based mechanism, not a real A/B
   provider
   - Code + on_ua transforms are injected to the head of the document
   - The prototype is using a mutation observer that listens to all the DOM
   changes, and that’s the only blocking work
   - The rest of the work happens async, as the browser parses the document
   and when JS is making changes to the DOM. Each matching element gets
   processed through the transform functions.
   - Transform functions are currently generic JS; want to open that for
   debate
   - Demo is using Cloudflare workers with Low Latency HTML parser and
   using a GH gist as the JSON source.
   - Used the React TodoMVC app for the demo.
   - Title is created in JS along with the rest of the app. Want to change
   it as it’s created.
   - Control json
   - Includes the 2 variants
   - The transformation itself is JS
   - Could create issues because the language for on_ua and pre_ua is not
   the same, and edge may not run arbitrary JS
   - We have an ongoing test, so that as more items get added to the list,
   the styles get re-applied to what’s now the first item - done for demo
   purposes
   - *demos that page is switching between variant A and B*
   - No flash of pre-variant content, and variations are applied on an
   ongoing basis as items are added to the list
   - From a perf characteristics standpoint, there’s no visible cost to
   running the experiment
   - In repeated tests, it doesn’t show significant perf costs
   - Key lessons - The edge/CDN based approach enables parallel A/B
   configuration fetch
   - Having the transformations before document creation is key to
   performance - head injection
   - CDNs may not be excited about arbitrary JS
   - Where do we go from here?
   - Could standardize the transforms and move away from JS, even though
   that’s different from what A/B testers are doing today.
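
As a rough illustration of the pre_ua / on_ua split and the two-variant control JSON described above, the config could be modelled along these lines. This is a minimal sketch under assumptions, not the prototype's actual format: every type and field name other than pre_ua and on_ua is hypothetical, and the transform bodies are placeholders.

```typescript
// Hypothetical config shape. Only the phase names pre_ua / on_ua come from
// the presentation; all other names here are illustrative assumptions.
type Phase = "pre_ua" | "on_ua";

interface Transform {
  selector: string; // CSS selector for the elements to transform
  phase: Phase;     // where the transform is expected to run
  code: string;     // transform body (arbitrary JS source in the prototype)
}

interface Variant {
  name: string;
  transforms: Transform[];
}

interface Experiment {
  id: string;
  variants: Variant[];
}

// Group a variant's transforms by the phase in which they run.
function splitByPhase(variant: Variant): Record<Phase, Transform[]> {
  const out: Record<Phase, Transform[]> = { pre_ua: [], on_ua: [] };
  for (const t of variant.transforms) {
    out[t.phase].push(t);
  }
  return out;
}

// Serialize only the on_ua transforms, i.e. the part an edge worker
// would inline into the document head for the client to apply.
function headPayload(variant: Variant): string {
  return JSON.stringify(splitByPhase(variant).on_ua);
}

const variantA: Variant = { name: "A", transforms: [] };
const variantB: Variant = {
  name: "B",
  transforms: [
    // Swapping a stylesheet has to happen before the client fetches it.
    { selector: "link[rel=stylesheet]", phase: "pre_ua", code: "/* swap href */" },
    // A title created by client-side JS can only be changed on the client.
    { selector: "h1", phase: "on_ua", code: "/* change title text */" },
  ],
};

const experiment: Experiment = { id: "title-test", variants: [variantA, variantB] };
```

Keeping the phase on each transform, rather than in two separate lists, would leave the choice of where a transform runs with the A/B provider, which matches the position taken later in the discussion.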


   - Could use mutation records
   - They would be more verbose - could avoid some problems by moving the
   implementation of the “applicator” into the browser
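
To make the idea of declarative transforms plus an "applicator" more concrete, here is a minimal sketch, assuming a small finite operation set applied to an in-memory stand-in for DOM elements (so it runs without a browser). The operation names, the `El` type and the `data-ab-key` marker attribute are all hypothetical and not part of the spec sketch; the point is only that each operation can be written so that re-applying it is a no-op.

```typescript
// Minimal in-memory stand-in for a DOM element (assumption, for illustration).
interface El {
  tag: string;
  attrs: Record<string, string>;
  text: string;
  children: El[];
}

// Hypothetical finite operation set, expressed as plain data.
type Op =
  | { kind: "setAttr"; name: string; value: string }
  | { kind: "setText"; value: string }
  | { kind: "appendChild"; key: string; tag: string; text: string };

// Apply one operation. Every branch is idempotent: applying it a second
// time leaves the element exactly as the first application did.
function applyOp(el: El, op: Op): void {
  switch (op.kind) {
    case "setAttr":
      el.attrs[op.name] = op.value;
      break;
    case "setText":
      el.text = op.value;
      break;
    case "appendChild": {
      // Guard on a marker attribute so the child is never duplicated.
      const key = op.key;
      const exists = el.children.some((c) => c.attrs["data-ab-key"] === key);
      if (!exists) {
        el.children.push({
          tag: op.tag,
          attrs: { "data-ab-key": key },
          text: op.text,
          children: [],
        });
      }
      break;
    }
  }
}

function applyAll(el: El, ops: Op[]): void {
  for (const op of ops) {
    applyOp(el, op);
  }
}

const list: El = { tag: "ul", attrs: {}, text: "", children: [] };
const ops: Op[] = [
  { kind: "setAttr", name: "class", value: "variant-b" },
  { kind: "appendChild", key: "promo", tag: "li", text: "Free shipping" },
];

applyAll(list, ops);
applyAll(list, ops); // second pass, e.g. after the user navigates back
```

A data-only format like this is more verbose than a JS function, but the idempotency guarantee moves from each transform author into the applicator itself.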


   - Could move the “edge” part to the origin - doesn’t have to be a CDN,
   which may be an architectural change
   - If we move this to the browser, there would be a cost, because the
   browser can only fetch the config after seeing the document, rather than in
   parallel
   - Spec in progress - “blocking=render” implementation, that could enable
   us to do the same thing without the edge, would block rendering without
   blocking parsing


   - Could be a significant benefit for current A/B testing


   - Interested in opinions on how we can take it from here
   - *Michal*: Small question. In the TODO MVC example, removing item 1
   made item 2 become red only after a while, because MutationObservers take
   time. Is that inherent to the prototype or is this how client side A/B
   testing works?
   - *Alex*: I haven’t seen the flash. The mutations are deferred to the next
   animation frame, so maybe there’s a one frame lag. Should look at that in
   detail. In my testing I haven’t seen such flashes
   - *Michal*: It’s about how long it takes for mutation observers to be
   applied
   - *Alex*: Hoping to find out how complex transformations can be
   - *Pat*: When prototyping were there browser capabilities you found were
   missing? Are mutationObservers good enough? Needed to polyfill something?
   - *Alex*: Capability to install a SW as part of the first request could
   have avoided the need for the Edge here. But there are some challenges with
   that
   - … For mutation observers specifically, that part is working great, but
   need to test that with complex pages
   - *Pat*: Didn’t run into any transforms that you couldn’t do?
   - *Alex*: No because I allowed transformations to be arbitrary JS. If we
   move away from JS, then the capabilities may be limited, and then we could
   run into problems when doing complex things.
   - *Nitish*: I represent an A/B testing provider. Using
   MutationObservers on a complex page, we’ve seen deadlocks created - the
   application tries to create components while the script tries to apply
   changes, so both parties try to apply changes at once. Seen this as a
   practical limitation. Would definitely need a way to avoid conflicts when 2
   parties try to apply changes to the same element.
   - *Alex*: Is that because we’re applying changes sync? One of the things
   I’ve done is to defer all the required changes to the next paint cycle. But
   I can see the problem you mention if the application depends on DOM state
   - *Nitish*: Seen changes get delayed, even if they are applied in the
   next cycle. There’s a chance of users seeing portions of the control page.
   - *Alex*: We could use a different construct in the browser that helps
   to give priority to these mutations.
   - *Nitish*: When applying mutationObserver on the body, the number of
   mutations is huge and they happen very frequently. If every mutation would
   apply a DOM change, that could be very expensive in terms of computation
   - *Alex*: The performance will depend on the transform code, and
   selector performance. Prototype defers transforms until the next paint
   cycle, to reduce computation and avoid acting on each MutationRecord.
   - *Pat*: That’s what I was getting to - need a mutation observer with
   the selectors built in. That would let React give you a hook into the VDOM
   rather than the real DOM
   - *Alex*: Standardization could help on that front. Having the
   transformations applied by the browser
   - *Nitish*: If the operations are applied again and again, their results
   should be the same. If I add items to a list, the end output needs to
   remain the same regardless of how many times the function was applied
   - *Alex*: Yeah, that’s one of the requirements. The spec is expecting
   that you check the conditions to make these transforms idempotent. One can
   shoot themselves in the foot if, say, a navigation item was added multiple
   times as the user is navigating back and forth between pages in an SPA.
   Transforms have to ensure that they are idempotent.
   - *Nitish*: If you’re applying the changes to a DOM attribute, it might
   re-render itself entirely or partially.
   - *Alex*: Should have an example of that. Could check if we’re really
   applying the transformation. You can have conditional checks as part of the
   transformations. The expectation is that the A/B transform is created in a
   way that’s idempotent.
   - That’s harder in the version where we have a dedicated transformation
   language
   - *Hadrien*: PM for Google Optimize. Excited about this. Probably our
   biggest problem
   - … would it be helpful if we sent you examples of pages whose
   performance customers complain about? What would help stress-test this?
   Especially over mobile
   - *Alex*: That’d be awesome. I chose to test this over 3G/Mumbai. But
   more complex pages would be great
   - *Michal*: Is there a tool that creates these configs?
   - *Hadrien*: Google Optimize product has a wysiwyg editor. Sophisticated
   customers would write code inside the editor, but most don’t
   - *Michal*: Can we create an experiment where these transforms are the
   output
   - *Hadrien*: probably
   - *Alex*: Optimize currently transforms everything to JS, right?
   - … How much JS do we need? Would a finite operation set be sufficient?
   - *Hadrien*: Not sure what the limitations would be for a more
   restricted set. Can you walk through both cases?
   - *Alex*: With on_ua you have fully capable JS, but if you wanted to
   define it as a set of DOM operations, these operations limit your
   capability. E.g. a click adds something to the cart, which modifies the JS
   state and is not a DOM operation. It’s theoretically possible, but not
   sure if that’s something someone is doing
   - *Hadrien*: We would go for more capabilities. I imagine it comes at a
   cost. Customers do a wide range of things. Instinctively, arbitrary JS.
   - *Pat*: Wondering if we’re trying to standardize the entire package, or
   are there atomic pieces that we should standardize and not try to specify
   the client-side transformations themselves? i.e. A spec for edge HTML
   rewrites that all CDNs (and servers) could implement that would allow for
   fetching of the experiment definition, group selection and HTML transforms
   (one of which would be embedding any client-side JS and transform
   definitions that need to be written but that should be provider-specific).
   Are there browser-side APIs that need to be improved to make the
   client-side transforms more efficient?
   - … How we standardize the HTML rewriting on the Edge feels important.
   - *Hadrien*: Our customers are moving to server-side experiments which
   are much more costly
   - *Neil*: Work at the BBC. Going through a process of site-wide A/B
   testing. What Pat just said - we’re really struggling because we have to do
   everything on the server side, which explodes the cache.
   - … Mandating edge service - would be great for us to be able to do
   everything client side without any edge servers
   - … otherwise, can the A/B policy be cached client side?
   - … Struggle around making experiments sticky, where users switch
   groups. Ensuring that the A/B group is sticky would be great.
   - *Nitish*: Thoughts on standardization. Full JS power gives you a ton
   of flexibility - hybrid approach would be great, enabling custom JS for
   some things, but standard operations for most things.
   - *Andrew*: Thoughts - why would you need differences between pre_ua and
   on_ua? Any edge hop in between can perform transformations based on
   capabilities
   - *Alex*: The A/B provider is best placed to make that judgement. Some
   transforms have to be applied on the client, e.g. for an SPA. Some DOM
   elements are only created on the client.
   - Another example is swapping a stylesheet - needs to be done on the
   server before the client fetches the control stylesheet
   - An instance where we need both ON_UA and PRE_UA on the same transform
   is a server side rendered SPA, where markup will be present during PRE_UA,
   that needs to be transformed, and later ON_UA, where it would be
   potentially re-rendered. Thus transforms at both places are needed.
   - … But I like the direction we’re suggesting to take a hybrid approach.
   Have a defined spec that can be applied in a performant way by the Edge/UA,
   and also having the spec allowing JS
   - *Andrew*: Any API you want on the Edge that’d make this easier?
   - *Alex*: Asking for an Edge impl from someone that doesn’t have one
   requires an arch change. Browser changes could enable avoiding edge
   mandates - moving the edge to the client
   - *Nic*: best forum?
   - *Yoav*: WICG repo, probably
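
The deferral discussed above (queueing work from mutation callbacks and running the transforms once at the next frame, instead of acting on each MutationRecord) can be sketched roughly as follows. This is a minimal sketch under assumptions, not the prototype's code: the `DeferredBatcher` class and its method names are hypothetical, and the scheduler is injected so the example runs without a DOM (in a browser it would presumably be `requestAnimationFrame`).

```typescript
// Scheduler abstraction; a browser would pass something like
// (cb) => requestAnimationFrame(cb). Injected here so the sketch runs anywhere.
type Schedule = (callback: () => void) => void;

class DeferredBatcher<T> {
  private pending = new Set<T>();
  private scheduled = false;
  private readonly schedule: Schedule;
  private readonly flush: (items: T[]) => void;

  constructor(schedule: Schedule, flush: (items: T[]) => void) {
    this.schedule = schedule;
    this.flush = flush;
  }

  // Called once per observed mutation; cheap, does no transform work itself.
  add(item: T): void {
    this.pending.add(item); // the Set coalesces repeated mutations of one target
    if (!this.scheduled) {
      this.scheduled = true;
      this.schedule(() => this.run());
    }
  }

  // Runs once per scheduled frame and hands the whole batch to the transforms.
  private run(): void {
    const items = Array.from(this.pending);
    this.pending.clear();
    this.scheduled = false;
    this.flush(items);
  }
}

// Manual queue standing in for the browser's frame callback.
const queue: Array<() => void> = [];
const flushed: string[][] = [];
const batcher = new DeferredBatcher<string>(
  (cb) => { queue.push(cb); },
  (items) => { flushed.push(items); },
);

batcher.add("#list");
batcher.add("#list");  // same target again, coalesced
batcher.add("#title");

// Simulate the next frame: the three mutations produce one flush of two targets.
for (const cb of queue.splice(0)) {
  cb();
}
```

Batching like this bounds the transform work per frame, but as Nitish notes it can also delay changes, so users may briefly see portions of the control page.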

Chat Log

You 1:03 PM

https://forms.gle/QDUYYAQBrhh1q13A8

You 1:05 PM

https://docs.google.com/presentation/d/1-cxHITwVtWJ5x3ev0__XzDtDtJn2cB9CAgN9Mkia3Ag/edit#slide=id.g11de5b0bf6b_0_304

Hadrien Raffalli 1:30 PM

sorry mic problems

will rejoin

Hadrien Raffalli 1:37 PM

https://support.google.com/ads-help/answer/7367525?hl=en

Just to give you an idea of how much A/B testers allow in terms of code
change size, max container limit on Google Optimize is 400kb

Hadrien Raffalli 1:39 PM

Changes are not just changing css + text, customers might inject javascript
for new kinds of interactivity

In rare cases, we grant customers container size increases

Ankit Jain 1:52 PM

Hey, I work with VWO (an A/B Testing provider). For creating idempotent
operations, and supporting visual changes, full JS would be required.

Ankit Jain 1:58 PM

For Pre-UA, limiting it to specific operations is a good choice. For
Post-UA, if we limit it to specific methods, I foresee a lot of custom JS
being appended to the head to counter it.

Nitish Mittal 2:00 PM

I also work with VWO, and can help in providing practical usecases which
can be helpful

Hadrien Raffalli 2:02 PM

Thanks for inviting me! This was/ is very exciting 👋

Neil Craig 2:02 PM

Avoiding the necessity for CDN would be great, we use a mixture of
commercial CDN and in-house - for cost reasons

Ankit Jain 2:02 PM

+1

Tim Kadlec 2:02 PM

Gotta drop, but super interesting stuff here. Thanks for presenting, Alex.


On Thu, Mar 17, 2022 at 12:57 PM Carine Bournez <carine@w3.org> wrote:

>
> Hi all,
>
> Since US clocks moved to summer time last Sunday, this is 1hr earlier
> for most non-US people, e.g. 6pm CET
>
>
> On Fri, Mar 11, 2022 at 10:22:19AM -0500, Nic Jansma wrote:
> > Hi everyone!
> >
> > On the agenda <
> https://docs.google.com/document/d/10dz_7QM5XCNsGeI63R864lF9gFqlqQD37B4q8Q46LMM/edit?pli=1#heading=h.osvewfb7hvdz
> >
> > for our next call (March 17th @ 10am PT / 1pm ET) we will discuss:
> >
> >  * *A/B testing - via Alex Jose*
> >      o *This is a followup to an open meeting we held last year Feb
> >        4th, 2021 (meeting minutes
> >        <
> https://docs.google.com/document/d/1rmVjH7-5hGk_VB0EwErM1tcEVz100XZDYlaSd75WbRE/edit
> >)*
> >
> > Plus any other issues you want to talk about. If you have additional
> items,
> > please add them to the agenda <
> https://docs.google.com/document/d/10dz_7QM5XCNsGeI63R864lF9gFqlqQD37B4q8Q46LMM/edit?pli=1#heading=h.osvewfb7hvdz
> >.
> >
> > Join us <https://meet.google.com/agz-fbji-spp>!
> >
> > The presentations will be recorded and published online afterwards.
> >
> > See you soon!
> >
> > - Nic
> > https://nicj.net/
> > @NicJ
> >
>
> --
> Carine Bournez /// W3C Europe
>
>
Received on Friday, 18 March 2022 16:46:11 UTC