Thoughts on how to Summarize Simulation Results

Current State

For each (network model, PFE method, script category) tuple, the
simulation computes:

   - Total cost.
   - Total bytes transferred.
   - Total network waiting time.
   - Total number of requests.


Network models are divided into broad network categories (e.g. 3G, 4G,
Desktop), and each broad category has 5 sub-models (slowest, slow, median,
fast, fastest). The sub-models correspond to the 5th, 25th, 50th, 75th,
and 95th percentiles of observed speeds for that category.
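As a rough sketch, deriving the five sub-models from observed speed samples could look like the following (the sample data and variable names here are hypothetical, not taken from the actual simulation):

```python
import numpy as np

# Hypothetical observed bandwidth samples for one network category (bytes/sec).
observed_speeds = np.array([120, 340, 560, 610, 880, 950, 1200, 1500, 2100, 3400])

# Each sub-model corresponds to a fixed percentile of observed speed.
SUB_MODEL_PERCENTILES = {
    "slowest": 5,
    "slow": 25,
    "median": 50,
    "fast": 75,
    "fastest": 95,
}

sub_models = {
    name: float(np.percentile(observed_speeds, p))
    for name, p in SUB_MODEL_PERCENTILES.items()
}
print(sub_models)
```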

This results in quite a few data points, which makes it difficult to
interpret the results, draw conclusions, and make decisions. We need a way
to summarize the results more concisely to aid decision making.

What Are We Deciding?

To best determine how to summarize the data, we first need to define what
decision we are trying to make from the simulation results. We are trying
to answer the following two questions:

   1. Does progressive font enrichment provide an improvement in web font
      loading performance for users when compared to font loading
      techniques currently in use?
   2. If yes, which of the two proposed PFE methods (patch subset and
      range requests) has the better performance?


Defining performance - there are two measures of performance that we are
interested in:

   1. Network loading cost: an estimate of the cost that a font load
      imposes on a user's experience. This is the primary metric that we
      care about, as it directly represents the end user experience.
   2. Bytes transferred: total bytes transferred (request and response) to
      load a font. This is a secondary measure, but still important:
      sending an excessive number of bytes can cost users on metered
      connections.

Retained Dimensions

Progressive font enrichment is expected to behave differently across the
user's network type and the script of the font. There is no clear way to
assign a specific level of importance to each of the categories in these
dimensions, so results will be reported for each (network type, script)
pair.

Proposal for Aggregating Results

   - Given a network type, script, and a page view sequence:
      - Compute the cost of the existing transfer method (unicode range),
        patch subset, and range requests for each of the 5 sub-network
        models.
      - Aggregate the 5 sub-network costs into a single value by computing
        a weighted average:
         - Slowest - 5%
         - Slow - 20%
         - Median - 50%
         - Fast - 20%
         - Fastest - 5%
      - Normalize the patch subset and range request costs by dividing
        them by the cost of ‘unicode range’.
   - The set of normalized costs across all simulated sequences creates a
     distribution for each (network type, script) pair.
   - Finally, summarize the distribution by computing the 5th, 50th, and
     95th percentiles.
   - Repeat this procedure for bytes transferred.
   - The intervals for patch subset and range requests can then be graphed
     for an easy to understand comparison.
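The aggregation described above could be sketched roughly as follows. The cost numbers, dictionary layout, and function names here are hypothetical illustrations, not the actual implementation (which is in the linked summarize_results.py); a normalized cost below 1 means the method was cheaper than the unicode range baseline.

```python
import numpy as np

# Fixed weights for the 5 sub-network models (sum to 1).
SUB_MODEL_WEIGHTS = {
    "slowest": 0.05, "slow": 0.20, "median": 0.50, "fast": 0.20, "fastest": 0.05,
}

def aggregate_cost(costs_by_sub_model):
    """Collapse per-sub-model costs into a single weighted-average cost."""
    return sum(SUB_MODEL_WEIGHTS[m] * c for m, c in costs_by_sub_model.items())

def normalized_costs(sequences):
    """For each simulated page view sequence, divide each PFE method's
    aggregated cost by the aggregated cost of the unicode range baseline."""
    out = {"patch_subset": [], "range_request": []}
    for seq in sequences:
        baseline = aggregate_cost(seq["unicode_range"])
        for method in out:
            out[method].append(aggregate_cost(seq[method]) / baseline)
    return out

def summarize(values):
    """Summarize a distribution by its 5th, 50th, and 95th percentiles."""
    return tuple(float(np.percentile(values, p)) for p in (5, 50, 95))

# Hypothetical per-sub-model costs for two simulated sequences.
sequences = [
    {
        "unicode_range": {"slowest": 100, "slow": 80, "median": 60, "fast": 40, "fastest": 20},
        "patch_subset": {"slowest": 50, "slow": 40, "median": 30, "fast": 20, "fastest": 10},
        "range_request": {"slowest": 90, "slow": 70, "median": 55, "fast": 35, "fastest": 18},
    },
    {
        "unicode_range": {"slowest": 200, "slow": 160, "median": 120, "fast": 80, "fastest": 40},
        "patch_subset": {"slowest": 120, "slow": 100, "median": 70, "fast": 50, "fastest": 25},
        "range_request": {"slowest": 180, "slow": 150, "median": 110, "fast": 70, "fastest": 36},
    },
]

dist = normalized_costs(sequences)
for method, values in dist.items():
    print(method, summarize(values))
```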


Example graph: [image omitted in the text archive]
This method of aggregation will allow us to answer both questions:

   1. Normalized cost values below 1 indicate that a method performed
      better than the existing method, and values above 1 indicate that it
      performed worse (for example, a normalized cost of 0.8 means the
      font load was 20% cheaper than with unicode range).
   2. Comparing the magnitudes of the two normalized costs tells us how
      much better one method performed than the other.


By including the 5th and 95th percentiles we can also see the variance in
performance for each method.

Code that generates this proposed comparison can be found here:
https://github.com/googlefonts/PFE-analysis/blob/script_update/tools/summarize_results.py#L180

Received on Friday, 11 September 2020 22:34:40 UTC