Thoughts on how to Summarize Simulation Results

Current State

For each (network model, PFE method, script category) tuple, the
simulation computes:

   - Total cost.
   - Total bytes transferred.
   - Total network waiting time.
   - Total number of requests.


Network models are divided into broad network categories (e.g. 3G, 4G,
Desktop), and each broad category has 5 sub-models (slowest, slow, median,
fast, fastest). The sub-models correspond to the 5th, 25th, 50th, 75th,
and 95th percentiles of observed speeds for that category.
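As a rough sketch, deriving the five sub-models from observed speed samples could look like the following (the sample data and variable names here are hypothetical, not taken from the actual simulation):

```python
import numpy as np

# Hypothetical observed bandwidth samples for one network category (bytes/sec).
observed_speeds = np.array([120, 340, 560, 610, 880, 950, 1200, 1500, 2100, 3400])

# Each sub-model corresponds to a fixed percentile of observed speed.
SUB_MODEL_PERCENTILES = {
    "slowest": 5,
    "slow": 25,
    "median": 50,
    "fast": 75,
    "fastest": 95,
}

sub_models = {
    name: float(np.percentile(observed_speeds, p))
    for name, p in SUB_MODEL_PERCENTILES.items()
}
print(sub_models)
```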

This results in quite a few data points, which makes it difficult to
interpret the results, draw conclusions, and make decisions. We need a way
to summarize the results more concisely to aid decision making.

What Are We Deciding?

To best determine how to summarize the data, we first need to define what
decision we are trying to make from the simulation results. We are trying
to answer the following two questions:

   1. Does progressive font enrichment provide an improvement in web font
      loading performance for users when compared to font loading
      techniques currently in use?
   2. If yes, which of the two proposed PFE methods (patch subset and
      range requests) has the better performance?


Defining performance - there are two measures of performance that we are
interested in:

   1. Network loading cost: an estimate of the cost that a font load
      imposes on a user's experience. This is the primary metric that we
      care about, as it directly represents the end user experience.
   2. Bytes transferred: total bytes transferred (request and response) to
      load a font. This is a secondary measure, but still important:
      sending an excessive number of bytes can cost users on metered
      connections.

Retained Dimensions

Progressive font enrichment is expected to behave differently across the
user's network type and the script of the font. There is no clear way to
assign a specific level of importance to each of the categories in these
dimensions, so results will be reported for each (network type, script)
pair.

Proposal for Aggregating Results

   - Given a network type, script, and a page view sequence:
      - Compute the cost of the existing transfer method (unicode range),
        patch subset, and range requests for each of the 5 sub-network
        models.
      - Aggregate the 5 sub-network costs into a single value by computing
        a weighted average:
         - Slowest - 5%
         - Slow - 20%
         - Median - 50%
         - Fast - 20%
         - Fastest - 5%
      - Normalize the patch subset and range request costs by dividing
        them by the cost of ‘unicode range’.
   - The set of normalized costs across all simulated sequences creates a
     distribution for each (network type, script) pair.
   - Finally, summarize the distribution by computing the 5th, 50th, and
     95th percentiles.
   - Repeat this procedure for bytes transferred.
   - The intervals for patch subset and range requests can then be graphed
     for an easy to understand comparison.
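The aggregation described above could be sketched roughly as follows. The cost numbers, dictionary layout, and function names here are hypothetical illustrations, not the actual implementation (which is in the linked summarize_results.py); a normalized cost below 1 means the method was cheaper than the unicode range baseline.

```python
import numpy as np

# Fixed weights for the 5 sub-network models (sum to 1).
SUB_MODEL_WEIGHTS = {
    "slowest": 0.05, "slow": 0.20, "median": 0.50, "fast": 0.20, "fastest": 0.05,
}

def aggregate_cost(costs_by_sub_model):
    """Collapse per-sub-model costs into a single weighted-average cost."""
    return sum(SUB_MODEL_WEIGHTS[m] * c for m, c in costs_by_sub_model.items())

def normalized_costs(sequences):
    """For each simulated page view sequence, divide each PFE method's
    aggregated cost by the aggregated cost of the unicode range baseline."""
    out = {"patch_subset": [], "range_request": []}
    for seq in sequences:
        baseline = aggregate_cost(seq["unicode_range"])
        for method in out:
            out[method].append(aggregate_cost(seq[method]) / baseline)
    return out

def summarize(values):
    """Summarize a distribution by its 5th, 50th, and 95th percentiles."""
    return tuple(float(np.percentile(values, p)) for p in (5, 50, 95))

# Hypothetical per-sub-model costs for two simulated sequences.
sequences = [
    {
        "unicode_range": {"slowest": 100, "slow": 80, "median": 60, "fast": 40, "fastest": 20},
        "patch_subset": {"slowest": 50, "slow": 40, "median": 30, "fast": 20, "fastest": 10},
        "range_request": {"slowest": 90, "slow": 70, "median": 55, "fast": 35, "fastest": 18},
    },
    {
        "unicode_range": {"slowest": 200, "slow": 160, "median": 120, "fast": 80, "fastest": 40},
        "patch_subset": {"slowest": 120, "slow": 100, "median": 70, "fast": 50, "fastest": 25},
        "range_request": {"slowest": 180, "slow": 150, "median": 110, "fast": 70, "fastest": 36},
    },
]

dist = normalized_costs(sequences)
for method, values in dist.items():
    print(method, summarize(values))
```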


Example graph: [image omitted in the text archive]
This method of aggregation will allow us to answer both questions:

   1. Normalized cost values below 1 indicate that a method performed
      better than the existing method, and values above 1 indicate that it
      performed worse (for example, a normalized cost of 0.8 means the
      font load was 20% cheaper than with unicode range).
   2. Comparing the magnitudes of the two normalized costs tells us how
      much better one method performed than the other.


By including the 5th and 95th percentiles we can also see the variance in
performance for each method.

Code that generates this proposed comparison can be found here:
https://github.com/googlefonts/PFE-analysis/blob/script_update/tools/summarize_results.py#L180

Received on Friday, 11 September 2020 22:34:40 UTC