Progress on Simulation

Over the past couple of weeks since we last met I've been working toward
getting a final version of the simulation running. Here's what I've gotten done:

   - We generated a new version of the data set internally. This new data
   set is significantly larger (~150 million sequences vs. 150,000 sequences in
   the current data set) and has better language coverage outside of Latin
   than the previous data set. I'm planning on sending out the new data set
   later today.
   - I got set up to run simulations for range request:
      - Generated optimized versions of each font in the library using
      Myles's optimizer tool.
      - Found and fixed issues in the font optimizer script (
      https://github.com/litherum/StreamableFonts/commit/7c78f3cebfa79165a962dfc56962538a1aa1fede
      and https://github.com/litherum/StreamableFonts/pull/3) which
      prevented it from correctly re-ordering glyphs in the font.
      - The current optimizer implementation drops hints in the output
      fonts, so I dropped hints from all of the non-optimized fonts so that the
      simulations make a fair comparison between range request and the other
      methods.
      - The optimizer doesn't work for variable fonts at the moment due to
      an issue with flattened composite glyphs not being compatible with gvar.
      So I also dropped all variable tables from the fonts in the library.
      Variable fonts are only a small part of the library, so I don't expect
      this to have much effect on the results.
      - Lastly, the optimizer currently re-orders the notdef glyph (glyph
      0), which should not be moved. I don't believe this significantly
      impacts the results of the simulation, but it should be fixed
      eventually.
      - When I send out the new data set I'll include the optimized version
      of the library that I generated.
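For reference, the table stripping described above boils down to something
like the sketch below. This is a minimal illustration, not the actual script:
the table tags follow the OpenType spec, and the function accepts anything
dict-like by table tag (e.g. a fontTools TTFont), so the fontTools dependency
is an assumption rather than a requirement of the sketch.

```python
# Tables to drop; tags per the OpenType spec. Note the trailing space
# in "cvt " -- OpenType table tags are always four characters.
HINT_TABLES = ["fpgm", "prep", "cvt "]
VARIATION_TABLES = ["fvar", "gvar", "avar", "HVAR", "MVAR", "STAT"]

def strip_tables(font):
    """Remove hinting and variation tables from a font object.

    `font` can be anything supporting `in` and `del` keyed by table
    tag, e.g. a fontTools TTFont. A complete hint strip would also
    clear per-glyph TrueType instructions; that is omitted here.
    """
    for tag in HINT_TABLES + VARIATION_TABLES:
        if tag in font:
            del font[tag]
    return font
```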
   - I updated the analyzer to gracefully handle errors during simulations
   of individual sequences. If a method fails for a particular sequence, the
   results for all methods on that sequence are dropped from the output.
   Additionally, the indices of the failed sequences are stored and written to
   a file at the end of the simulation to aid in debugging. There's currently
   an issue where a small number of patch subset simulations fail since the
   harfbuzz subsetter does not yet support GSUB/GPOS re-packing to fix
   overflowed offsets. Code for failure handling is here:
   https://github.com/googlefonts/PFE-analysis/tree/script_update. I'm
   going to try to get that merged into the main repo today.
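The failure handling described above amounts to the following pattern (a
simplified sketch, not the analyzer's actual code; the callable-per-method
interface here is made up for illustration):

```python
def simulate_all(sequences, methods):
    """Run every method on every sequence.

    `methods` maps a method name to a callable taking one sequence.
    If any method raises on a sequence, the partial results for that
    sequence are discarded for all methods, and its index is recorded
    so it can be written out at the end for debugging.
    """
    results = {name: [] for name in methods}
    failed_indices = []
    for i, seq in enumerate(sequences):
        per_seq = {}
        try:
            for name, method in methods.items():
                per_seq[name] = method(seq)
        except Exception:
            failed_indices.append(i)
            continue  # drop this sequence from every method's output
        for name, value in per_seq.items():
            results[name].append(value)
    return results, failed_indices
```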
   - Since range request simulations need to be run against the optimized
   versions of the fonts, while all of the other methods run against the
   non-optimized versions, we currently end up with two separate result
   files. I wrote a tool which merges the two result files back into a
   single file. That code can also be found in the script_update branch:
   https://github.com/googlefonts/PFE-analysis/tree/script_update.
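Conceptually, the merge is just combining two per-method result mappings
whose method names don't overlap (the real result files are more structured
than this; the plain-dict representation below is only for illustration):

```python
def merge_results(results_a, results_b):
    """Merge two result mappings (method name -> per-sequence results)
    produced by separate simulation runs into one.

    The two runs must cover disjoint sets of methods; a collision
    indicates the same method was simulated twice.
    """
    overlap = set(results_a) & set(results_b)
    if overlap:
        raise ValueError(
            "methods present in both result files: %s" % sorted(overlap))
    merged = dict(results_a)
    merged.update(results_b)
    return merged
```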
   - I developed a new method for summarizing the results of the analysis (
   https://github.com/w3c/PFE-analysis/blob/master/tools/summarize_results.py#L180)
   that produces a succinct comparison between methods. I'm going to send out
   a separate email with a detailed writeup on that later today.
   - Finally, I'm currently re-running the simulations using the new data
   set and the optimized font library. Hopefully those runs will finish over
   the weekend and I'll have some results to present at our Monday meeting.

Received on Friday, 11 September 2020 19:33:12 UTC