What the font optimizer does

Today we resolved that I should write up a bit about what the font optimizer does.

First: I want to draw a distinction between what’s required vs. what’s optional. The only requirements for the range-request method are that 1) glyph outlines are at the end of the file, and 2) outlines are independent among different glyphs.

(2 isn’t even a hard requirement, but it’s a working assumption I’m operating under for now. It’s easy to imagine a system that wouldn’t require independent glyph outlines.)

Any piece of font production software that can satisfy these two requirements would produce a font that is compatible with the range request method.

However, it’s possible to do better: we can sort the font file to keep commonly used glyphs close together within the file. There are many approaches to doing this. I’ve tried about a half-dozen methods, and the one I’ve observed to do the best is a simulated annealing algorithm. This is the approach the optimizer I wrote uses. But, to be clear: this approach isn’t necessary or required; if someone gets good/better results using another algorithm, they can absolutely use that other algorithm.

Simulated annealing is a sequential algorithm. The algorithm produces a series of state transitions, and the last state in the series is (supposed to be) the best state. For our purposes, each state is a particular ordering of glyphs in the file. A transition between two states could have a few different forms:

1. Swapping two glyphs
2. Rotating a subsequence of glyphs
3. Reversing a subsequence of glyphs
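To make the three transition forms concrete, here’s a minimal sketch in Python, assuming a state is represented as a list of glyph indices (the function names and representation are my own, not the optimizer’s actual code):

```python
def swap(order, i, j):
    """Transition 1: swap the glyphs at positions i and j."""
    new = list(order)
    new[i], new[j] = new[j], new[i]
    return new

def rotate(order, start, end, k):
    """Transition 2: rotate the subsequence order[start:end] left by k."""
    new = list(order)
    sub = new[start:end]
    k %= len(sub)
    new[start:end] = sub[k:] + sub[:k]
    return new

def reverse(order, start, end):
    """Transition 3: reverse the subsequence order[start:end]."""
    new = list(order)
    new[start:end] = list(reversed(new[start:end]))
    return new
```

Each function returns a new ordering rather than mutating in place, so a search loop can freely discard rejected candidates.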

Simulated annealing also requires a fitness function to judge each state by. The fitness function I use is “the number of bytes in the font file we can avoid downloading.” Note that this doesn’t include the sigmoid fitness function from the evaluation report - I wrote this code before that function existed. Picking this fitness function might be related to the fact that the evaluation report shows the range request method doing better on byte savings than on the overall sigmoid fitness score.

So the idea is: you start with some initial state, make a random state transition, decide whether to “move to” the new state based on a probability, and then repeat from whichever state you’re at now. If the new state is significantly better than the old state, the probability of moving to it is higher, and vice-versa. The probability function (the entire curve) is supposed to decrease over time, so in the early stages you have more of a chance of jumping to a worse state, and at the end you have less of a chance. This is because in the beginning you want to jump around the search space to try lots of different options (sort of like when humans “brainstorm” or “spitball”), but over time you want to home in on the best option you’ve found so far and refine it with fewer and smaller changes (like when humans “converge”).
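The loop described above can be sketched in a few lines. This is the textbook form of simulated annealing (geometric cooling, Metropolis-style acceptance), not the optimizer’s actual GPU code; the function names and cooling schedule are my own assumptions:

```python
import math
import random

def simulated_annealing(initial, fitness, neighbor, steps=10000,
                        t_start=1.0, t_end=0.001):
    """Textbook simulated annealing (a sketch, not the optimizer's code).
    `neighbor` returns a randomly perturbed copy of a state; higher
    fitness is better."""
    state, score = initial, fitness(initial)
    best, best_score = state, score
    for step in range(steps):
        # Temperature decays geometrically from t_start toward t_end.
        t = t_start * (t_end / t_start) ** (step / steps)
        candidate = neighbor(state)
        cand_score = fitness(candidate)
        # Always accept improvements; accept regressions with a
        # probability that shrinks as the temperature cools.
        if cand_score >= score or random.random() < math.exp((cand_score - score) / t):
            state, score = candidate, cand_score
            if score > best_score:
                best, best_score = state, score
    return best, best_score
```

For the font problem, the state would be a glyph ordering, `neighbor` one of the three transitions above, and `fitness` the bytes-avoided measure.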

The optimizer I wrote runs several of these sequential searches at the same time (it runs on the GPU, which is a big parallel processor, so we can run a few of these in parallel because why not). The best performing initial ordering (“seed”) seems to be a simple sort by frequency count of each glyph, but there are a few others that do pretty well (like ordering by highest bigram score, etc. - I can get into more detail about this if you’re interested).
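The frequency-count seed is straightforward: most-used glyphs first. A sketch (the function name and dict-based frequency table are my own, for illustration):

```python
def frequency_seed(glyphs, frequency):
    """Seed ordering: sort glyphs by observed usage count, most frequent
    first, so commonly used glyphs start out adjacent in the file."""
    return sorted(glyphs, key=lambda g: frequency.get(g, 0), reverse=True)
```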

I also measured that swapping two glyphs seems to yield the best results (rather than rotating or reversing), so that’s the state transition operator that’s used.

I also played around with various probability functions and cooling functions, but the best performing one turned out to be the super simple “if a state is better, move there; otherwise, don’t.” So that’s the function the optimizer uses. In addition, I couldn’t find any benefit to cooling over time, so the function doesn’t change over time.
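With that degenerate acceptance rule, the search reduces to stochastic hill climbing over random swaps. A sketch under those assumptions (again my own code, not the GPU implementation):

```python
import random

def hill_climb(order, fitness, steps=10000):
    """Degenerate annealing: propose a random swap; keep it only if fitness
    strictly improves. No temperature, no cooling schedule."""
    order = list(order)
    score = fitness(order)
    for _ in range(steps):
        i = random.randrange(len(order))
        j = random.randrange(len(order))
        order[i], order[j] = order[j], order[i]
        new_score = fitness(order)
        if new_score > score:
            score = new_score
        else:
            order[i], order[j] = order[j], order[i]  # undo rejected swap
    return order, score
```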

(These findings in the previous paragraph were surprising to me, but after doing some more investigation, there is some rough intuition for why they’re true. Given the fitness function I was using, if you pick random orderings, pretty much all of them have almost identical fitnesses. This means that the search space is sort of a vast landscape of mediocrity: there don’t appear to be many lobes or hills or valleys or mountains. So, all you can do are sort of micro-optimizations. There aren’t many “chains” where making one optimization opens up a whole new region for more optimization, which open yet new regions for optimization. The entire space is pretty flat.)

((Note that the above might change if I switch fitness function and/or forms of state transitions. Again, the optimizer I came up with isn’t required, or even necessarily a good optimizer - it’s just one proof-of-concept that tries to do pretty well.))

I hope that all made sense. If not, please don’t hesitate to ask questions,
Myles

Received on Monday, 12 October 2020 19:55:03 UTC